
PAVO-Bench – 50K voice turns and an 85K-param router for ASR→LLM→TTS

#voice orchestration · #asr-llm-tts pipeline · #inference routing · #edge computing · #benchmark dataset
⚡ TL;DR · AI summary

PAVO is a pipeline-aware voice orchestration system: an 85,041-parameter meta-controller, trained in 106 seconds, dynamically routes each voice-assistant turn between cloud and edge configurations based on real-time demand. Evaluated on 50,000 voice turns, PAVO reduces P95 latency by 10.3%, median latency by 34%, and energy per turn by 71% relative to a fixed-cloud baseline, while cutting coherence failures from 7.1% to 0.9%. The system accounts for inter-stage coupling, where upstream ASR quality directly bounds downstream LLM performance, and the authors release PAVO-Bench, a benchmark of complexity-labeled voice interactions. The result is efficient, quality-preserving inference routing for ASR→LLM→TTS pipelines.
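To make the routing decision concrete, a demand-conditioned per-turn router can be pictured as a small policy network that maps turn-level features to a cloud/edge choice for each pipeline stage. The sketch below is a minimal illustration, not the released 85K-parameter controller; the feature set, layer sizes, and binary cloud/edge heads are assumptions for exposition.

```python
# Minimal sketch of a demand-conditioned per-turn router (illustrative only).
# Each stage (ASR, LLM, TTS) gets a binary placement head: 0 = edge, 1 = cloud.
import torch
import torch.nn as nn

STAGES = ["asr", "llm", "tts"]

class TurnRouter(nn.Module):
    def __init__(self, n_features: int = 16, hidden: int = 64):
        super().__init__()
        # Shared backbone over hypothetical per-turn demand features
        # (e.g. audio length, SNR, device queue depth, battery level).
        self.backbone = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One placement head per pipeline stage.
        self.heads = nn.ModuleDict({s: nn.Linear(hidden, 2) for s in STAGES})

    def forward(self, features: torch.Tensor) -> dict:
        h = self.backbone(features)
        return {s: head(h) for s, head in self.heads.items()}

router = TurnRouter()
turn_features = torch.randn(1, 16)  # placeholder features for one voice turn
with torch.no_grad():
    logits = router(turn_features)
placement = {s: ("cloud" if l.argmax(-1).item() == 1 else "edge")
             for s, l in logits.items()}
print(placement)  # e.g. {'asr': 'edge', 'llm': 'cloud', 'tts': 'edge'}
```

In the paper's setup such a controller is trained with multi-objective PPO against latency, energy, and quality signals; the sketch above only shows the inference-time shape of the decision.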

Original article: GitHub →
Opening excerpt (first ~120 words)

PAVO: Pipeline-Aware Voice Orchestration
Demand-conditioned inference routing for real-time ASR → LLM → TTS voice pipelines.

PAVO treats the voice-assistant pipeline as a jointly optimizable inference graph. An 85,041-parameter meta-controller, trained with multi-objective PPO in 106 seconds, decides per turn whether to route each ASR → LLM → TTS call to a cloud or edge configuration. The empirical contribution is a characterization of inter-stage coupling constraints: quality dependencies where upstream ASR choices bound what downstream LLMs can recover from.

Authors: NarasingaMoorthy VeiluKanthaPerumal (University of Pennsylvania) and Mohammed Imthathullah (Google).
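The inter-stage coupling idea can be illustrated with a toy constraint: if the ASR placement chosen for a turn is expected to produce a low-confidence transcript, no downstream LLM can recover the lost information, so the router should escalate the ASR stage rather than the LLM. The threshold and escalation rule below are assumptions for illustration, not the paper's formulation.

```python
# Toy sketch of an inter-stage coupling constraint (illustrative assumptions).
ASR_CONFIDENCE_FLOOR = 0.85  # hypothetical minimum expected transcript confidence

def apply_coupling(placement: dict, expected_asr_confidence: float) -> dict:
    """Escalate edge ASR to cloud when its expected transcript quality is too
    low for the downstream LLM to work with, regardless of LLM placement."""
    constrained = dict(placement)
    if placement["asr"] == "edge" and expected_asr_confidence < ASR_CONFIDENCE_FLOOR:
        # Upstream quality bounds downstream recovery: a stronger LLM cannot
        # repair a transcript that has already dropped the content.
        constrained["asr"] = "cloud"
    return constrained

print(apply_coupling({"asr": "edge", "llm": "edge", "tts": "edge"}, 0.72))
# {'asr': 'cloud', 'llm': 'edge', 'tts': 'edge'}
```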

Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.
