PAVO-Bench – 50K voice turns and an 85K-param router for ASR→LLM→TTS
PAVO introduces a pipeline-aware voice orchestration system that uses an 85,041-parameter meta-controller, trained in 106 seconds, to dynamically route voice-assistant tasks between cloud and edge based on real-time demand. Evaluated on 50,000 voice turns, PAVO reduces P95 latency by 10.3%, median latency by 34%, and energy per turn by 71% compared to a fixed-cloud baseline, while cutting coherence failures from 7.1% to 0.9%. The controller accounts for inter-stage coupling, where upstream ASR quality directly impacts downstream LLM performance. The paper also releases PAVO-Bench, a benchmark of complexity-labeled voice interactions. Together, these enable efficient, quality-preserving inference routing in ASR→LLM→TTS pipelines.
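The per-turn routing idea can be sketched as a tiny policy that maps turn-level features to an independent cloud-vs-edge choice for each pipeline stage. This is a minimal illustration, not the paper's implementation: the feature set, layer sizes, and random (untrained) weights are all assumptions; PAVO's actual controller has 85,041 parameters and is trained with multi-objective PPO.

```python
import numpy as np

STAGES = ["asr", "llm", "tts"]  # pipeline stages routed per turn

rng = np.random.default_rng(0)

# Hypothetical per-turn feature vector: audio length, estimated query
# complexity, device battery level, and measured network RTT.
def turn_features(audio_s, complexity, battery, rtt_ms):
    return np.array([audio_s / 30.0, complexity, battery, rtt_ms / 500.0])

# Illustrative two-layer policy; in PAVO these weights would come from
# PPO training, here they are random for demonstration only.
W1 = rng.normal(scale=0.5, size=(4, 32))
W2 = rng.normal(scale=0.5, size=(32, 2 * len(STAGES)))  # 2 logits per stage

def route_turn(x):
    h = np.tanh(x @ W1)
    logits = (h @ W2).reshape(len(STAGES), 2)
    # greedy decode over {edge=0, cloud=1} for each stage independently
    choice = logits.argmax(axis=1)
    return dict(zip(STAGES, ("edge" if c == 0 else "cloud" for c in choice)))

decision = route_turn(turn_features(4.2, 0.8, 0.35, 120))
print(decision)
```

A trained policy would sample from per-stage distributions during PPO rollouts rather than decode greedily; greedy decoding here just keeps the sketch deterministic.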
Opening excerpt (first ~120 words)
PAVO: Pipeline-Aware Voice Orchestration Demand-conditioned inference routing for real-time ASR → LLM → TTS voice pipelines. PAVO treats the voice-assistant pipeline as a jointly optimizable inference graph. An 85,041-parameter meta-controller, trained with multi-objective PPO in 106 seconds, decides per turn whether to route each ASR → LLM → TTS call to a cloud or edge configuration. The empirical contribution is a characterization of inter-stage coupling constraints — quality dependencies where upstream ASR choices bound what downstream LLMs can recover from. Authors: NarasingaMoorthy VeiluKanthaPerumal (University of Pennsylvania) and Mohammed Imthathullah (Google).
…
Excerpt limited to ~120 words for fair-use compliance. The full article is available on GitHub.