TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing
The paper introduces TwinRouterBench, a benchmark for evaluating LLM routing in agentic applications. It has two tracks: a static track built on pre-verified data and a dynamic track that runs live agent executions. The goal is to assess whether a router can reduce cost while maintaining task success in long-horizon applications.
- TwinRouterBench evaluates LLM routing through a static track and a dynamic track.
- The static track includes 970 router-visible prefixes from 520 instances across multiple benchmarks.
- The dynamic track tests routers on a full suite of cases to measure task resolution and API spending.
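The two quantities named for the dynamic track, task resolution and API spending, can be illustrated with a minimal sketch. This is not the benchmark's own code: the `RunRecord` fields, the `summarize` helper, the metric names, and the sample values are all hypothetical, chosen only to show how per-case outcomes might be aggregated.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One live agent run under a given router (hypothetical schema)."""
    case_id: str
    resolved: bool        # did the agent complete the task?
    api_cost_usd: float   # total API spend for this run


def summarize(runs):
    """Aggregate resolve rate and spend over a list of runs."""
    if not runs:
        raise ValueError("no runs to summarize")
    resolved = sum(r.resolved for r in runs)
    total_cost = sum(r.api_cost_usd for r in runs)
    return {
        "resolve_rate": resolved / len(runs),
        "total_cost_usd": total_cost,
        # Spend per solved task; infinite if nothing was solved.
        "cost_per_resolved_usd": total_cost / resolved if resolved else float("inf"),
    }


# Illustrative data only, not results from the paper.
runs = [
    RunRecord("case-1", True, 0.5),
    RunRecord("case-2", False, 0.25),
    RunRecord("case-3", True, 0.25),
    RunRecord("case-4", True, 0.5),
]
summary = summarize(runs)
print(summary)
```

Reporting both numbers side by side matters because a router can lower spend simply by failing more tasks; cost per resolved task is one possible way to combine them, though the paper may define its metrics differently.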
Opening excerpt (first ~120 words)
arXiv:2605.18859 (cs), Computer Science > Machine Learning. Submitted on 14 May 2026.
Authors: Pei Yang, Wanyi Chen, Tongyun Yang, Pengbin Feng, Jiarong Xing, Wentao Guo, Yuhang Yao, Yuhang Han, Hanchen Li, Xu Wang, Zeyu Wang, Jie Xiao, Anjie Yang, Liang Tian, Lynn Ai, Eric Yang, Tianyu Shi
Abstract: LLM routing matters most in long-horizon applications such as coding agents, deep research systems, and computer-use agents, where a single user request triggers many model calls.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.