Zhengkid/AutoTTS: Agentic Discovery for Test-Time Scaling
AutoTTS takes an agentic approach to test-time scaling (TTS) in large language models: it automates the discovery of inference controllers inside a replay-based environment. The method needs neither hand-crafted heuristics nor gradient updates, relying instead on a coding agent that iteratively refines code-defined controllers. On the AIME and HMMT benchmarks, the discovered Confidence Momentum Controller reaches accuracy similar to SC@64 while using roughly 69.5% fewer tokens.
- AutoTTS uses a coding agent to automatically discover inference controllers in a replay environment, with no LLM calls during evaluation.
- The discovered Confidence Momentum Controller reduces token usage by ~69.5% compared to SC@64 while maintaining similar accuracy across multiple model scales.
- The entire discovery process costs an estimated $39.9 and takes 160 minutes, with zero LLM calls during evaluation because replays are cached.
- Controllers are trained on AIME24 data and generalize well to the held-out AIME25 and HMMT25 benchmarks.
- The system enables fine-grained policy improvements by recording full execution traces and scaling curves for iterative refinement.
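The replay idea in the points above can be illustrated with a minimal sketch. It assumes each problem comes with a cached list of pre-generated samples (an extracted answer plus a token count), so a controller can be scored without calling an LLM. The stopping rule shown is a simple vote-share threshold chosen for illustration; it is not the Confidence Momentum Controller, whose rule is not described here. All names (`CachedSample`, `replay`, `make_agreement_controller`) and the toy data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class CachedSample:
    """One pre-generated reasoning trace, stored so it can be replayed offline."""
    answer: str  # final answer extracted from the trace
    tokens: int  # tokens the trace cost when it was first generated


# A controller sees the answers revealed so far and returns True to stop sampling.
Controller = Callable[[Sequence[str]], bool]


def majority_answer(answers: Sequence[str]) -> str:
    """Self-consistency vote: the most frequent answer wins."""
    return Counter(answers).most_common(1)[0][0]


def replay(samples: Sequence[CachedSample], should_stop: Controller) -> Tuple[str, int]:
    """Reveal cached samples one by one until the controller stops. No LLM is called."""
    seen: List[str] = []
    spent = 0
    for sample in samples:
        seen.append(sample.answer)
        spent += sample.tokens
        if should_stop(seen):
            break
    return majority_answer(seen), spent


def never_stop(_answers: Sequence[str]) -> bool:
    """Baseline: consume every cached sample, i.e. SC@N with N = len(samples)."""
    return False


def make_agreement_controller(min_samples: int, threshold: float) -> Controller:
    """Hypothetical rule: stop once the leading answer's vote share reaches `threshold`."""
    def should_stop(answers: Sequence[str]) -> bool:
        if len(answers) < min_samples:
            return False
        top_count = Counter(answers).most_common(1)[0][1]
        return top_count / len(answers) >= threshold
    return should_stop


def evaluate(problems, should_stop: Controller) -> Tuple[float, int]:
    """Return (accuracy, total tokens spent) for a controller over cached problems."""
    correct = 0
    total_tokens = 0
    for samples, gold in problems:
        answer, spent = replay(samples, should_stop)
        correct += int(answer == gold)
        total_tokens += spent
    return correct / len(problems), total_tokens


def cached(answers: Sequence[str], tokens_each: int = 100) -> List[CachedSample]:
    """Build toy cached samples; real caches would hold actual model outputs."""
    return [CachedSample(a, tokens_each) for a in answers]


# Toy data invented for illustration: (cached samples, gold answer) per problem.
PROBLEMS = [
    (cached(["42", "42", "42", "17", "42", "42", "42", "42"]), "42"),
    (cached(["7", "9", "9", "9", "7", "9", "9", "9"]), "9"),
]

baseline_acc, baseline_tokens = evaluate(PROBLEMS, never_stop)
ctrl_acc, ctrl_tokens = evaluate(PROBLEMS, make_agreement_controller(3, 0.75))
token_saving = 1 - ctrl_tokens / baseline_tokens
print(f"baseline: acc={baseline_acc}, tokens={baseline_tokens}")
print(f"controller: acc={ctrl_acc}, tokens={ctrl_tokens}, saving={token_saving}")
```

Because every candidate controller is scored against the same cached samples, comparing two controllers only costs a loop over stored data, which is what makes repeated code edits by a search agent cheap to evaluate.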
Opening excerpt (first ~120 words)
AutoTTS
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang, Rui Liu, Runpeng Dai, Ruibo Chen, Chenxi Liu, Tianyi Xiong, Xidong Wu, Hongming Zhang, Heng Huang
UMD · UVA · WUSTL · UNC · Google · Meta
AutoTTS reframes TTS strategy design from hand-crafting heuristics to environment-driven automatic search: humans only construct an offline replay environment (states, actions, feedback, objectives), and a coding agent iteratively proposes and refines code-defined controllers within it, using code edits and no gradient updates. Cheap: 0 LLM calls, fully replay.
…