Zhengkid/AutoTTS: Agentic Discovery for Test-Time Scaling

May 17, 2026 · 2:01 AM UTC · 16 min read

#ai research #language models #test-time scaling #autotuning #machine learning

⚡ TL;DR · AI summary

AutoTTS introduces an agentic approach to test-time scaling in large language models: a coding agent automatically discovers inference controllers inside a replay-based environment. The method needs neither hand-crafted heuristics nor gradient updates, because the agent iteratively refines code-defined controllers. Evaluated on the AIME and HMMT benchmarks, the discovered Confidence Momentum Controller reaches accuracy similar to SC@64 while using about 69.5% fewer tokens.

Key facts

▪ AutoTTS uses a coding agent to automatically discover inference controllers in a replay environment, with no LLM calls during evaluation.
▪ The discovered Confidence Momentum Controller reduces token usage by ~69.5% compared to SC@64 while maintaining similar accuracy across multiple model scales.
▪ The entire discovery process costs an estimated $39.9 and takes 160 minutes; evaluation makes zero LLM calls because it runs on cached replays.
▪ Controllers are discovered on AIME24 data and generalize well to the held-out AIME25 and HMMT25 benchmarks.
▪ The system records full execution traces and scaling curves, which gives the agent fine-grained feedback for iteratively refining a policy.
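
The replay idea in the key facts above can be illustrated with a small sketch: samples are generated once and cached, and any controller is then scored against the cache without further LLM calls. Everything below is an assumption made for illustration. The `Sample` fields, the toy cache, and the `momentum_controller` rule (an exponential moving average of confidence) are hypothetical and are not the paper's Confidence Momentum Controller or its data format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    answer: str        # final answer extracted from one cached reasoning trace
    tokens: int        # tokens the trace consumed when it was generated
    confidence: float  # hypothetical per-trace confidence score in [0, 1]


def majority(answers):
    """Most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def sc_at_n(samples):
    """Self-consistency baseline: consume every cached sample, then vote."""
    return majority([s.answer for s in samples]), sum(s.tokens for s in samples)


def momentum_controller(samples, beta=0.5, threshold=0.6, min_samples=2):
    """Illustrative early-stopping rule (an assumption, not the paper's method).

    Tracks an exponential moving average of the confidence of samples that
    agree with the current majority answer, and stops once it reaches
    `threshold` after at least `min_samples` samples.
    """
    ema, used, answers = 0.0, 0, []
    for s in samples:
        answers.append(s.answer)
        used += s.tokens
        signal = s.confidence if s.answer == majority(answers) else 0.0
        ema = beta * ema + (1 - beta) * signal
        if len(answers) >= min_samples and ema >= threshold:
            break
    return majority(answers), used


def replay(controller, cache, gold):
    """Score a controller on cached samples only: returns (accuracy, tokens)."""
    correct = tokens = 0
    for qid, samples in cache.items():
        answer, used = controller(samples)
        correct += answer == gold[qid]
        tokens += used
    return correct / len(cache), tokens


# Toy cache standing in for pre-generated samples (hypothetical data).
CACHE = {
    "q1": [Sample("42", 100, 0.9), Sample("42", 100, 0.9),
           Sample("42", 100, 0.9), Sample("7", 100, 0.3)],
    "q2": [Sample("3", 100, 0.4), Sample("5", 100, 0.9),
           Sample("5", 100, 0.95), Sample("5", 100, 0.9)],
}
GOLD = {"q1": "42", "q2": "5"}

base_acc, base_tokens = replay(sc_at_n, CACHE, GOLD)
ctrl_acc, ctrl_tokens = replay(momentum_controller, CACHE, GOLD)
print(f"baseline: acc={base_acc:.2f} tokens={base_tokens}")
print(f"controller: acc={ctrl_acc:.2f} tokens={ctrl_tokens} "
      f"saving={1 - ctrl_tokens / base_tokens:.1%}")
```

On this toy cache both strategies answer both questions correctly, and the early-stopping rule replays 600 tokens against 800 for the baseline. The numbers say nothing about the reported ~69.5% saving; they only show how accuracy and token cost can be compared offline.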

Original article

GitHub

Opening excerpt (first ~120 words)

AutoTTS

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang, Rui Liu, Runpeng Dai, Ruibo Chen, Chenxi Liu, Tianyi Xiong, Xidong Wu, Hongming Zhang, Heng Huang

UMD · UVA · WUSTL · UNC · Google · Meta

AutoTTS reframes TTS strategy design from hand-crafting heuristics to environment-driven automatic search: humans only construct an offline replay environment (states, actions, feedback, objectives), and a coding agent iteratively proposes and refines code-defined controllers within it, using code edits rather than gradient updates. Cheap: 0 LLM calls, fully replay.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.
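
The propose-and-refine loop described in the excerpt can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the toy `CACHE`, the "vote over the first k samples" controller family, the objective `accuracy - lam * tokens / budget`, and the `propose` stub are all hypothetical. In particular, `propose` merely enumerates values of `k`, whereas the system described in the source uses a coding agent that edits controller code.

```python
from collections import Counter

# Hypothetical cached (answer, tokens) pairs per question (toy data).
CACHE = {
    "q1": [("42", 100), ("42", 100), ("7", 100), ("42", 100)],
    "q2": [("3", 100), ("5", 100), ("5", 100), ("5", 100)],
}
GOLD = {"q1": "42", "q2": "5"}


def make_controller(k):
    """Code-defined controller: majority vote over the first k cached samples."""
    def controller(samples):
        seen = samples[:k]
        # Ties resolve to the answer seen first.
        answer = Counter(a for a, _ in seen).most_common(1)[0][0]
        return answer, sum(t for _, t in seen)
    return controller


def evaluate(controller, lam=0.5):
    """Replay-only feedback: accuracy, tokens used, and a scalar objective."""
    correct = tokens = 0
    for qid, samples in CACHE.items():
        answer, used = controller(samples)
        correct += answer == GOLD[qid]
        tokens += used
    budget = sum(t for samples in CACHE.values() for _, t in samples)
    accuracy = correct / len(CACHE)
    # Assumed objective: reward accuracy, penalize the fraction of tokens used.
    return {"accuracy": accuracy, "tokens": tokens,
            "objective": accuracy - lam * tokens / budget}


def propose(history):
    """Stand-in for the coding agent: simply try the next larger k.

    A real agent would read the feedback in `history` and edit controller code.
    """
    return len(history) + 1


def discover(rounds=4):
    """Propose, evaluate in replay, record feedback, and return the best try."""
    history = []
    for _ in range(rounds):
        k = propose(history)
        history.append((k, evaluate(make_controller(k))))
    best = max(history, key=lambda item: item[1]["objective"])
    return best, history


best, history = discover()
print("best k:", best[0], "feedback:", best[1])
```

Because every evaluation reads from the cache, each round costs no model calls, which is what makes many rounds of refinement affordable. On this toy data the loop settles on `k = 3`, the smallest vote that still answers both questions correctly.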
