ToolGate: Token-Efficient Pre-Call Control for Tool-Augmented Vision-Language Agents
The paper introduces ToolGate, a system for making tool-augmented vision-language agents more efficient. ToolGate addresses the pre-call control problem: before a proposed tool call runs, it predicts whether to execute or skip that call, which reduces token cost. The results indicate that explicit control over tool outputs improves the performance of these agents.
- ToolGate reduces token cost to 64-69% of the unrestricted ReAct baseline while maintaining accuracy.
- The baseline agent shows poor local selectivity, with helpful and harmful calls occurring at similar rates.
- With matched-domain trajectory training, ToolGate improves average accuracy by 1.65 points.
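To make the execute-or-skip idea concrete, here is a minimal sketch of a pre-call gate. It is not the paper's implementation: the names `ToolCall`, `gated_call`, `predict_useful`, `toy_scorer`, and `toy_tool`, and the 0.5 threshold, are all hypothetical and chosen only for illustration. The helper `relative_token_cost` shows how a figure such as "64-69% of the baseline" would be computed from token counts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str      # hypothetical tool name, e.g. "ocr" or "detect"
    argument: str  # whatever the agent proposes to pass to the tool


def gated_call(
    call: ToolCall,
    predict_useful: Callable[[ToolCall], float],
    run_tool: Callable[[ToolCall], str],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Optional[str]:
    """Execute the proposed call only if the gate's score clears the threshold."""
    if predict_useful(call) >= threshold:
        return run_tool(call)
    return None  # skipped: no tool output is added to the agent's context


def relative_token_cost(gated_tokens: int, baseline_tokens: int) -> float:
    """Token cost of the gated run as a fraction of the ungated baseline."""
    return gated_tokens / baseline_tokens


# Toy stand-ins for a learned gate and a real tool (both hypothetical).
def toy_scorer(call: ToolCall) -> float:
    return 0.9 if call.name == "ocr" else 0.1


def toy_tool(call: ToolCall) -> str:
    return f"{call.name} output for {call.argument}"


executed = gated_call(ToolCall("ocr", "img"), toy_scorer, toy_tool)
skipped = gated_call(ToolCall("detect", "img"), toy_scorer, toy_tool)
```

Under these assumptions, a gated run that uses 640 to 690 tokens against a 1,000-token baseline would correspond to the 64-69% range reported above.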
Opening excerpt (first ~120 words)
arXiv:2606.03054 (cs) [Submitted on 2 Jun 2026]
Title: ToolGate: Token-Efficient Pre-Call Control for Tool-Augmented Vision-Language Agents
Authors: Anjie Liu, Yan Song, Zhixun Chen, Ziqin Gong, Zhongwei Yu, Jun Wang
Abstract: Tool-augmented vision-language agents can acquire external perceptual evidence through OCR, detection, segmentation, and other tools, but executing every proposed tool call is costly and sometimes unnecessary.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.