AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees
The paper introduces AQuaUI, a method that reduces visual tokens for GUI agents by building adaptive quadtrees over screenshots. It targets the non-uniform spatial information density of GUI screenshots and requires no additional training. The summary reports gains in both accuracy and efficiency: faster inference and fewer visual tokens while maintaining high performance.
- AQuaUI is a training-free inference-time token reduction method for GUI agent models.
- It constructs an adaptive quadtree for each screenshot input, preserving spatial positions of retained tokens.
- The method achieves up to 13.22% speedup and 29.52% fewer visual tokens on specific benchmarks.
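To make the quadtree idea concrete, here is a minimal sketch of adaptive quadtree subdivision over a grid of patch values. It is not the paper's algorithm: the summary does not say how AQuaUI decides when to split a region, so the variance threshold, the function name `build_quadtree`, and the toy 4x4 grid are all assumptions chosen for illustration. Each leaf stands for one retained token and keeps its `(row, col, size)` position, which mirrors the claim that spatial positions are preserved.

```python
from statistics import pvariance


def build_quadtree(grid, threshold, min_size=1):
    """Split a square 2^k x 2^k grid into leaves (row, col, size).

    Hypothetical criterion: a block is kept whole when the variance of
    its values is at most `threshold`; otherwise it is split in four.
    """
    leaves = []

    def visit(row, col, size):
        block = [grid[i][j]
                 for i in range(row, row + size)
                 for j in range(col, col + size)]
        # Stop at the smallest block size or when the block is uniform enough.
        if size <= min_size or pvariance(block) <= threshold:
            leaves.append((row, col, size))
            return
        half = size // 2
        for d_row in (0, half):
            for d_col in (0, half):
                visit(row + d_row, col + d_col, half)

    visit(0, 0, len(grid))
    return leaves


# Toy "screenshot": mostly blank, with detail in the bottom-right corner.
grid = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 9],
    [0, 0, 1, 7],
]
leaves = build_quadtree(grid, threshold=0.0)
print(len(leaves), "leaves instead of", len(grid) ** 2, "patches")
```

In this toy case the three uniform quadrants collapse to one leaf each, while the detailed quadrant stays at full resolution, so 16 patches become 7 leaves. A real pipeline would presumably operate on image patches or embeddings rather than scalar values; that step is left out here because the source does not describe it.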
Opening excerpt (first ~120 words)
arXiv:2605.19260 (cs), Computer Science > Artificial Intelligence. Submitted on 19 May 2026.
Title: AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees
Authors: Yuankai Li, Tinghui Zhu, Ha Min Son, Zhe Zhao, Xin Liu, Muhao Chen
Abstract: Large Multimodal Models (LMMs) have recently emerged as promising backbones for GUI-agent models, where high-resolution GUI screenshots are introduced to the prompts at each iteration step.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.