DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding

May 18, 2026 · 4:00 AM UTC · 2 min read

#artificial intelligence #machine learning #user interface

⚡ TL;DR · AI summary

The paper presents DRS-GUI, a training-free framework that improves GUI grounding for Multimodal Large Language Models (MLLMs). It introduces a lightweight UI Perceptor that performs human-like perceptual actions to locate instruction-relevant regions in complex user interfaces. The reported experiments show a 14% gain in grounding performance on benchmark tests.

Key facts

▪ DRS-GUI is designed to improve the grounding of instruction-relevant elements in high-resolution screenshots.
▪ The framework employs a UI Perceptor that performs actions such as Focus, Shift, and Scatter to explore interfaces.
▪ A Monte Carlo Tree Search-based Action Planner is used to dynamically schedule perceptual actions and evaluate region quality.
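To make the key facts above concrete, here is a minimal Python sketch of a tree search over screenshot regions. It is not the authors' implementation. The summary does not define Focus, Shift, or Scatter, so the geometric versions below (zoom into the centre, slide by half a window, split into quadrants) are assumptions, as are all function names and parameters. The `score` callable is a hypothetical stand-in for whatever region-quality evaluation the Action Planner uses, and each node is scored directly rather than by a random rollout.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Region = Tuple[float, float, float, float]  # (x, y, width, height) in pixels


def focus(r: Region) -> Region:
    """Assumed Focus: zoom into the central half of the region."""
    x, y, w, h = r
    return (x + w / 4, y + h / 4, w / 2, h / 2)


def shift(r: Region, dx: float, dy: float) -> Region:
    """Assumed Shift: slide the window by a fraction of its own size."""
    x, y, w, h = r
    return (x + dx * w, y + dy * h, w, h)


def scatter(r: Region) -> List[Region]:
    """Assumed Scatter: split the region into its four quadrants."""
    x, y, w, h = r
    hw, hh = w / 2, h / 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]


def clamp(r: Region, screen: Region) -> Region:
    """Keep a region inside the screenshot bounds."""
    x, y, w, h = r
    sx, sy, sw, sh = screen
    return (min(max(x, sx), sx + sw - w), min(max(y, sy), sy + sh - h), w, h)


def candidates(r: Region, screen: Region) -> List[Region]:
    """All regions reachable from r by one perceptual action."""
    moves = [focus(r)] + scatter(r)
    moves += [shift(r, dx, dy) for dx, dy in ((0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5))]
    clamped = [clamp(m, screen) for m in moves]
    return [c for c in dict.fromkeys(clamped) if c != r]  # dedupe, drop no-ops


@dataclass
class Node:
    region: Region
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total: float = 0.0


def ucb1(n: Node, c: float) -> float:
    if n.visits == 0:
        return math.inf  # try unvisited children first
    return n.total / n.visits + c * math.sqrt(math.log(n.parent.visits) / n.visits)


def search(screen, score, iterations=200, min_size=32.0, c=1.4):
    """Return the best-scoring region found by the tree search."""
    root = Node(screen)
    best, best_score = screen, score(screen)
    for _ in range(iterations):
        node = root
        while node.children:  # selection: follow the highest UCB1 child
            node = max(node.children, key=lambda n: ucb1(n, c))
        big_enough = node.region[2] > min_size and node.region[3] > min_size
        if node.visits > 0 and big_enough:  # expansion
            node.children = [Node(r, node) for r in candidates(node.region, screen)]
            if node.children:
                node = node.children[0]
        value = score(node.region)  # evaluation (direct score, no rollout)
        if value > best_score:
            best, best_score = node.region, value
        while node is not None:  # backpropagation
            node.visits += 1
            node.total += value
            node = node.parent
    return best
```

Calling `search(screen, score)` with a screenshot rectangle and a scoring function returns the highest-scoring region visited. In a real pipeline the scorer would presumably query the MLLM on the cropped region, which is why the sketch keeps it as a pluggable callable.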

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

arXiv:2605.15542 (cs) · Computer Science > Artificial Intelligence · Submitted on 15 May 2026

Title: DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding

Authors: Yichao Liu, Huawen Shen, Liu Yu, Shiyu Liu, Zeyu Chen, Yu Zhou

Abstract: GUI agents powered by Multimodal Large Language Models (MLLMs) have demonstrated impressive capability in understanding and executing user instructions. However, accurately grounding instruction-relevant elements from high-resolution screenshots cluttered with irrelevant UI components remains challenging for existing approaches.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
