GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents

May 22, 2026 · 4:00 AM UTC · 3 min read

#machine learning #artificial intelligence #reinforcement learning

TL;DR · WeSearch summary

The paper introduces GROW, a reinforcement learning framework for open-world vision-language model (VLM) agents. To address the limitations of existing methods that rely on supervised fine-tuning, GROW applies Group Relative Policy Optimization (GRPO) at the level of individual state-action samples rather than whole trajectories. Experiments show that GROW achieves state-of-the-art performance on over 800 Minecraft tasks.

Key facts

▪ GROW decomposes collected trajectories into individual state-action samples.
▪ The framework computes advantages across these samples rather than treating each full trajectory as a single unit.
▪ Experiments on over 800 Minecraft tasks show that GROW achieves state-of-the-art performance.
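
The difference between trajectory-level and sample-level advantages can be illustrated with a minimal Python sketch. This is not the paper's implementation: the summary does not say how GROW assigns rewards or forms groups, so the per-step `reward` field, the `Step` class, and both function names are hypothetical. The normalization used is the standard GRPO group-relative form, (reward - group mean) / group standard deviation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple


@dataclass
class Step:
    """One state-action sample. The per-step reward is an assumption."""
    state: str
    action: str
    reward: float


Trajectory = List[Step]


def group_normalize(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standard GRPO-style normalization: subtract group mean, divide by group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def trajectory_level_advantages(group: List[Trajectory]) -> List[List[float]]:
    """Baseline view: one advantage per trajectory, copied to every step in it."""
    returns = [sum(step.reward for step in traj) for traj in group]
    advantages = group_normalize(returns)
    return [[adv] * len(traj) for adv, traj in zip(advantages, group)]


def sample_level_advantages(group: List[Trajectory]) -> List[Tuple[Step, float]]:
    """Sample view: flatten trajectories into state-action samples, then normalize."""
    samples = [step for traj in group for step in traj]
    advantages = group_normalize([step.reward for step in samples])
    return list(zip(samples, advantages))


# Two toy trajectories with the same total return but different good steps.
group = [
    [Step("s0", "look", 0.0), Step("s1", "mine", 1.0)],
    [Step("s0", "mine", 1.0), Step("s1", "look", 0.0)],
]
traj_adv = trajectory_level_advantages(group)
sample_adv = sample_level_advantages(group)
```

In this toy group both trajectories have the same total return, so the trajectory-level advantages are all zero and carry no learning signal, while the sample-level advantages still separate the rewarded steps from the unrewarded ones. Whether GROW's gains come from this effect is not stated in the summary.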

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

arXiv:2605.20246 (cs)
Submitted on 18 May 2026 (v1), last revised 21 May 2026 (this version, v2)

Title: GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents

Authors: Xiongbin Wu, Zhihao Luo, Shanzhe Lei, Lechao Zhang, Xuhong Wang, Jie Yang, Zhonglong Zheng, Yuanjie Zheng, Xin Tan, Wei Liu

Abstract: Recently, vision-language model (VLM) agents have shown promising progress in open-world tasks, where successful task completion often requires multiple turns of visual perception and action execution.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
