AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback

May 22, 2026 · 4:00 AM UTC · 2 min read

#machine learning #artificial intelligence #reinforcement learning

⚡ TL;DR · AI summary

The paper presents Adaptive Group Policy Optimization (AGPO), a critic-free refinement of GRPO for reinforcement learning on large language models. Where PPO and GRPO typically use a fixed clipping range and a fixed decoding temperature, AGPO uses group-level statistics to control both update magnitude and exploration. The summary reports that models trained with AGPO outperform these baselines on math and STEM benchmarks.
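To make "group-level statistics" and "update magnitude" concrete, here is a minimal Python sketch. The group-standardized advantage and the clipped surrogate are the standard GRPO/PPO ingredients. The function `adaptive_clip_range`, its direction (wider clip for a more spread-out group), and its constants are hypothetical placeholders chosen for illustration; the excerpt does not state AGPO's actual clipping rule.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def adaptive_clip_range(rewards, lo=0.1, hi=0.3):
    """HYPOTHETICAL rule, not taken from the paper.

    Interpolates the clip range between lo and hi using the group's reward
    spread. For rewards in [0, 1] the population std is at most 0.5, so
    2 * sigma lies in [0, 1].
    """
    spread = min(1.0, 2.0 * pstdev(rewards))
    return lo + (hi - lo) * spread


def clipped_surrogate(ratio, advantage, clip):
    """Per-sample PPO/GRPO clipped objective (to be maximized)."""
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0, 0.0]   # one group of sampled completions
    advantages = group_advantages(rewards)
    clip = adaptive_clip_range(rewards)
    # Policy probability ratios would come from the model; these are made up.
    ratios = [1.5, 0.5, 1.0, 1.1]
    for ratio, adv in zip(ratios, advantages):
        print(round(clipped_surrogate(ratio, adv, clip), 3))
```

With a fixed `clip`, the last function is the usual fixed-clipping objective; the sketch only shows where a per-group value could be plugged in.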

Key facts

▪ AGPO is a critic-free refinement of GRPO aimed at making training less brittle and less tuning-heavy.
▪ It replaces fixed clipping and a fixed decoding temperature with adaptive clipping and bidirectional adaptive temperature sampling.
▪ Models trained with AGPO are reported to improve on math and STEM benchmarks.
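As a rough illustration of what temperature feedback in both directions could look like, the Python sketch below nudges the sampling temperature up when a group's rewards are too uniform and down when they are too spread out. The rule, the name `update_temperature`, and every constant (`target_std`, `step`, `t_min`, `t_max`) are assumptions made for this example; the excerpt does not describe AGPO's actual update.

```python
import math
from statistics import pstdev


def softmax_with_temperature(logits, temperature):
    """Token probabilities after dividing logits by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract max for stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def update_temperature(temperature, rewards, target_std=0.35,
                       step=0.05, t_min=0.5, t_max=1.5):
    """HYPOTHETICAL feedback rule, not taken from the paper.

    Low reward spread in a group suggests the samples are too similar, so
    the temperature is raised to explore more; high spread lowers it.
    The result is clamped to [t_min, t_max].
    """
    spread = pstdev(rewards)
    if spread < target_std:
        temperature += step
    elif spread > target_std:
        temperature -= step
    return max(t_min, min(t_max, temperature))


if __name__ == "__main__":
    temperature = 1.0
    # Every completion in the group got the same reward: explore more.
    temperature = update_temperature(temperature, [1.0, 1.0, 1.0, 1.0])
    print(round(temperature, 2))
    print([round(p, 3) for p in softmax_with_temperature([2.0, 1.0, 0.0], temperature)])
```

A higher temperature flattens the sampling distribution and a lower one sharpens it, which is the lever any such controller would act on.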

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

arXiv:2605.20722 (cs) · Computer Science > Machine Learning · Submitted on 20 May 2026

Title: AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback

Authors: Miaobo Hu, Shuhao Hu, Bokun Wang, Ruohan Wang, Xin Wang, Xiaobo Guo, Daren Zha, Jun Xiao

Abstract: Reinforcement learning improves LLM reasoning, but PPO/GRPO typically use fixed clipping and decoding temperature, which makes training brittle and tuning-heavy. We propose Adaptive Group Policy Optimization (AGPO), a critic-free refinement of GRPO that uses group-level statistics to control both update magnitude and exploration.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
