WeSearch

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning

3 min read
#reinforcement-learning #language-models #reasoning #token-entropy #machine-learning
⚡ TL;DR · AI summary

This study investigates Reinforcement Learning with Verifiable Rewards (RLVR) in large language models by analyzing token entropy patterns during reasoning. It finds that high-entropy minority tokens, which act as forks (decision points) in reasoning paths, are primarily responsible for RLVR's performance gains. By restricting policy updates to these forking tokens, the method matches or exceeds full-gradient updates while using only a fraction of the tokens.
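The selection mechanism described above can be sketched in a few lines: score each generated position by the Shannon entropy of its next-token distribution, then keep only the top-entropy fraction for the policy update. This is an illustrative sketch, not the paper's implementation; the function names and the exact keep-fraction (the paper reports results around the top ~20% of tokens) are assumptions.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def forking_token_mask(prob_dists, top_frac=0.2):
    """Return a 0/1 mask selecting the top_frac highest-entropy positions.

    prob_dists: list of per-position next-token probability distributions.
    top_frac:   fraction of positions to keep for the policy update
                (illustrative default; the paper's exact ratio may differ).
    """
    entropies = [token_entropy(p) for p in prob_dists]
    k = max(1, int(len(entropies) * top_frac))
    threshold = sorted(entropies, reverse=True)[k - 1]
    return [1 if h >= threshold else 0 for h in entropies]

# Three positions: the near-uniform distribution is the high-entropy
# "forking" point; the peaked ones are confident, low-entropy tokens.
dists = [
    [0.98, 0.01, 0.01],  # confident  -> low entropy
    [0.40, 0.35, 0.25],  # uncertain  -> high entropy (forking token)
    [0.90, 0.05, 0.05],  # confident  -> low entropy
]
mask = forking_token_mask(dists, top_frac=0.34)  # -> [0, 1, 0]
```

In a real RLVR loop this mask would multiply the per-token policy-gradient loss, so gradients flow only through the forking tokens while the low-entropy majority is left untouched.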

Original article: arXiv.org
Opening excerpt (first ~120 words)

Computer Science > Computation and Language
arXiv:2506.01939 (cs) · Submitted on 2 Jun 2025 (v1), last revised 13 Nov 2025 (this version, v2)
Title: Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Authors: Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful approach to enhancing the…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv.org.


