WeSearch

ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

Tags: artificial intelligence, reinforcement learning, language models
⚡ TL;DR · AI summary

The paper presents ICRL, a framework for strengthening the self-improvement ability of language model-based agents. ICRL jointly trains a solver and a critic so that the model internalizes the critique's guidance and can improve its performance without relying on external feedback. The reported results show performance gains on a range of reasoning tasks.

Original article
arXiv cs.AI

arXiv:2605.15224 (cs), submitted on 13 May 2026
Title: ICRL: Learning to Internalize Self-Critique with Reinforcement Learning
Authors: Jianbo Lin, Xiaomin Yu, Yi Xin, Yifu Guo, Zhuosong Jiang, Zhongqi Yue, Weishi Wang, Heqing Zou, Chengwei Qin, Hui Xiong
Abstract: Large language model-based agents make mistakes, yet critique can often guide the same model toward correct behavior. However, when critique is removed, the model may fail again on the same query, indicating that it has not internalized the critique's guidance into its underlying capability.

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
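
The gap the abstract describes (a model succeeds with critique but fails again once it is removed) can be made concrete with a toy measurement. The sketch below is not the paper's method or evaluation. `toy_solver`, `accuracy`, the dataset, and the critique strings are all hypothetical stand-ins, chosen only to show how accuracy with and without critique could be compared on the same queries.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a language-model solver (an assumption, not ICRL).
# It "knows" one answer on its own and otherwise copies the answer out of a
# critique hint when one is supplied.
def toy_solver(query: str, critique: Optional[str] = None) -> str:
    known = {"2+2": "4"}
    if query in known:
        return known[query]
    if critique is not None:
        # Take the text after the last "=" in the hint as the answer.
        return critique.split("=")[-1].strip()
    return "unknown"

def accuracy(
    dataset: List[Tuple[str, str]],
    critiques: Optional[Dict[str, str]] = None,
) -> float:
    """Fraction of queries answered correctly, optionally with critique hints."""
    correct = 0
    for query, answer in dataset:
        hint = critiques.get(query) if critiques else None
        if toy_solver(query, hint) == answer:
            correct += 1
    return correct / len(dataset)

# Illustrative data only: (query, reference answer) pairs and per-query hints.
dataset = [("2+2", "4"), ("17*3", "51"), ("9-4", "5")]
critiques = {"17*3": "recompute: 17*3 = 51", "9-4": "recompute: 9-4 = 5"}

with_critique = accuracy(dataset, critiques)
without_critique = accuracy(dataset)

# A positive gap means success depends on the critique being present,
# i.e. the guidance has not been internalized by the solver.
gap = with_critique - without_critique
print(f"with critique: {with_critique:.2f}, without: {without_critique:.2f}, gap: {gap:.2f}")
```

In this toy setup the solver only answers the harder queries when a hint is present, so the gap is positive. A model that had internalized the critique's guidance would, under this framing, show a gap close to zero.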
