Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals
The paper introduces Metacognition-as-Reward (MaR), a reinforcement learning framework aimed at enhancing the reasoning capabilities of large language models (LLMs). MaR guides LLM reasoning with reward signals derived from metacognitive knowledge and regulation, giving a more comprehensive reward system. In the reported experiments, MaR achieves gains of up to 7.7% over base models and outperforms stronger models on individual benchmarks.
- Recent reinforcement learning methods have improved the reasoning abilities of large language models.
- The Metacognition-as-Reward framework guides LLM reasoning through metacognitive knowledge and regulation.
- Experiments show that MaR achieves up to a 7.7% gain over base models and outperforms stronger models on individual benchmarks.
Opening excerpt (first ~120 words)
Computer Science > Computation and Language
arXiv:2605.23384 (cs) [Submitted on 22 May 2026]
Title: Metacognition as Reward: Reinforcing LLM Reasoning via Knowledge and Regulation Signals
Authors: Sirui Chen, Lei Xu, Yuying Zhao, Yutian Chen, Yu Wang, Beier Zhu, Hanwang Zhang, Shengjie Zhao, Chaochao Lu
Abstract: Recent RL methods have substantially improved the reasoning abilities of LLMs.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.