ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning
The paper introduces ReCrit, a transition-aware reinforcement learning framework for improving scientific reasoning in large language models. It treats critic interaction, in which a user criticizes the model's initial solution, as a correctness-transition problem: what matters is whether the answer moves between correct and incorrect after criticism, not only whether the final answer is right. The method improves critic accuracy across several scientific reasoning benchmarks.
- ReCrit improves critic accuracy from 38.15 to 51.49 on Qwen3.5-4B and from 45.40 to 55.59 on Qwen3.5-9B.
- The framework decomposes critic interaction into four quadrants: Correction, Sycophancy, Robustness, and Boundary.
- Final-answer rewards provide little interaction-level gain compared to transition-aware rewards and quadrant weighting.
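The quadrant decomposition can be illustrated with a minimal Python sketch. The mapping from quadrant names to transitions is an assumption inferred from the names and the abstract excerpt (Correction: wrong to correct; Sycophancy: correct to wrong; Robustness: correct stays correct; Boundary: wrong stays wrong), and the weights are hypothetical placeholders, not values from the paper. The function names `classify_transition`, `transition_reward`, and `final_answer_reward` are likewise illustrative.

```python
from enum import Enum


class Quadrant(Enum):
    CORRECTION = "Correction"   # assumed: wrong before criticism, correct after
    SYCOPHANCY = "Sycophancy"   # assumed: correct before criticism, wrong after
    ROBUSTNESS = "Robustness"   # assumed: correct before and after
    BOUNDARY = "Boundary"       # assumed: wrong before and after


def classify_transition(correct_before: bool, correct_after: bool) -> Quadrant:
    """Map a (before, after) correctness pair to one of the four quadrants."""
    if correct_before:
        return Quadrant.ROBUSTNESS if correct_after else Quadrant.SYCOPHANCY
    return Quadrant.CORRECTION if correct_after else Quadrant.BOUNDARY


# Hypothetical quadrant weights, chosen only to make the example concrete.
ILLUSTRATIVE_WEIGHTS = {
    Quadrant.CORRECTION: 1.0,
    Quadrant.ROBUSTNESS: 0.5,
    Quadrant.BOUNDARY: 0.0,
    Quadrant.SYCOPHANCY: -1.0,
}


def transition_reward(correct_before: bool, correct_after: bool,
                      weights=ILLUSTRATIVE_WEIGHTS) -> float:
    """Reward that depends on the transition, not only on the final answer."""
    return weights[classify_transition(correct_before, correct_after)]


def final_answer_reward(correct_after: bool) -> float:
    """Baseline reward that looks only at the final answer."""
    return 1.0 if correct_after else 0.0


if __name__ == "__main__":
    for before in (True, False):
        for after in (True, False):
            quadrant = classify_transition(before, after)
            print(f"{before!s:5} -> {after!s:5}  {quadrant.value:10}  "
                  f"final={final_answer_reward(after)}  "
                  f"transition={transition_reward(before, after)}")
```

Under these assumptions, a final-answer reward assigns the same value to Correction and Robustness (both end correct) and to Sycophancy and Boundary (both end wrong), whereas a transition-aware reward can separate all four cases.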
Opening excerpt (first ~120 words)
arXiv:2605.18799 (cs), Computer Science > Machine Learning. Submitted on 11 May 2026.
Title: ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning
Authors: Wanghan Xu, Yuhao Zhou, Hengyuan Zhao, Shuo Li, Dianzhi Yu, Zhenfei Yin, Yaowen Hu, Fengli Xu, Wanli Ouyang, Wenlong Zhang, Lei Bai
Abstract: Large language models can fail in critic interaction not only by answering incorrectly, but also by abandoning an initially correct scientific solution after user criticism. This is especially risky in scientific reasoning, where user criticism can turn a valid answer into an incorrect one.
…