ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning

May 20, 2026 · 4:00 AM UTC · 3 min read

#machine learning #artificial intelligence #scientific reasoning

⚡ TL;DR · AI summary

The paper introduces ReCrit, a transition-aware reinforcement learning framework for improving how large language models respond to criticism of their scientific reasoning. It frames critic interaction as a correctness-transition problem: what matters is whether an answer is correct before and after user criticism, not only whether the final answer is right. The method raises critic accuracy on scientific reasoning benchmarks for both Qwen3.5-4B and Qwen3.5-9B.

Key facts

▪ ReCrit improves critic accuracy from 38.15 to 51.49 (+13.34) on Qwen3.5-4B and from 45.40 to 55.59 (+10.19) on Qwen3.5-9B.
▪ The framework decomposes critic interaction into four quadrants: Correction, Sycophancy, Robustness, and Boundary.
▪ Final-answer rewards alone provide little interaction-level gain; the gains come from transition-aware rewards and quadrant weighting.
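
The four-quadrant decomposition can be illustrated with a minimal sketch. The mapping below is an assumption inferred from the quadrant names and the abstract, not taken from the paper: Correction is wrong to correct, Sycophancy is correct to wrong, Robustness is correct to correct, and Boundary is wrong to wrong. The function names `classify_transition` and `quadrant_counts` are hypothetical, and the paper's actual reward values and quadrant weights are not given here, so none are modeled.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_transition(correct_before: bool, correct_after: bool) -> str:
    """Label one critic interaction by its correctness transition.

    ASSUMPTION: this quadrant mapping is inferred from the quadrant
    names; the paper may define the quadrants differently.
    """
    if correct_before and correct_after:
        return "Robustness"   # stayed correct despite criticism
    if correct_before and not correct_after:
        return "Sycophancy"   # abandoned a correct answer
    if not correct_before and correct_after:
        return "Correction"   # fixed a wrong answer
    return "Boundary"         # wrong before and after


def quadrant_counts(transitions: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count how many interactions fall into each quadrant."""
    return Counter(classify_transition(b, a) for b, a in transitions)


if __name__ == "__main__":
    # Illustrative data only: (correct_before, correct_after) pairs.
    sample = [(True, True), (True, False), (False, True), (False, False), (True, True)]
    for quadrant, count in sorted(quadrant_counts(sample).items()):
        print(f"{quadrant}: {count}")
```

Counting per quadrant rather than only scoring the final answer separates a model that corrects its mistakes from one that merely yields to criticism, two behaviors that a final-answer metric can conflate.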

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

arXiv:2605.18799 (cs) · Computer Science > Machine Learning · Submitted on 11 May 2026

Title: ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning

Authors: Wanghan Xu, Yuhao Zhou, Hengyuan Zhao, Shuo Li, Dianzhi Yu, Zhenfei Yin, Yaowen Hu, Fengli Xu, Wanli Ouyang, Wenlong Zhang, Lei Bai

Abstract: Large language models can fail in critic interaction not only by answering incorrectly, but also by abandoning an initially correct scientific solution after user criticism. This is especially risky in scientific reasoning, where user criticism can turn a valid answer into an incorrect one.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
