On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists
A recent study evaluates the effectiveness of AI reviewers in scientific peer review. Drawing on ratings from 45 expert scientists, the research highlights both the strengths and the limitations of AI in assessing research papers. The findings suggest that AI reviewers can complement human reviewers but are not yet ready to replace them.
- The study involved 45 domain scientists who rated 2,960 criticisms from human and AI-generated reviews of 82 Nature-family papers.
- AI reviewers, particularly one powered by GPT-5.2, scored higher than the top-rated human reviewer in terms of correctness, significance, and sufficiency of evidence.
- AI reviewers identified 26% of issues that human reviewers did not raise, but they also exhibited significant overlap and recurring weaknesses compared to human reviewers.
Opening excerpt (first ~120 words)
Computer Science > Computation and Language
arXiv:2605.20668 (cs)
[Submitted on 20 May 2026]
Title: On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists
Authors: Seungone Kim, Dongkeun Yoon, Kiril Gashteovski, Juyoung Suk, Jinheon Baek, Pranjal Aggarwal, Ian Wu, Viktor Zaverkin, Spase Petkoski, Daniel R. Schrider, Ilija Dukovski, Francesco Santini, Biljana Mitreska, Yong Jeong, Kyeongha Kwon, Young Min Sim, Dragana Manasova, Arthur Porto, Biljana Mojsoska, Makoto Takamoto, Marko Shuntov, Ruoqi Liu, Hyunjoo Jenny Lee, Niyazi Ulas Dinç, Yehhyun Jo, Sunkyu Han, Chungwoo Lee, Huishan Li, Esther H. R.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.