WeSearch

When AI reviews science: Can we trust the referee?

#ai-peer-review #scientific-integrity #llm-vulnerabilities #research-ethics #artificial-intelligence
⚡ TL;DR · AI summary

As scientific submissions grow, AI is increasingly used to assist peer review, but concerns about reliability and security are mounting. The paper examines how AI referees can be manipulated through tactics like hidden prompts and biased phrasing, and presents experimental evidence of vulnerabilities in review outcomes. It calls for rigorous evaluation and safeguards to ensure trustworthy AI-assisted peer review. A taxonomy of potential attacks across the review lifecycle is proposed, along with targeted mitigation strategies.

Original article: arXiv.org

Full article excerpt

Computer Science > Artificial Intelligence
arXiv:2604.23593 (cs) [Submitted on 26 Apr 2026]

Title: When AI reviews science: Can we trust the referee?
Authors: Jialiang Wang, Yuchen Liu, Hang Xu, Kaichun Hu, Shimin Di, Wangze Ni, Linan Yue, Min-Ling Zhang, Kui Ren, Lei Chen

Abstract: The volume of scientific submissions continues to climb, outpacing the capacity of qualified human referees and stretching editorial timelines. At the same time, modern large language models (LLMs) offer impressive capabilities in summarization, fact checking, and literature triage, making the integration of AI into peer review increasingly attractive -- and, in practice, unavoidable. Yet early deployments and informal adoption have exposed acute failure modes. Recent incidents have revealed that hidden prompt injections embedded in manuscripts can steer LLM-generated reviews toward unjustifiably positive judgments. Complementary studies have also demonstrated brittleness to adversarial phrasing, authority and length biases, and hallucinated claims. These episodes raise a central question for scholarly communication: when AI reviews science, can we trust the AI referee? This paper provides a security- and reliability-centered analysis of AI peer review. We map attacks across the review lifecycle -- training and data retrieval, desk review, deep review, rebuttal, and system-level. We instantiate this taxonomy with four treatment-control probes on a stratified set of ICLR 2025 submissions, using two advanced LLM-based referees to isolate the causal effects of prestige framing, assertion strength, rebuttal sycophancy, and contextual poisoning on review scores. Together, this taxonomy and experimental audit provide an evidence-based baseline for assessing and tracking the reliability of AI peer review and highlight concrete failure points to guide targeted, testable mitigations.

Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.23593 [cs.AI] (or arXiv:2604.23593v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.23593 (arXiv-issued DOI via DataCite, pending registration)
Journal reference: The Innovation Informatics 2:100030 (2026)
Related DOI: https://doi.org/10.59717/j.xinn-inform.2026.100030
Submission history: From Jialiang Wang. [v1] Sun, 26 Apr 2026 08:03:32 UTC (3,000 KB)
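The treatment-control probes described in the abstract can be sketched in a few lines: score each submission twice, once as-is (control) and once with a manipulation inserted (treatment), and measure the shift in review scores. The sketch below is a minimal, hypothetical illustration, not the paper's actual pipeline; `referee_score` is a stub standing in for a real LLM-based referee, and the prestige-framing snippet is invented for demonstration.

```python
import statistics

# Hypothetical prestige-framing manipulation (illustrative, not from the paper).
PRESTIGE_SNIPPET = "This work extends our award-winning prior results."

def referee_score(text: str) -> float:
    """Stub referee. A real probe would call an LLM-based reviewer here;
    this stub rewards prestige language so the pipeline runs end to end."""
    bonus = 1.0 if "award-winning" in text else 0.0
    return min(10.0, 5.0 + bonus)

def run_probe(abstracts: list[str]) -> float:
    """Mean treatment-minus-control score shift across a set of abstracts.
    A shift near zero suggests robustness; a large positive shift suggests
    the referee is swayed by the injected framing rather than content."""
    diffs = []
    for abstract in abstracts:
        control = referee_score(abstract)                          # unmodified
        treated = referee_score(abstract + " " + PRESTIGE_SNIPPET)  # manipulated
        diffs.append(treated - control)
    return statistics.mean(diffs)

if __name__ == "__main__":
    sample = ["We propose a method for X.", "A new benchmark for Y."]
    print(f"mean score shift: {run_probe(sample):+.3f}")
```

Because the manipulation is the only difference between the two arms, any systematic score shift can be attributed to the injected framing, which is the causal-isolation logic the paper's audit relies on.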

This excerpt is published under fair use for community discussion. Read the full article at arXiv.org.
