The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context
The paper examines how hard it is to verify whether a language model's output is driven by retrieved context or by its internal memory. It introduces Computational Reality Monitoring (CRM), a method aimed at this attribution blind spot. The authors report that internal representations carry information about evidence provenance that is not visible in the output alone.
- The attribution blind spot occurs when language models produce outputs that appear context-consistent but are actually generated from memory.
- Computational Reality Monitoring (CRM) is proposed as a solution to detect internal representation divergence.
- The study shows that this divergence is measurable and can inform the development of systems that better understand evidence provenance.
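
To make "measurable divergence" concrete, here is a minimal Python sketch. It is not the paper's CRM procedure, which this summary does not describe. It assumes, purely for illustration, that per-layer hidden-state vectors are available from two runs of the same prompt (one with the retrieved context, one without), and that a small cosine distance between them hints that the context changed little internally. The function names, the choice of cosine distance, and the `threshold` value are all hypothetical.

```python
import math


def cosine_divergence(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_a * norm_b)


def layerwise_divergence(with_context, without_context):
    """Per-layer divergence between hidden states from the two runs.

    Each argument is a list of vectors, one per layer (an assumed layout).
    """
    if len(with_context) != len(without_context):
        raise ValueError("both runs must cover the same number of layers")
    return [
        cosine_divergence(h_ctx, h_mem)
        for h_ctx, h_mem in zip(with_context, without_context)
    ]


def looks_memory_driven(with_context, without_context, threshold=0.1):
    """Flag a run whose hidden states barely move when context is added.

    The threshold is an arbitrary illustrative value, not a published one.
    """
    scores = layerwise_divergence(with_context, without_context)
    if not scores:
        raise ValueError("at least one layer is required")
    return sum(scores) / len(scores) < threshold
```

With real models, the vectors would come from whatever hidden-state access the serving stack provides, and the threshold would need calibrating on examples with known provenance.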
Opening excerpt (first ~120 words)
arXiv:2605.26778 (cs), Computer Science > Artificial Intelligence. Submitted on 26 May 2026.
Authors: Zhe Yu, Wenpeng Xing, Yunzhao Wei, Bo Yang, Chen Ye, Gaolei Li, Meng Han
Abstract: Retrieval-augmented generation promises to ground language model outputs in external evidence, yet the field has no reliable way to verify whether retrieved context actually governs generation -- a prerequisite for any high-stakes deployment.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.