Can LLMs Introspect? A Reality Check
The paper 'Can LLMs Introspect? A Reality Check' asks whether large language models (LLMs) can introspect, that is, detect and report their own internal states. The authors argue that current evidence is insufficient to support claims of genuine introspection, because the observed behaviors may instead stem from pattern matching on surface-level cues. They re-evaluate two paradigms used in previous studies and find that LLMs struggle to distinguish manipulations of their internal states from changes to their input. This points to limits in their metacognitive abilities.
- The paper argues that large language models may not genuinely introspect, contrary to claims in earlier studies.
- It highlights the need to distinguish genuine introspection from surface-level pattern matching.
- The authors found that LLMs cannot reliably detect tampering with their internal states.
Opening excerpt (first ~120 words)
arXiv:2605.26242 (cs.AI), submitted on 25 May 2026
Title: Can LLMs Introspect? A Reality Check
Authors: Shashwat Singh, Tal Linzen, Shauli Ravfogel
Abstract: Can large language models detect and report their own internal states? A number of studies have argued that the answer to this question is yes. We argue, based on lessons from human metacognition research, that this conclusion may be premature: to be convinced of this conclusion we need to distinguish genuine introspection from pattern matching based on surface-level cues.
…
Excerpt limited to ~120 words for fair-use compliance. The full paper is available on arXiv (cs.AI).