Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning
The paper examines how the internal representations of large language models converge even as their reasoning processes differ. It identifies three key dissociations in model behavior across a range of reasoning tasks. The findings suggest that shared representations do not imply shared reasoning strategies, with implications for model interpretability and design.
- The study evaluates 16 language models from 8 families on 800 reasoning problems.
- Models showed a difficulty inversion: they converged more on problems they collectively failed than on those they solved.
- Pre-decision representations aligned well, while post-decision representations diverged significantly.
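This summary does not say how convergence was measured. As a hedged illustration only, one common choice in representation-similarity work is linear centered kernel alignment (CKA), computed between two models' hidden states on the same set of problems. The sketch below is pure Python; the function names, the per-problem `solved` flag, and the idea of splitting problems by collective correctness to mirror the difficulty-inversion comparison are assumptions, not the paper's method. Under the same assumptions, the function could also be applied to hidden states taken before and after each model commits to an answer, to compare pre- and post-decision alignment.

```python
from typing import Dict, List, Sequence

# Hypothetical helper names: a generic linear-CKA sketch, not the paper's code.
Matrix = List[List[float]]


def center(x: Sequence[Sequence[float]]) -> Matrix:
    """Subtract each feature's (column's) mean over problems from every row."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in x]


def cross_frobenius_sq(a: Matrix, b: Matrix) -> float:
    """Return ||A^T B||_F^2 for row-aligned matrices A (n x p) and B (n x q)."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(a[k][i] * b[k][j] for k in range(len(a)))
            total += s * s
    return total


def linear_cka(x: Sequence[Sequence[float]], y: Sequence[Sequence[float]]) -> float:
    """Linear CKA between two models' hidden states; row k of x and y is problem k."""
    if len(x) != len(y):
        raise ValueError("both models must be probed on the same problems")
    if len(x) < 2:
        raise ValueError("need at least two problems")
    xc, yc = center(x), center(y)
    # Denominator is ||Xc^T Xc||_F * ||Yc^T Yc||_F.
    denom = (cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc)) ** 0.5
    if denom == 0.0:
        raise ValueError("representations are constant across problems")
    return cross_frobenius_sq(xc, yc) / denom


def cka_by_outcome(
    x: Sequence[Sequence[float]],
    y: Sequence[Sequence[float]],
    solved: Sequence[bool],
) -> Dict[str, float]:
    """Compute CKA separately on problems flagged solved vs. failed.

    `solved` is an assumed per-problem flag (e.g. every model answered correctly).
    """
    if len(solved) != len(x):
        raise ValueError("one solved flag per problem is required")
    result = {}
    for label, keep in (("solved", True), ("failed", False)):
        idx = [k for k, s in enumerate(solved) if s == keep]
        result[label] = linear_cka([x[k] for k in idx], [y[k] for k in idx])
    return result
```

Because linear CKA is invariant to orthogonal transforms and isotropic scaling, a representation and a rotated copy of it score 1.0. In this sketch, a higher score on the "failed" subset than on the "solved" subset would correspond to the difficulty inversion listed above.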
Opening excerpt (first ~120 words):
arXiv:2605.23315 (cs, Computation and Language), submitted 22 May 2026. Title: Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning. Authors: Muhammad Usama, Dong Eui Chang. Abstract: Large language models trained under diverse objectives and architectures have been shown to develop increasingly similar internal representations, an observation formalized as the Platonic Representation Hypothesis.
…
Excerpt limited to ~120 words for fair-use compliance. The full paper is available on arXiv (cs.AI).
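The abstract invokes the Platonic Representation Hypothesis. Work in that line has quantified convergence with a mutual k-nearest-neighbor score: for each input, find its nearest neighbors in each model's representation space and average the overlap between the two neighbor sets. The sketch below assumes cosine distance, deterministic tie-breaking by index, and hypothetical function names; it is not taken from the paper summarized here, which may use a different metric.

```python
import math
from typing import List, Sequence, Set

# Hypothetical helper names; assumes cosine distance defines neighborhoods.


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity; raises on zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("zero feature vector")
    return 1.0 - dot / (nu * nv)


def knn_sets(feats: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """For each input i, the indices of its k nearest other inputs (ties broken by index)."""
    neighbors = []
    for i, fi in enumerate(feats):
        others = sorted(
            (j for j in range(len(feats)) if j != i),
            key=lambda j: (cosine_distance(fi, feats[j]), j),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def mutual_knn_alignment(
    a: Sequence[Sequence[float]], b: Sequence[Sequence[float]], k: int
) -> float:
    """Mean fraction of shared k-nearest neighbors; row i of a and b is input i."""
    if len(a) != len(b):
        raise ValueError("both models must see the same inputs")
    if not 1 <= k < len(a):
        raise ValueError("k must satisfy 1 <= k < number of inputs")
    na, nb = knn_sets(a, k), knn_sets(b, k)
    return sum(len(sa & sb) for sa, sb in zip(na, nb)) / (k * len(a))
```

Scaling or rotating one model's features leaves cosine neighborhoods unchanged, so such a copy scores 1.0, while two models whose neighborhoods never coincide score 0.0.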