Composition Collapse: Stable Factual Knowledge Does Not Imply Compositional Reasoning
The paper examines composition collapse in AI models: stable factual knowledge does not guarantee that a model can combine those facts in compositional reasoning. It introduces a double-gate protocol that assesses a model's composition capabilities beyond what aggregate metrics show. The findings suggest that reported improvements in multi-hop reasoning should be evaluated with metrics that account for atomic knowledge access.
- The study reveals that models with similar atomic knowledge can exhibit vastly different compositional behaviors.
- A new double-gate protocol is proposed to analyze composition failure more accurately.
- The research indicates that many composition failures are due to computational constraints during generation rather than a lack of ability to compose.
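To make the idea of a metric that accounts for atomic knowledge access concrete, here is a minimal sketch. It is not the paper's double-gate protocol, whose details are not given in this summary. It assumes a hypothetical two-hop setup in which each question has two constituent single-hop facts, and it compares plain aggregate accuracy with accuracy restricted to questions where the model answered both constituent facts correctly. The names `Item`, `aggregate_accuracy`, and `gated_accuracy` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    """One hypothetical two-hop question and its per-part outcomes."""
    hop1_correct: bool      # model answered the first atomic fact correctly
    hop2_correct: bool      # model answered the second atomic fact correctly
    composed_correct: bool  # model answered the full two-hop question correctly


def aggregate_accuracy(items: List[Item]) -> float:
    """Fraction of two-hop questions answered correctly, ignoring atomic knowledge."""
    return sum(i.composed_correct for i in items) / len(items)


def gated_accuracy(items: List[Item]) -> Optional[float]:
    """Two-hop accuracy restricted to items where both atomic facts are known.

    Returns None when no item passes the atomic-knowledge filter.
    """
    known = [i for i in items if i.hop1_correct and i.hop2_correct]
    if not known:
        return None
    return sum(i.composed_correct for i in known) / len(known)


# Made-up outcomes for illustration only; these are not results from the paper.
items = [
    Item(True, True, True),    # knows both facts and composes them
    Item(True, True, False),   # knows both facts but fails to compose
    Item(True, False, False),  # missing an atomic fact
    Item(False, False, True),  # correct without the atomic facts (e.g. a shortcut)
    Item(False, True, True),   # correct despite a missing atomic fact
]

aggregate = aggregate_accuracy(items)
gated = gated_accuracy(items)
```

On this toy data the aggregate score counts answers that were reached without the underlying facts, while the gated score isolates whether known facts were actually combined. The two numbers can diverge in either direction, which is why an aggregate score alone says little about composition.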
Opening excerpt (first ~120 words)
arXiv:2605.26789 (cs), Computer Science > Artificial Intelligence. Submitted on 26 May 2026.
Title: Composition Collapse: Stable Factual Knowledge Does Not Imply Compositional Reasoning
Authors: Zhe Yu, Wenpeng Xing, Yunzhao Wei, Jie Chen, Hongzhi Wang, Xuyang Teng, Meng Han
Abstract: Post-training is routinely evaluated through aggregate benchmark scores that treat multi-hop reasoning as a single capability -- as if a model that answers more questions correctly must be better at assembling facts.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.