Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography
A recent study explores the alignment between large language models and human brain responses to language. Using sparse autoencoders, researchers identified interpretable features that significantly predict brain activity. The findings suggest a strong correlation between semantic features and cortical organization across multiple languages.
- Intermediate layers of large language models best predict human brain responses to language.
- The study decomposed models like GPT-2 XL and Llama-3.1-8B into interpretable features, revealing that semantic features recover 94% of peak encoding performance.
- A formal convergence test confirmed that these features align with known cortical semantic organization.
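To make the "fraction of peak encoding performance" idea concrete, here is a minimal sketch of a ridge-regression encoding model scored on held-out data. It is an illustration under stated assumptions, not the study's pipeline: the data are synthetic, the function names (`ridge_fit`, `encoding_score`) are hypothetical, and the choice of which features count as "semantic" is made up for the example.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def encoding_score(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Mean Pearson r across voxels between predicted and held-out responses."""
    W = ridge_fit(X_train, Y_train, alpha)
    pred = X_test @ W
    pred_c = pred - pred.mean(axis=0)
    true_c = Y_test - Y_test.mean(axis=0)
    r = (pred_c * true_c).sum(axis=0) / (
        np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0)
    )
    return float(r.mean())


rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 400, 100, 32, 20
n_total = n_train + n_test

# Assumption: synthetic sparse, nonnegative activations standing in for
# sparse-autoencoder features (roughly 20% of entries are active).
F = rng.random((n_total, n_feat)) * (rng.random((n_total, n_feat)) < 0.2)

# Assumption: the first 8 features are labeled "semantic" and are the only
# ones that drive the simulated voxel responses.
semantic = np.arange(8)
W_true = np.zeros((n_feat, n_vox))
W_true[semantic] = rng.normal(size=(semantic.size, n_vox))
Y = F @ W_true + 0.5 * rng.normal(size=(n_total, n_vox))

F_tr, F_te = F[:n_train], F[n_train:]
Y_tr, Y_te = Y[:n_train], Y[n_train:]

# Score with all features, then with the "semantic" subset only.
full_score = encoding_score(F_tr, Y_tr, F_te, Y_te)
subset_score = encoding_score(F_tr[:, semantic], Y_tr, F_te[:, semantic], Y_te)
fraction_recovered = subset_score / full_score

print(f"full: {full_score:.3f}  subset: {subset_score:.3f}  "
      f"fraction recovered: {fraction_recovered:.2f}")
```

In this toy setup the subset recovers nearly all of the full score by construction, because only the "semantic" features carry signal. With real recordings the ratio would depend on the feature labeling, the regularization strength, and how the peak score is defined.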
Opening excerpt (first ~120 words):
arXiv:2605.23035 (cs), Computation and Language. Submitted on 21 May 2026.
Title: Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography
Authors: Dongxin Guo, Jikun Wu, Siu Ming Yiu
Abstract: Intermediate layers of large language models (LLMs) best predict human brain responses to language, one of the most robust findings in computational neurolinguistics, yet why remains mechanistically unexplained.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.