Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography

May 25, 2026 · 4:00 AM UTC · 3 min read

#neuroscience #language #artificial intelligence

⚡ TL;DR · AI summary

A recent study explores the alignment between large language models and human brain responses to language. Using sparse autoencoders, researchers identified interpretable features that significantly predict brain activity. The findings suggest a strong correlation between semantic features and cortical organization across multiple languages.

Key facts

▪ Intermediate layers of large language models best predict human brain responses to language.
▪ The study decomposed models like GPT-2 XL and Llama-3.1-8B into interpretable features, revealing that semantic features recover 94% of peak encoding performance.
▪ A formal convergence test confirmed that these features align with known cortical semantic organization.
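
As a rough illustration of the terms used above, the sketch below shows how sparse-autoencoder features can feed an encoding model: activations pass through a ReLU encoder, and a ridge regression then predicts per-voxel responses from the resulting features. This is not the paper's pipeline. The data is synthetic, the encoder weights are random rather than trained, and every name (`sae_encode`, `ridge_fit`, `encoding_score`, the array sizes) is a hypothetical choice made for this example.

```python
import numpy as np

def sae_encode(acts, w_enc, b_enc):
    """ReLU encoder of a sparse autoencoder: nonnegative feature activations."""
    return np.maximum(acts @ w_enc + b_enc, 0.0)

def ridge_fit(x, y, alpha=1.0):
    """Closed-form ridge weights mapping features x to responses y."""
    n_feat = x.shape[1]
    return np.linalg.solve(x.T @ x + alpha * np.eye(n_feat), x.T @ y)

def encoding_score(x_train, y_train, x_test, y_test, alpha=1.0):
    """Per-voxel Pearson r between predicted and held-out responses."""
    w = ridge_fit(x_train, y_train, alpha)
    pred = x_test @ w
    pred_c = pred - pred.mean(axis=0)
    y_c = y_test - y_test.mean(axis=0)
    denom = np.linalg.norm(pred_c, axis=0) * np.linalg.norm(y_c, axis=0)
    return (pred_c * y_c).sum(axis=0) / denom

# Synthetic stand-ins (assumptions): model activations and voxel responses.
rng = np.random.default_rng(0)
n_samples, d_model, n_features, n_voxels = 400, 16, 64, 5
acts = rng.normal(size=(n_samples, d_model))
w_enc = rng.normal(size=(d_model, n_features))  # random, untrained encoder
b_enc = np.zeros(n_features)

features = sae_encode(acts, w_enc, b_enc)
true_w = rng.normal(size=(n_features, n_voxels))
responses = features @ true_w + 0.1 * rng.normal(size=(n_samples, n_voxels))

# Fit on the first 300 samples, score on the remaining 100.
scores = encoding_score(features[:300], responses[:300],
                        features[300:], responses[300:])
```

In a real analysis the encoder would come from a trained sparse autoencoder, the responses from recorded brain data, and the ridge penalty `alpha` would be chosen by cross-validation.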

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

Computer Science > Computation and Language
arXiv:2605.23035 (cs)
[Submitted on 21 May 2026]

Title: Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography
Authors: Dongxin Guo, Jikun Wu, Siu Ming Yiu

Abstract: Intermediate layers of large language models (LLMs) best predict human brain responses to language, one of the most robust findings in computational neurolinguistics, yet why remains mechanistically unexplained.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
