
Grokking as Structural Inference: Transformers Need Bayesian Lottery Tickets

TL;DR (AI summary)

The paper examines "grokking" in Transformers: a model memorizes its training data, then takes many more training steps before it generalizes. The authors formalize attention as a Bayesian posterior and identify two conditions that are necessary for generalization. They argue that the delay is a structural inference process and that specific interventions can accelerate it.

Original article
arXiv cs.AI
Opening excerpt (first ~120 words)

Subject: Computer Science > Machine Learning
arXiv:2605.15787 (cs), submitted on 15 May 2026
Title: Grokking as Structural Inference: Transformers Need Bayesian Lottery Tickets
Authors: Kai Hidajat, Solden Stoll, Joseph An
Abstract: Why does a Transformer that has memorized its training set wait thousands of steps before it generalizes? Existing accounts locate this delay in norm minimization, feature emergence, or the late discovery of sparse subnetworks.

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
