Generalization Dynamics of LM Pre-Training

Jiaxin Wen
#machine learning #language models #generalization
⚡ TL;DR · AI summary

The article tracks how language models (LMs) generalize over the course of pre-training and reports a phenomenon it calls mode-hopping: LMs abruptly switch between parrot-like and intelligence-like modes, which contradicts the common picture of gradual, stable learning. The findings suggest that generalization is shaped by competition between different circuits within the model, and that scaling alone does not eliminate mode-hopping.

Original article
GitHub · Jiaxin Wen
Opening excerpt (first ~120 words)

We build an eval suite that exposes such behavioral fingerprints for generalization (see Table 1 for details), and use it to track generalization dynamics across LM pre-training. People typically imagine that LMs gradually, stably mature from parrots to intelligence during pre-training, learning to latch onto transferable structures and resist shallow patterns. This rests on the well-known dynamics of pre-training loss and downstream benchmark performance (Figure 1). We find this mental model is wrong: throughout pre-training, LMs frequently and suddenly hop between parrot-like and intelligence-like modes, i.e. distinct algorithms implemented by distinct circuits. We call this mode-hopping.

Excerpt limited to ~120 words for fair-use compliance. The full article is on GitHub.
