Every Component is a Lookup: Token Attribution and Composition from a Single Decomposition

May 25, 2026 · 4:00 AM UTC · 3 min read

#machine learning #artificial intelligence #transformers

TL;DR · WeSearch summary

The paper presents Unpack, a method for mechanistic interpretability of transformers. A single decomposition yields both per-token attribution and the strength of interactions between components. The method is evaluated on the indirect object identification task, where it is reported to work across a range of model sizes.

Key facts

▪ Unpack is a backward recursion that decomposes credit through transformer sublayers.
▪ It produces interaction strengths and per-token attribution without gradients or auxiliary training.
▪ It tracks mechanistic structure consistently across model scales in the Pythia family.
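
The summary does not spell out the Unpack recursion, so the following is not that algorithm. It is a minimal sketch of the general fact that makes gradient-free credit assignment possible in a residual architecture: the final residual state is the embedding plus the sum of sublayer outputs, so a linear readout splits exactly into one term per sublayer. The toy sublayers, the `readout` vector, and every name here are hypothetical, and the final layer norm of a real transformer is ignored.

```python
# Toy residual stack (hypothetical, NOT the paper's Unpack procedure).
# Each sublayer reads the running residual state and adds a vector to it.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return [x + y for x, y in zip(a, b)]

embedding = [1.0, 0.0, -1.0]
readout = [0.2, -0.3, 0.7]  # assumed linear readout direction for one logit

sublayers = {
    "attn0": lambda h: [0.5 * h[2], 0.0, 0.5 * h[0]],  # toy mixing step
    "mlp0": lambda h: [max(0.0, x) for x in h],        # toy ReLU step
}

def run_with_credit(embedding, sublayers, readout):
    """Run the stack and record each sublayer's direct share of the logit."""
    h = list(embedding)
    credit = {"embed": dot(embedding, readout)}
    for name, f in sublayers.items():
        out = f(h)                        # what this sublayer writes
        credit[name] = dot(out, readout)  # its direct contribution
        h = add(h, out)                   # residual connection
    return dot(h, readout), credit

logit, credit = run_with_credit(embedding, sublayers, readout)
```

The per-sublayer credits sum to the logit by linearity of the readout, with no gradient computation involved. A backward recursion would then have to split each sublayer's credit further among its own inputs; how Unpack does that is not stated in this summary.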

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

arXiv:2605.23393 (cs) · Computer Science > Machine Learning · Submitted on 22 May 2026

Title: Every Component is a Lookup: Token Attribution and Composition from a Single Decomposition

Authors: Po-Kai Chen, Niki van Stein, Aske Plaat

Abstract: Mechanistic interpretability of transformers requires identifying not just which components matter but how they compose into the computational route that produced a prediction. Both attention and MLP follow a shared key-value template $\phi(S)U$.
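
One possible reading of the template $\phi(S)U$ can be sketched in a few lines. The excerpt does not define the symbols, so the following is an assumption: $S$ is a vector of scores over "slots", $\phi$ is a nonlinearity (softmax for attention, an activation such as ReLU for an MLP), and $U$ holds one value vector per slot. Under that reading, slots are context tokens for attention and hidden neurons for an MLP. The function and variable names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def relu(scores):
    return [max(0.0, s) for s in scores]

def lookup(phi, scores, values):
    """Compute phi(scores) @ values and keep the per-slot terms.

    Assumed shapes: scores has one entry per slot, values has one
    row (a value vector) per slot.
    """
    weights = phi(scores)
    per_slot = [[w * u for u in row] for w, row in zip(weights, values)]
    output = [sum(col) for col in zip(*per_slot)]
    return output, per_slot

# Attention-style lookup: one slot per context token.
attn_scores = [2.0, 0.5, -1.0]
attn_values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn_out, attn_slots = lookup(softmax, attn_scores, attn_values)

# MLP-style lookup: one slot per hidden neuron.
mlp_scores = [0.7, -0.2, 1.5]
mlp_values = [[0.5, -0.5], [2.0, 2.0], [0.0, 1.0]]
mlp_out, mlp_slots = lookup(relu, mlp_scores, mlp_values)
```

In both cases the output is exactly the sum of the per-slot terms, which is what would let a single routine attribute a component's output to individual tokens or neurons. Whether the paper defines $S$, $\phi$, and $U$ this way cannot be confirmed from the excerpt.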

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
