Exact Linear Attention
The paper 'Exact Linear Attention' introduces an attention mechanism for Transformers that achieves linear computational complexity without approximation error. It addresses gradient explosion and token attention dilution in prior linear attention methods by imposing kernel constraints that ensure non-negativity, discriminability, and geometric interpretability. The author also presents several engineering innovations intended to improve the mechanism's interpretability and effectiveness.
- Exact Linear Attention (ELA) achieves linear computational complexity for Transformer attention without approximation error.
- The paper proposes several kernel functions to address gradient explosion and token attention dilution.
- Innovations include a Hyper Link structure, a Memory Lobe module, and a routing-score-based bias mechanism.
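The general idea behind kernel-based linear attention can be illustrated with a small NumPy sketch. This is not the paper's ELA kernel: the feature map below (`elu(x) + 1`) is a stand-in chosen only because it is non-negative, and the function names are hypothetical. The sketch assumes a kernel that factorizes exactly as an inner product of finite-dimensional feature maps, in which case reordering the matrix products gives the same output as the quadratic form, up to floating-point rounding, without ever building the n-by-n score matrix.

```python
import numpy as np

def feature_map(x):
    # Stand-in feature map elu(x) + 1 (an assumption, not the paper's kernel).
    # It is strictly positive, so every attention weight is non-negative.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def quadratic_attention(Q, K, V):
    # Reference form: builds the full (n, n) kernel score matrix.
    phi_q, phi_k = feature_map(Q), feature_map(K)
    scores = phi_q @ phi_k.T
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    # Reordered form: phi(Q) @ (phi(K).T @ V), cost linear in sequence length n.
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V              # (d, d_v) summary of keys and values
    z = phi_k.sum(axis=0)         # (d,) summary used for normalization
    return (phi_q @ kv) / (phi_q @ z)[:, None]

rng = np.random.default_rng(0)
n, d, d_v = 6, 4, 3               # toy sizes, chosen arbitrarily
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d_v))

out_quadratic = quadratic_attention(Q, K, V)
out_linear = linear_attention(Q, K, V)
max_abs_diff = float(np.max(np.abs(out_quadratic - out_linear)))
```

Here `max_abs_diff` should sit at the level of floating-point rounding, since both functions compute the same quantity in a different order. The sketch covers only non-causal attention and says nothing about how ELA's kernel constraints, Hyper Link structure, or Memory Lobe module are defined.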
Opening excerpt (first ~120 words)
arXiv:2605.18848 (cs), Computer Science > Machine Learning. Submitted on 13 May 2026.
Title: Exact Linear Attention
Authors: Weinuo Ou
Abstract: This paper introduces Exact Linear Attention (ELA), a mechanism that achieves linear computational complexity for Transformer attention by leveraging the exact decomposition property of kernel functions, without any approximation error. It identifies and addresses gradient explosion and token attention dilution in prior linear attention methods by imposing kernel constraints that ensure non-negativity, discriminability, and geometric interpretability.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.