RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably

May 18, 2026 · 4:00 AM UTC · 3 min read

#artificial intelligence #machine learning #computation

⚡ TL;DR · AI summary

A recent paper identifies intrinsic limitations of Rotary Positional Embeddings (RoPE) in long-context language models. The authors prove that as context length increases, RoPE loses its ability to distinguish positions from one another and tokens from one another. Their findings suggest that future Transformer models may need new positional mechanisms.

Key facts

▪ The paper identifies intrinsic limitations of RoPE in Transformer-based long-context language models.
▪ As context length increases, RoPE-based attention becomes unpredictable and loses both its locality bias and its consistency in token relevance.
▪ Adjusting the RoPE base can restore the ability to distinguish tokens, but at the cost of the ability to distinguish positions.
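
For readers unfamiliar with the mechanism, the following is a minimal sketch of the commonly used RoPE formulation, not the paper's analysis or proof. It assumes the standard frequency schedule theta_i = base^(-2i/d) and rotates consecutive (even, odd) pairs of dimensions; some implementations pair dimensions differently. The function names and the sample vectors are illustrative only. It shows two properties relevant to the key facts above: the attention logit depends only on the relative offset between query and key, and a larger base slows the rotations.

```python
import math

def rope_frequencies(dim, base=10000.0):
    """Per-pair rotation frequencies theta_i = base^(-2i/dim)."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def apply_rope(vec, position, base=10000.0):
    """Rotate each consecutive (even, odd) pair of vec by position * theta_i."""
    dim = len(vec)
    out = [0.0] * dim
    for i, theta in enumerate(rope_frequencies(dim, base)):
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out

def rope_score(q, k, q_pos, k_pos, base=10000.0):
    """Pre-softmax attention logit between a rotated query and a rotated key."""
    rq = apply_rope(q, q_pos, base)
    rk = apply_rope(k, k_pos, base)
    return sum(a * b for a, b in zip(rq, rk))

# Arbitrary illustrative vectors (hypothetical, not taken from the paper).
q = [1.0, 0.0, 0.5, -0.5, 0.2, 0.8, -1.0, 0.3]
k = [0.3, 0.7, -0.2, 0.4, 1.0, -0.6, 0.1, 0.9]

# Same offset (3) at very different absolute positions gives the same logit.
near = rope_score(q, k, 10, 7)
far = rope_score(q, k, 100010, 100007)

# A larger base lowers the slowest frequency, so rotations advance more slowly.
slowest_default = rope_frequencies(8)[-1]
slowest_large = rope_frequencies(8, base=1_000_000.0)[-1]
```

Here `near` and `far` agree up to floating-point error, which is the relative-position property RoPE is designed around. The `base` argument is the knob the third key fact refers to; this sketch only shows how it changes the rotation frequencies and does not reproduce the trade-off the paper proves.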

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

Computer Science > Computation and Language
arXiv:2605.15514 (cs)
[Submitted on 15 May 2026]

Title: RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably

Authors: Yufeng Du, Phillip Harris, Minyang Tian, Eliu A Huerta, Srikanth Ronanki, Subendhu Rongali, Aram Galstyan, Hao Peng

Abstract: We identify intrinsic limitations of Rotary Positional Embeddings (RoPE) in Transformer-based long-context language models. Our theoretical analysis abstracts away from the specific content of the context and depends only on its length.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
