Adaptive Mass-Segmented KV Compression for Long-Context Reasoning
The paper presents Adaptive Mass-Segmented (AMS) KV Compression, a framework for improving long-context reasoning in large language models. Existing key-value (KV) cache compression methods evict tokens by importance score and can discard whole reasoning segments. AMS addresses this by ensuring that important reasoning segments are preserved during inference. The authors report improved performance on tasks including mathematical reasoning and code completion, and describe the method as compatible with existing systems.
- Adaptive Mass-Segmented KV Compression targets the bottleneck created by the linear growth of the KV cache during long-form LLM inference.
- The framework replaces token-level competition with region-aware quota allocation, so that vital reasoning segments receive a guaranteed share of the cache.
- Extensive experiments show that AMS reduces structural fragmentation and improves performance on tasks such as mathematical reasoning and code completion.
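The shift from token-level competition to region-aware quota allocation can be illustrated with a minimal sketch. The excerpt does not say how AMS segments the cache or sizes each quota, so everything here is an assumption: the fixed-size regions, the definition of a region's "mass" as the sum of its tokens' importance scores, the guaranteed per-region minimum `min_quota`, and the function name `region_aware_keep` are all hypothetical. Under token-level competition, by contrast, every cached token competes for a single global budget.

```python
from typing import List


def region_aware_keep(scores: List[float], region_size: int, budget: int,
                      min_quota: int = 1) -> List[int]:
    """Hypothetical region-aware quota allocation over per-token importance scores.

    Assumptions (not taken from the paper): regions are fixed-size contiguous
    blocks, a region's "mass" is the sum of its scores, every region gets at
    least `min_quota` slots, and the remaining budget is split by mass.
    """
    if any(s < 0 for s in scores):
        raise ValueError("scores are assumed to be non-negative")
    n = len(scores)
    budget = min(budget, n)
    regions = [range(s, min(s + region_size, n)) for s in range(0, n, region_size)]
    floors = [min(min_quota, len(r)) for r in regions]
    if sum(floors) > budget:
        raise ValueError("budget cannot cover the per-region minimum")

    masses = [sum(scores[i] for i in r) for r in regions]
    total = sum(masses)
    spare = budget - sum(floors)

    kept = set()
    for r, floor, mass in zip(regions, floors, masses):
        # Guaranteed floor plus a mass-proportional share, capped at region size.
        share = int(spare * mass / total) if total > 0 else 0
        quota = min(len(r), floor + share)
        # Inside a region, tokens still compete by score.
        kept.update(sorted(r, key=lambda i: scores[i], reverse=True)[:quota])

    # Rounding down and capping can leave slots unused; fill them by global score.
    rest = sorted((i for i in range(n) if i not in kept),
                  key=lambda i: scores[i], reverse=True)
    kept.update(rest[:budget - len(kept)])
    return sorted(kept)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.1, 0.1, 0.1, 0.1, 0.5, 0.4, 0.3, 0.2]
    print(region_aware_keep(scores, region_size=4, budget=6))  # [0, 1, 2, 3, 4, 8]
```

On these illustrative scores, positions 4 to 7 form a low-scoring region that still keeps position 4. A cache-wide top-6 selection on the same scores would keep [0, 1, 2, 3, 8, 9] and evict that region entirely. The final global fill is one simple way to spend slots lost to rounding. It is a design choice of this sketch, not something the excerpt specifies.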
Opening excerpt (first ~120 words)
arXiv:2605.23200 (cs), submitted 22 May 2026. Authors: Junzhe Yang, Xiaoyu Shen.
Abstract: The linear growth of the Key-Value (KV) cache is a critical bottleneck in long-form LLM inference. Existing KV compression methods mitigate this by evicting tokens based on importance scores. However, we show that their reliance on global Top-k selection triggers Region Wipe-out: the severe eviction of contiguous reasoning blocks that derails logical coherence.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
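The failure mode named in the abstract, Region Wipe-out under global Top-k selection, can be reproduced with a small sketch that assumes per-token importance scores are already given. `global_topk_keep` performs token-level eviction by keeping the highest-scoring positions across the whole cache. `wiped_out_regions` is a hypothetical diagnostic that splits the cache into fixed-size regions and flags those with no surviving tokens. The region size, the `threshold` parameter, and both function names are illustrative assumptions, not the paper's definitions.

```python
from typing import List


def global_topk_keep(scores: List[float], budget: int) -> List[int]:
    """Token-level eviction: keep the `budget` highest-scoring positions cache-wide."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def wiped_out_regions(kept: List[int], n_tokens: int, region_size: int,
                      threshold: float = 0.0) -> List[int]:
    """Return indices of fixed-size regions whose surviving fraction is <= threshold.

    Hypothetical diagnostic: the paper's own definition of Region Wipe-out
    may differ (e.g. in how regions are segmented).
    """
    kept_set = set(kept)
    wiped = []
    for region_idx, start in enumerate(range(0, n_tokens, region_size)):
        block = range(start, min(start + region_size, n_tokens))
        survived = sum(1 for i in block if i in kept_set) / len(block)
        if survived <= threshold:
            wiped.append(region_idx)
    return wiped


if __name__ == "__main__":
    # Illustrative scores: region 1 (positions 4-7) has uniformly modest
    # scores, standing in for an intermediate reasoning step.
    scores = [0.9, 0.8, 0.7, 0.6, 0.1, 0.1, 0.1, 0.1, 0.5, 0.4, 0.3, 0.2]
    kept = global_topk_keep(scores, budget=6)
    print(kept)                                     # [0, 1, 2, 3, 8, 9]
    print(wiped_out_regions(kept, len(scores), 4))  # [1]
```

Here the budget retains half the cache, yet the block at positions 4 to 7 loses every token, because each of its tokens individually loses to higher-scoring tokens elsewhere. Raising `threshold` (for example to 0.25) would also flag regions that are only partially wiped out.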