D-PACE: Dynamic Position-Aware Cross-Entropy for Parallel Speculative Drafting
The paper presents D-PACE, a training method for the parallel drafters used in speculative decoding with large language models. It introduces a dynamic position-aware cross-entropy loss that reweights training toward the block positions that currently limit acceptance. The paper reports improvements in wall-clock speedup and average emitted length across various benchmarks, without altering the drafter's architecture.
- D-PACE stands for Dynamic Position-Aware Cross-Entropy.
- The method enhances speculative decoding by adjusting training weights based on log-probability gradients.
- D-PACE improves both wall-clock speedup and average emitted length with minimal training-time overhead.
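To make the two quantities in these points concrete, here is a minimal Python sketch of (a) the accepted length of a drafted block under greedy verification and (b) a cross-entropy loss with per-position weights. It is an illustration under stated assumptions, not the paper's method: the excerpt does not give D-PACE's actual weighting rule, so `limiting_position_weights` and its `boost` parameter are hypothetical, as are all function names. Greedy token matching is also a simplification; sampling-based speculative decoding verifies drafts with a rejection-sampling test instead.

```python
import math


def accepted_length(draft_tokens, target_tokens):
    """Count the leading draft tokens that match the target model's tokens.

    Greedy verification: the block is accepted up to the first mismatch.
    """
    n = 0
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        n += 1
    return n


def position_weighted_ce(token_logprobs, weights):
    """Weighted mean of per-position negative log-probabilities.

    token_logprobs[i] is the drafter's log-probability of the target's
    token at block position i; weights[i] is that position's weight.
    """
    if len(token_logprobs) != len(weights):
        raise ValueError("token_logprobs and weights must have equal length")
    total = sum(weights)
    return -sum(w * lp for w, lp in zip(weights, token_logprobs)) / total


def limiting_position_weights(block_size, first_reject, boost=2.0):
    """HYPOTHETICAL heuristic (not from the paper): upweight the position
    where acceptance stopped, leaving every other position at weight 1."""
    weights = [1.0] * block_size
    if first_reject < block_size:
        weights[first_reject] = boost
    return weights


# Usage: a 4-token block whose third token is rejected by the target.
draft = [5, 9, 2, 7]
target = [5, 9, 4, 7]
k = accepted_length(draft, target)  # first mismatch is at index 2, so k == 2

# Assumed drafter probabilities for the target's tokens at each position.
logprobs = [math.log(p) for p in (0.9, 0.8, 0.1, 0.5)]
uniform_loss = position_weighted_ce(logprobs, [1.0] * 4)
weighted_loss = position_weighted_ce(
    logprobs, limiting_position_weights(4, first_reject=k)
)
print(k, uniform_loss, weighted_loss)
```

In this toy case the weighted loss is larger than the uniform one, because the extra weight lands on the position with the lowest drafter probability. That is the general intuition behind a position-aware loss: gradient signal concentrates where the accepted block currently ends.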
Computer Science > Machine Learning, arXiv:2605.18810 (cs), submitted on 12 May 2026
Authors: Tianyu Wu, Yu Yao, Zhenting Qi, Han Zheng, Zhuohan Wang, Haoran Ma, Lawrence Liao, Himabindu Lakkaraju, Ju Li, Yilun Du
Abstract: Speculative decoding accelerates LLM inference by having a small drafter propose tokens that a larger target model verifies in parallel. Recent diffusion-based parallel drafters such as DFlash predict the full B-token block in one forward pass, enabling deeper drafters and longer accepted blocks.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.