DEL: Digit Entropy Loss for Numerical Learning of Large Language Models
The paper introduces Digit Entropy Loss (DEL) for improving numerical learning in large language models (LLMs). It critiques existing methods for number prediction and presents DEL as a solution that enhances prediction accuracy. The authors demonstrate DEL's effectiveness through experiments on various mathematical reasoning benchmarks.
- Number prediction is crucial for large language models in tasks like mathematical problem-solving and code generation.
- Existing numerical learning methods often lead to over-sharpened and over-flattened digit distributions.
- Digit Entropy Loss reformulates unsupervised entropy optimization to improve accuracy in predicting integers and floating-point numbers.
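
To make "over-sharpened" and "over-flattened" concrete, the following is a minimal sketch that measures the Shannon entropy of a probability distribution over the ten digits at a single position. It is not the paper's loss, whose formulation is not given in this summary; the function name `digit_entropy` and the three example distributions are hypothetical and chosen only for illustration.

```python
import math


def digit_entropy(probs):
    """Shannon entropy (in nats) of a distribution over the digits 0-9."""
    if len(probs) != 10:
        raise ValueError("expected one probability per digit 0-9")
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 are skipped: p * log(p) tends to 0 as p -> 0.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical distributions for one digit position (illustration only).
sharp = [0.0] * 10
sharp[7] = 1.0            # all probability mass on a single digit

flat = [0.1] * 10         # uniform: every digit equally likely

peaked = [0.01] * 10
peaked[7] = 0.91          # mostly one digit, small mass on the others

# Upper bound on entropy for ten outcomes (reached by the uniform case).
MAX_ENTROPY = math.log(10)

if __name__ == "__main__":
    for name, dist in [("sharp", sharp), ("flat", flat), ("peaked", peaked)]:
        print(f"{name}: {digit_entropy(dist):.4f} nats (max {MAX_ENTROPY:.4f})")
```

A fully sharp distribution has entropy 0 and a fully flat one has entropy ln 10 (about 2.303 nats), so entropy gives a single number for where a predicted digit distribution sits between the two extremes named above.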
Opening excerpt (first ~120 words)
arXiv:2605.20369 (Computer Science > Computation and Language). Submitted on 19 May 2026.
Title: DEL: Digit Entropy Loss for Numerical Learning of Large Language Models
Authors: Zhaohui Zheng, Chenhang He, Shihao Wang, Yuxuan Li, Ming-Ming Cheng, Lei Zhang
Abstract: Number prediction stands as a fundamental capability of large language models (LLMs) in mathematical problem-solving and code generation. The widely adopted maximum likelihood estimation (MLE) for LLM training is not tailored to number prediction.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.