Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
The paper introduces a reinforcement learning objective called Distribution-Aware Reward, aimed at improving the predictive distributions of large language models on regression tasks. Rather than scoring each decoded number on its own, the method treats multiple decoded samples as an empirical predictive distribution and scores that distribution as a whole, improving both the accuracy and the dispersion (spread) of predictions. The authors report that the approach outperforms supervised fine-tuning and pointwise reinforcement learning across a range of tasks, and that it yields better uncertainty diagnostics and model robustness.
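One way to make "multiple decoded samples as an empirical predictive distribution" concrete is the standard sample-based CRPS estimator, `CRPS = mean_i |x_i - y| - 0.5 * mean_{i,j} |x_i - x_j|`. The first term measures accuracy and the second credits dispersion, so both overconfident and overly diffuse sample sets are penalized. This is a minimal sketch under stated assumptions, not the authors' implementation: `parse_samples` and `empirical_crps` are hypothetical names, generations that do not parse as a finite float are simply dropped, and the paper's exact estimator and parsing rules may differ. Lower CRPS is better.

```python
import math
from typing import Sequence


def parse_samples(texts: Sequence[str]) -> list[float]:
    """Hypothetical helper: turn decoded strings into floats.

    Assumption: generations that fail to parse, or parse to inf/nan,
    are dropped rather than penalized.
    """
    values = []
    for t in texts:
        try:
            v = float(t.strip())
        except ValueError:
            continue  # malformed generation: skip it
        if math.isfinite(v):
            values.append(v)
    return values


def empirical_crps(samples: Sequence[float], y: float) -> float:
    """CRPS of the empirical distribution of `samples` against target `y`.

    accuracy term: mean absolute error of the samples
    spread term:   mean absolute difference over all ordered sample pairs
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    accuracy = sum(abs(x - y) for x in samples) / n
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread


decoded = ["2.9", "3.1", "3.4", "n/a"]  # toy decoded outputs for one input
xs = parse_samples(decoded)             # [2.9, 3.1, 3.4]
print(round(empirical_crps(xs, 3.0), 4))  # 0.0889
```

The spread term divides by `n*n`, which gives exactly the CRPS of the empirical distribution; a common alternative ("fair" CRPS) divides by `n*(n-1)` to estimate the CRPS of the distribution the samples were drawn from without bias.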
- The method trains language models to produce better predictive distributions, rather than optimizing each individual output in isolation.
- It scores the set of decoded samples with the Continuous Ranked Probability Score (CRPS) and rewards each prediction according to its contribution to the quality of the overall distribution.
- The approach yields significant improvements in rank correlation and remains competitive with advanced models on molecular property prediction.
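Rewarding each prediction by its contribution to distribution quality can be sketched with a leave-one-out rule, `r_i = CRPS(samples without i) - CRPS(all samples)`, which is positive when including sample `i` lowers (improves) the group's CRPS. This credit-assignment rule, the function names, and the pointwise baseline `-|x_i - y|` are illustrative assumptions; the summary does not specify the exact decomposition the authors use.

```python
from typing import Sequence


def empirical_crps(samples: Sequence[float], y: float) -> float:
    """CRPS of the empirical sample distribution (lower is better)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    accuracy = sum(abs(x - y) for x in samples) / n
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread


def pointwise_rewards(samples: Sequence[float], y: float) -> list[float]:
    """Baseline: every sample is scored independently by negative absolute error."""
    return [-abs(x - y) for x in samples]


def loo_contribution_rewards(samples: Sequence[float], y: float) -> list[float]:
    """Hypothetical distribution-aware credit: how much worse the group's CRPS
    gets when sample i is removed (positive = sample i helps)."""
    xs = list(samples)
    if len(xs) < 2:
        raise ValueError("leave-one-out needs at least two samples")
    full = empirical_crps(xs, y)
    return [empirical_crps(xs[:i] + xs[i + 1:], y) - full for i in range(len(xs))]


xs = [3.0, 3.0, 3.0, 4.0]  # three redundant samples and one outlier
target = 3.5
print(pointwise_rewards(xs, target))  # [-0.5, -0.5, -0.5, -0.5]
print([round(r, 4) for r in loo_contribution_rewards(xs, target)])
# [-0.0347, -0.0347, -0.0347, 0.1875]
```

In this toy case every sample receives the same pointwise reward, while the leave-one-out reward singles out the 4.0 sample: it is the only one that widens the distribution so that it covers the target, and removing any one of the redundant 3.0 samples slightly improves the group's score.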
Opening excerpt (first ~120 words)
arXiv:2605.20740 (cs, Machine Learning), submitted 20 May 2026
Authors: Jungsoo Park, Hyungjoo Chae, Ethan Mendes, Jay DeYoung, Varsha Kishore, Wei Xu, Alan Ritter
Abstract: Large language models can predict real-valued quantities from heterogeneous inputs such as text, code, and molecular strings, but most training objectives score each decoded floating-point number independently, improving point estimates without ensuring calibrated predictive distributions.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.