Dreaming Smoothly and Sample Efficiently with Gradient Penalized Latent Dynamics
The paper presents Gradient Penalized Latent Dynamics (GPLD), an approach to model-based reinforcement learning. It aims to improve sample efficiency by enforcing local smoothness in the learned transition dynamics, an inductive bias that existing latent world models such as DreamerV3 do not explicitly enforce. The reported experiments show GPLD reaching higher returns earlier on complex locomotion tasks than the methods it is compared against.
- GPLD applies a gradient-penalized regularization to the latent dynamics of the DreamerV3 model.
- The approach encourages locally smooth transition learning, which is beneficial for continuous control environments.
- Empirical tests demonstrate that GPLD achieves higher returns earlier and maintains consistent learning over longer horizons in challenging tasks.
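To make the idea of a gradient penalty on latent dynamics concrete, here is a minimal sketch. It is not the paper's implementation: the excerpt does not give the exact form of the penalty, so this assumes one common choice, the squared Frobenius norm of the Jacobian of the next latent state with respect to the current one. The toy `transition` function, the weights `W` and `U`, the coefficient `lam`, and the finite-difference Jacobian (standing in for automatic differentiation) are all hypothetical and chosen only for illustration.

```python
import math


def transition(z, a, W, U):
    """Toy latent transition z_next = tanh(W z + U a).

    Hypothetical stand-in for a learned dynamics model; not DreamerV3's.
    """
    return [
        math.tanh(
            sum(w * zj for w, zj in zip(w_row, z))
            + sum(u * ak for u, ak in zip(u_row, a))
        )
        for w_row, u_row in zip(W, U)
    ]


def jacobian_sq_norm(z, a, W, U, eps=1e-5):
    """Squared Frobenius norm of d z_next / d z via central differences.

    A real implementation would use automatic differentiation instead.
    """
    total = 0.0
    for j in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        f_plus = transition(z_plus, a, W, U)
        f_minus = transition(z_minus, a, W, U)
        for fp, fm in zip(f_plus, f_minus):
            total += ((fp - fm) / (2.0 * eps)) ** 2
    return total


def penalized_loss(z, a, z_target, W, U, lam=0.1):
    """Prediction error plus lam times the assumed smoothness penalty."""
    pred = transition(z, a, W, U)
    mse = sum((p - t) ** 2 for p, t in zip(pred, z_target)) / len(pred)
    return mse + lam * jacobian_sq_norm(z, a, W, U)


if __name__ == "__main__":
    W = [[0.5, 0.0], [0.0, 0.5]]
    U = [[0.0], [0.0]]
    print(penalized_loss([0.0, 0.0], [0.0], [0.0, 0.0], W, U))
```

Under this assumed form, a transition whose output changes sharply with small changes in the latent state incurs a larger penalty, so minimizing the combined loss trades prediction accuracy against local smoothness through `lam`.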
Opening excerpt (first ~120 words)
Computer Science > Machine Learning, arXiv:2605.23089 (cs), submitted on 21 May 2026
Title: Dreaming Smoothly and Sample Efficiently with Gradient Penalized Latent Dynamics
Authors: Romil V. Sonigra (1), P. R. Kumar (1) ((1) Texas A&M University)
Abstract: Model-based reinforcement learning improves sample efficiency by learning a world model. However, existing latent world models such as DreamerV3 do not explicitly enforce local smoothness in their learned transition dynamics, leaving a useful inductive bias for transition dynamics learning unexploited.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.