Completion vs Optimality: Policy Gradient in Long-Horizon Cumulative-Damage Problems
The paper examines long-horizon decision problems with cumulative damage and the difficulties they pose for policy-gradient methods. It identifies two failure modes, completion and optimality, and proposes a decomposition to address them. The authors derive four testable predictions and evaluate them in two distinct environments.
- The study focuses on long-horizon decision problems with cumulative damage.
- It identifies two failure modes for policy-gradient methods: completion and optimality.
- The authors derive four testable predictions and evaluate them in two distinct environments.
Opening excerpt (first ~120 words)
arXiv:2605.26657 (cs) [Submitted on 26 May 2026]
Title: Completion vs Optimality: Policy Gradient in Long-Horizon Cumulative-Damage Problems
Authors: Wolfgang Maass, Sabine Janzen
Abstract: Long-horizon decision problems with cumulative damage couple locally attractive actions to globally adverse outcomes.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.