Robots That Know What to Ask: Recovering Misaligned Rewards through Targeted Explanations
A new framework lets robots recover misaligned rewards by explaining what they are uncertain about and asking for targeted demonstrations. It identifies features that the existing demonstrations leave underspecified, then queries the human for corrective demonstrations that cover them. The method reportedly recovers rewards significantly better than traditional passive data collection.
- The framework detects underspecified features in robot demonstrations.
- Robots explain their uncertainties in natural language to request targeted demonstrations.
- The approach significantly reduces ambiguity in learning from imperfect demonstrations.
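The detect-then-ask loop described in the points above can be illustrated with a toy sketch. This is not the paper's algorithm, which the excerpt does not describe. It assumes, purely for illustration, that each demonstration is summarized as a vector of feature values and that a feature counts as "underspecified" when it barely varies across demonstrations. The function names, feature names, threshold, and data are all hypothetical.

```python
from statistics import pstdev


def underspecified_features(demos, names, min_spread=0.05):
    """Flag features whose values barely vary across demonstrations.

    Assumption (not from the paper): low spread across demonstrations
    means the demonstrations say little about how much a feature matters.
    """
    flagged = []
    for j, name in enumerate(names):
        column = [demo[j] for demo in demos]
        # Population standard deviation of this feature over all demos.
        if pstdev(column) < min_spread:
            flagged.append(name)
    return flagged


def build_query(feature):
    """Turn a flagged feature into a natural-language request (hypothetical wording)."""
    return (
        f"I am unsure how much '{feature}' matters. "
        "Could you show me a demonstration where it changes?"
    )


# Hypothetical per-demonstration feature summaries (one row per demonstration).
names = ["distance_to_human", "cup_tilt", "speed"]
demos = [
    [0.80, 0.10, 0.30],
    [0.20, 0.11, 0.70],
    [0.50, 0.10, 0.50],
]

flagged = underspecified_features(demos, names)
queries = [build_query(f) for f in flagged]
for q in queries:
    print(q)
```

In this made-up data only `cup_tilt` stays nearly constant across the three demonstrations, so it is the one feature the sketch asks about. A real system would more plausibly measure uncertainty in the learned reward itself than raw feature spread.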
Opening excerpt (first ~120 words)
arXiv:2605.22986 (Computer Science > Robotics), submitted on 21 May 2026
Authors: Helena Merker, Nick Walker, Andreea Bobu
Abstract: Learning reward functions from demonstrations assumes that demonstrations provide adequate supervision over all features -- or task-relevant aspects of behavior. In practice, demonstrations are often imperfect: humans may under-emphasize certain features due to cognitive load or physical difficulty, or the training regime may fail to sufficiently cover all relevant situations.
…
Excerpt limited to ~120 words for fair-use compliance. The full paper is available on arXiv (arXiv:2605.22986).