Probing Embodied LLMs: When Higher Observation Fidelity Hurts Problem Solving
A recent study examines how embodied large language model (LLM) agents behave in a closed-loop robotic task. It finds that the agents perform better with lower-fidelity observations, such as raw RGB input, than with higher-fidelity ones. This counterintuitive result suggests that success rates may be driven partly by perceptual errors rather than by robust problem-solving ability.
- Embodied LLMs were evaluated using a mechanical puzzle called the Lockbox.
- Agents performed best under raw RGB input and worst under perfect ground-truth observations.
- Moderate noise in perceived action outcomes improved performance, peaking at a 40% flip probability.
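
The "flip probability" in the last point can be illustrated with a minimal sketch. It assumes the perceived action outcome is a binary success/failure signal that is inverted independently with probability `flip_prob`. The excerpt does not describe the paper's actual noise model, so the function names and the binary encoding here are hypothetical.

```python
import random


def flip_outcome(true_outcome: bool, flip_prob: float, rng: random.Random) -> bool:
    """Return the outcome the agent perceives.

    Assumption: the outcome is binary and is inverted with probability flip_prob.
    """
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("flip_prob must be in [0, 1]")
    # rng.random() is uniform on [0, 1), so this branch fires with probability flip_prob.
    return (not true_outcome) if rng.random() < flip_prob else true_outcome


def observed_flip_rate(flip_prob: float, trials: int = 10_000, seed: int = 0) -> float:
    """Empirically estimate how often a true success is reported as a failure."""
    rng = random.Random(seed)
    flips = sum(flip_outcome(True, flip_prob, rng) is False for _ in range(trials))
    return flips / trials


if __name__ == "__main__":
    # At the reported peak of 40%, roughly four in ten outcomes reach the agent inverted.
    print(f"observed flip rate: {observed_flip_rate(0.4):.3f}")
```

Under this assumption, a flip probability of 0 corresponds to perfect outcome feedback and 0.5 makes the feedback uninformative, so the reported peak at 40% sits close to the uninformative end of the range.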
Opening excerpt (first ~120 words)
arXiv:2605.20072 (Computer Science > Artificial Intelligence), submitted on 19 May 2026
Authors: Oussama Zenkri, Oliver Brock
Abstract: Large Language Models are increasingly proposed as cognitive components for robotic systems, yet their opaque decision processes make it difficult to explain success or failure in closed-loop embodied tasks. Following an empirical AI methodology, we study embodied LLM agents behaviorally by varying the information available to the agent and measuring the resulting changes in behavior.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.