RealUserSim: Bridging the Reality Gap in Agent Benchmarking via Grounded User Simulation
The paper introduces RealUserSim, a user simulation framework that grounds simulated users in real behavioral data to make agent benchmarking more realistic. It argues that current LLM-based simulators often fail to represent human behavior accurately. Drawing on more than 14,000 authentic conversations, the framework raises the fidelity of agent evaluations.
- RealUserSim is the first user simulation framework grounded in real behavioral data.
- The framework improves match rates from 24.2% to 45.3% across five behavioral dimensions.
- Grounded simulation reveals failure mechanisms that are not visible in cooperative simulators.
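To put the reported match rates in perspective, the short sketch below works out the absolute and relative improvement from the two figures quoted above. It is only arithmetic on those two numbers: the excerpt does not say how a match rate is computed, and the function name `improvement` is a hypothetical helper, not something from the paper.

```python
def improvement(baseline_pct: float, grounded_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative factor).

    Hypothetical helper for illustration; it only compares two percentages.
    """
    absolute = round(grounded_pct - baseline_pct, 1)  # percentage points
    relative = round(grounded_pct / baseline_pct, 2)  # multiple of the baseline
    return absolute, relative


# Figures quoted in the summary: 24.2% before, 45.3% after.
points, factor = improvement(24.2, 45.3)
print(f"+{points} percentage points, {factor}x the baseline match rate")
```

By this arithmetic the gain is about 21.1 percentage points, or roughly 1.87 times the baseline, which still leaves more than half of the measured behavior unmatched.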
arXiv:2605.20204 (cs), Computer Science > Human-Computer Interaction. Submitted on 7 Apr 2026.
Authors: Ming Zhu, Juntao Tan, Rithesh Murthy, Jielin Qiu, Liangwei Yang, Wenting Zhao, Silvio Savarese, Shelby Heinecke, Huan Wang
Abstract: LLM-based user simulation is the primary mechanism for end-to-end agent evaluation, yet simulated users are poor proxies for real humans: unconstrained LLM defaults produce a Formalism Ceiling (style match rates of 6-8% against real users), while hand-crafted behavioral directives…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.