ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents
ShopGym is a framework for simulating and benchmarking e-commerce web agents. It addresses limitations of existing evaluation methods by providing a realistic, controllable environment. The framework combines ShopArena for simulation with ShopGuru for task synthesis, yielding stable, inspectable evaluation settings.
- ShopGym integrates simulation and benchmarking for e-commerce web agents.
- The framework allows for the creation of realistic and diverse evaluation environments.
- ShopArena converts live storefronts into sandbox shops, while ShopGuru synthesizes benchmark tasks.
Opening excerpt (first ~120 words)
arXiv:2605.16116 (Computer Science > Artificial Intelligence). Submitted on 15 May 2026.
Authors: Chinmay Savadikar, Mingyu Zhao, Yuanzheng Zhu, Han Li, Shuang Xie, Alberto Castelo, Tianfu Wu, Lingyun Wang
Abstract: Developing and evaluating e-commerce web agents requires environments that preserve meaningful task structure while enabling controllable, reproducible, and scalable scientific comparison.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.