WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games
WebGameBench is a benchmark that evaluates whether coding agents can build browser-native games from structured specifications. A runtime evaluator interacts with each generated application in a real browser to assess its usability. Initial results indicate that some coding agents can produce playable games, but a large gap remains between playable and high-quality output.
- WebGameBench evaluates coding agents on their ability to turn structured specifications into browser-accessible games.
- The benchmark assigns usability labels of EXCELLENT, USABLE, or UNUSABLE based on human gameplay reviews.
- The best configuration of coding agents achieved a 76.9% usable rate but only a 20.2% excellent rate.
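
As a rough illustration of how rates like these could be derived from per-game labels, here is a minimal sketch. It is not the benchmark's scoring code: the function name `label_rates` and the sample data are hypothetical, and it assumes that the usable rate counts EXCELLENT games as usable, which the summary above does not confirm.

```python
from collections import Counter

# The three usability labels named in the summary.
LABELS = ("EXCELLENT", "USABLE", "UNUSABLE")


def label_rates(labels):
    """Return (usable_rate, excellent_rate) as fractions of all labeled games.

    Assumption (not confirmed by the source): a game labeled EXCELLENT
    also counts toward the usable rate.
    """
    if not labels:
        raise ValueError("no labels to aggregate")
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(labels)
    usable = (counts["EXCELLENT"] + counts["USABLE"]) / total
    excellent = counts["EXCELLENT"] / total
    return usable, excellent


# Made-up labels for illustration only; these are not benchmark results.
sample = ["EXCELLENT", "USABLE", "USABLE", "UNUSABLE", "USABLE"]
usable_rate, excellent_rate = label_rates(sample)
print(f"usable: {usable_rate:.1%}, excellent: {excellent_rate:.1%}")
```

Under this reading, a wide gap between the two rates means most generated games run and can be played, while few are judged to be of high quality.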
Opening excerpt (first ~120 words)
arXiv:2605.17637 (cs), Computer Science > Artificial Intelligence
Submitted on 17 May 2026
Title: WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games
Authors: Wenyu Zhang, Guoliang You, Tianlun, Haotian Zhao, Tianshu Zhu, Haoran Wang, Xiaoxuan Tang, Mingyang Dai, Jingnan Gu, Daxiang Dong, Jianmin Wu
Abstract: Coding agents are increasingly used as application builders, yet many evaluations still focus on source code, repository-level tests, or intermediate traces rather than the delivered application.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.