WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games

May 19, 2026 · 4:00 AM UTC · 3 min read

#artificial intelligence #game development #benchmarking

⚡ TL;DR · AI summary

WebGameBench is a new benchmark that evaluates how well coding agents turn structured specifications into browser-native games. Instead of judging source code alone, it assesses the usability of each delivered game with a runtime evaluator that interacts with the application in a real browser. Initial results show that some coding agents can produce playable games, but high-quality output remains rare: the best configuration reached a 76.9% usable rate and only a 20.2% excellent rate.

Key facts

▪ WebGameBench evaluates coding agents on their ability to turn structured specifications into browser-accessible games.
▪ The benchmark assigns each game one of three usability labels, EXCELLENT, USABLE, or UNUSABLE, based on human gameplay reviews.
▪ The best-performing agent configuration achieved a 76.9% usable rate but only a 20.2% excellent rate.
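The two headline rates are simple proportions over per-game labels. The sketch below is a minimal illustration, not the benchmark's scoring code: the `Usability` enum and `usability_rates` function are hypothetical names, and it assumes that the usable rate counts games labeled either USABLE or EXCELLENT, a definition the summary does not confirm.

```python
from collections import Counter
from enum import Enum


class Usability(Enum):
    """The three usability labels named in the WebGameBench summary."""
    EXCELLENT = "EXCELLENT"
    USABLE = "USABLE"
    UNUSABLE = "UNUSABLE"


def usability_rates(labels):
    """Return (usable_rate, excellent_rate) as percentages.

    Assumption (hypothetical): a game counts toward the usable rate if it
    is labeled USABLE or EXCELLENT; the source does not state this.
    """
    if not labels:
        raise ValueError("need at least one labeled game")
    counts = Counter(labels)
    total = len(labels)
    usable = counts[Usability.USABLE] + counts[Usability.EXCELLENT]
    excellent = counts[Usability.EXCELLENT]
    return 100.0 * usable / total, 100.0 * excellent / total


if __name__ == "__main__":
    # Hypothetical labels for ten games, not data from the paper.
    labels = (
        [Usability.EXCELLENT] * 2
        + [Usability.USABLE] * 5
        + [Usability.UNUSABLE] * 3
    )
    print(usability_rates(labels))  # prints (70.0, 20.0)
```

Under the same inclusive assumption, the reported 76.9% usable and 20.2% excellent rates would mean that roughly 56.7 percentage points of the best configuration's games were playable but not rated EXCELLENT.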

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

arXiv:2605.17637 (cs) · Submitted on 17 May 2026

Title: WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games

Authors: Wenyu Zhang, Guoliang You, Tianlun, Haotian Zhao, Tianshu Zhu, Haoran Wang, Xiaoxuan Tang, Mingyang Dai, Jingnan Gu, Daxiang Dong, Jianmin Wu

Abstract: Coding agents are increasingly used as application builders, yet many evaluations still focus on source code, repository-level tests, or intermediate traces rather than the delivered application.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
