WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games

#artificial intelligence #game development #benchmarking
⚡ TL;DR · AI summary

WebGameBench is a new benchmark designed to evaluate coding agents' ability to create browser-native games from structured specifications. The benchmark assesses the usability of the generated games through a runtime evaluator that interacts with the applications in real browsers. Initial results indicate that while some coding agents can produce playable games, there remains a significant gap in achieving high-quality outputs.

Original article
arXiv cs.AI
Opening excerpt (first ~120 words)

Computer Science > Artificial Intelligence
arXiv:2605.17637 (cs)
[Submitted on 17 May 2026]

Title: WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games

Authors: Wenyu Zhang, Guoliang You, Tianlun, Haotian Zhao, Tianshu Zhu, Haoran Wang, Xiaoxuan Tang, Mingyang Dai, Jingnan Gu, Daxiang Dong, Jianmin Wu

Abstract: Coding agents are increasingly used as application builders, yet many evaluations still focus on source code, repository-level tests, or intermediate traces rather than the delivered application.

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
