BenchBench

Rohit Krishnan · May 29, 2026 · 12:15 PM UTC · 4 min read

#artificial intelligence #machine learning #benchmarking

⚡ TL;DR · AI summary

BenchBench is a new benchmark that evaluates how well AI models can create benchmarks for each other. GPT 5.2 was the only model to succeed at this task, while the others struggled to produce effective benchmarks. The initiative highlights the distinction between a model's ability as a benchmark creator and its ability as a solver.

Key facts

▪ BenchBench evaluates AI models on their ability to create effective benchmarks.
▪ GPT 5.2 was the only model that successfully created a useful benchmark.
▪ Other models, including GPT 5.5 and Opus 4.6, struggled to produce challenging benchmarks.

Original article

Hacker News (Newest) · Rohit Krishnan


Opening excerpt (first ~120 words)

Introducing BenchBench

Rohit Krishnan · May 25, 2026

TL;DR: presenting the ultimate benchmark, getting models to create benchmarks for each other, and GPT 5.2 is the current (only) winner

Models are getting much, much better at almost every benchmark we’ve thrown at them. Creating benchmarks is now a job relegated to the smartest and best of us. Even the newest and best ones seem to get saturated in record time. What this means is that increasingly the hardest job is to create a good enough AI benchmark.

So I took the obvious next step. Created a benchmark to see how well the models can create a benchmark.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at Hacker News (Newest).
