FastKernels: Benchmarking GPU Kernel Generation in Production

May 25, 2026 · 4:00 AM UTC · 3 min read

#machine learning #gpu #artificial intelligence

TL;DR · WeSearch summary

FastKernels introduces a new benchmark for GPU kernel generation that addresses the misalignment between existing benchmarks and production environments. The benchmark uses a minimal set of 46 representative architectures that together cover 96.2% of HuggingFace Transformers. Evaluations show that current kernel agents struggle to achieve significant speedups over production baselines, which highlights the need for benchmarks that are aligned with production inference.

Key facts

▪FastKernels is designed to improve GPU kernel generation by aligning benchmarks with production inference frameworks.
▪The benchmark includes 46 representative architectures that cover 96.2% of HuggingFace Transformers.
▪Current state-of-the-art kernel agents achieve only modest speedups over production baselines, indicating a critical bottleneck.
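The key facts above judge kernel agents by their speedup over production baselines. Below is a minimal Python sketch of how such a comparison is commonly aggregated. It assumes per-kernel latencies in milliseconds, a geometric mean across kernels, and that a kernel failing correctness falls back to the production baseline (speedup 1.0). These choices, along with the hypothetical names `speedups` and `geomean`, are illustrative assumptions and are not FastKernels' published metric.

```python
import math
from typing import Dict, Iterable, Set


def speedups(baseline_ms: Dict[str, float],
             candidate_ms: Dict[str, float],
             passed: Set[str]) -> Dict[str, float]:
    """Per-kernel speedup = baseline latency / candidate latency (>1 means faster)."""
    out: Dict[str, float] = {}
    for name, base in baseline_ms.items():
        if name in passed and name in candidate_ms:
            out[name] = base / candidate_ms[name]
        else:
            # Assumption: an incorrect or missing generated kernel is replaced
            # by the production baseline, so it contributes a speedup of 1.0.
            out[name] = 1.0
    return out


def geomean(values: Iterable[float]) -> float:
    """Geometric mean, which keeps one large ratio from dominating the average."""
    vals = list(values)
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


if __name__ == "__main__":
    # Hypothetical latencies for three kernels; "rmsnorm" fails correctness.
    baseline = {"attention": 2.0, "mlp": 1.0, "rmsnorm": 0.5}
    candidate = {"attention": 1.0, "mlp": 1.25, "rmsnorm": 0.1}
    passed = {"attention", "mlp"}
    per_kernel = speedups(baseline, candidate, passed)
    print(per_kernel)
    print(round(geomean(per_kernel.values()), 3))
```

In this toy run the fast but incorrect `rmsnorm` kernel earns no credit, and the slower `mlp` kernel pulls the aggregate down. That is one way a "modest speedup" can arise even when an agent wins on a single kernel.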

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

arXiv:2605.23215 (cs) [Submitted on 22 May 2026]
Title: FastKernels: Benchmarking GPU Kernel Generation in Production
Authors: Gabriele Oliaro, Yichao Fu, May Jiang, Owen Lu, Junli Wang, Zhihao Jia, Hao Zhang, Samyam Rajbhandari
Abstract: LLM-based agents for GPU kernel generation are advancing rapidly, yet their progress is fundamentally constrained by the benchmarks they optimize against.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
