CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

#machine-learning #transformers #gpu #optimization
⚡ TL;DR · AI summary

The paper introduces CODA, a GPU kernel abstraction that expresses Transformer block computations as GEMM-plus-epilogue programs. The goal is to cut the time training systems spend on the memory-bound operators that surround dense linear algebra. The authors report that this approach improves both productivity and efficiency in machine learning frameworks.
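To make the "GEMM plus epilogue" idea concrete, here is a minimal pure-Python sketch. It is not CODA's API or the authors' implementation. The function names, the tile size, and the choice of bias-add plus GELU as the epilogue are all assumptions made for illustration. It contrasts a matrix multiply followed by a separate elementwise pass with a tiled multiply that applies the same elementwise function to each accumulator before it is stored.

```python
import math

def gelu(x):
    """tanh approximation of GELU, used here only as a sample elementwise op."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def gemm_then_pointwise(a, b, bias):
    """Unfused baseline (hypothetical helper): GEMM, then a second elementwise pass."""
    k_dim, n_dim = len(b), len(b[0])
    # Pass 1: materialize the full product C = A @ B.
    c = [[sum(row[k] * b[k][j] for k in range(k_dim)) for j in range(n_dim)] for row in a]
    # Pass 2: a separate sweep re-reads and rewrites every entry of C.
    return [[gelu(v + bias[j]) for j, v in enumerate(row)] for row in c]

def gemm_with_epilogue(a, b, epilogue, tile=2):
    """Tiled GEMM (hypothetical helper); epilogue(acc, col) runs before each store."""
    m_dim, k_dim, n_dim = len(a), len(b), len(b[0])
    out = [[0.0] * n_dim for _ in range(m_dim)]
    for i0 in range(0, m_dim, tile):          # loop over output tiles
        for j0 in range(0, n_dim, tile):
            for i in range(i0, min(i0 + tile, m_dim)):
                for j in range(j0, min(j0 + tile, n_dim)):
                    acc = sum(a[i][k] * b[k][j] for k in range(k_dim))
                    # The epilogue sees the accumulator while it is still "on chip",
                    # so the raw product is never written out on its own.
                    out[i][j] = epilogue(acc, j)
    return out

# Illustrative data only.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
BIAS = [0.1, -0.2, 0.3]

separate = gemm_then_pointwise(A, B, BIAS)
fused = gemm_with_epilogue(A, B, lambda acc, col: gelu(acc + BIAS[col]))
```

In Python the two versions do the same arithmetic and differ only in structure. On a GPU, the point of the fused form is that the intermediate product does not make an extra round trip through global memory, which is where memory-bound elementwise operators tend to lose time.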

Computer Science > Machine Learning
arXiv:2605.19269 (cs)
[Submitted on 19 May 2026 (v1), last revised 20 May 2026 (this version, v2)]
Title: CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
Authors: Han Guo, Jack Zhang, Arjun Menon, Driss Guessous, Vijay Thakkar, Yoon Kim, Tri Dao
Abstract: Transformer training systems are built around dense linear algebra, yet a nontrivial fraction of end-to-end time is spent on surrounding memory-bound operators.

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv.org.
