
Writing High-Performance Kernels in TileLang, from GEMM to MLA

⚡ TL;DR · AI summary

TileLang is a programming language designed for writing high-performance GPU kernels, positioned between Triton and CUTLASS in terms of control and complexity. It allows developers to explicitly manage shared memory and pipeline stages while benefiting from compiler optimizations. The article discusses the mental model behind TileLang and provides a practical example of writing a GEMM kernel.
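The tiling idea behind a GEMM kernel can be sketched without a GPU. The block below is plain Python, not TileLang code and not the original article's kernel. The function name `tiled_gemm` and the `block_m`, `block_n`, `block_k` parameters are hypothetical, chosen only for illustration. It computes one output tile at a time, staging a slice of each input per step along K, which is roughly the role shared memory plays in a real kernel.

```python
def tiled_gemm(A, B, block_m=2, block_n=2, block_k=2):
    """Compute C = A @ B one output tile at a time (pure-Python illustration)."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0.0] * N for _ in range(M)]
    # Each (m0, n0) pair stands in for one GPU thread block.
    for m0 in range(0, M, block_m):
        for n0 in range(0, N, block_n):
            m1, n1 = min(m0 + block_m, M), min(n0 + block_n, N)
            # Per-tile accumulator, loosely analogous to a register fragment.
            acc = [[0.0] * (n1 - n0) for _ in range(m1 - m0)]
            # Walk along K; each step stages one tile of A and one of B,
            # loosely analogous to a copy into shared memory.
            for k0 in range(0, K, block_k):
                k1 = min(k0 + block_k, K)
                a_tile = [row[k0:k1] for row in A[m0:m1]]
                b_tile = [row[n0:n1] for row in B[k0:k1]]
                for i in range(m1 - m0):
                    for j in range(n1 - n0):
                        for k in range(k1 - k0):
                            acc[i][j] += a_tile[i][k] * b_tile[k][j]
            # Write the finished tile back to the output matrix.
            for i in range(m1 - m0):
                C[m0 + i][n0:n1] = acc[i]
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    print(tiled_gemm(A, B))
```

On a GPU the staging loop is where pipelining matters: the next tile can be loaded while the current one is being multiplied. This sequential sketch does not model that overlap.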

Original article
DEV.to (Top)
Opening excerpt (first ~120 words)

Atlas Cloud · Posted on May 26

If you write GPU kernels, you live somewhere on a spectrum. At one end is Triton: quick to write, but the compiler makes most of the layout and shared-memory decisions for you. At the other end is CUTLASS / CuTe: total control, at the cost of a lot of template machinery. TileLang sits in the middle.

Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).
