Writing High-Performance Kernels in TileLang, from GEMM to MLA
TileLang is a programming language designed for writing high-performance GPU kernels, positioned between Triton and CUTLASS in terms of control and complexity. It allows developers to explicitly manage shared memory and pipeline stages while benefiting from compiler optimizations. The article discusses the mental model behind TileLang and provides a practical example of writing a GEMM kernel.
- TileLang offers a balance between ease of use and control for GPU kernel development.
- Developers can explicitly allocate shared memory and manage thread operations in TileLang.
- The article includes a practical example of creating a GEMM kernel using TileLang.
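To make the tiling idea behind a GEMM kernel concrete, here is a minimal sketch in plain Python. It is not TileLang code and does not use the TileLang API: the function name `tiled_gemm`, the parameters `block_m`, `block_n`, and `block_k`, and the nested-list matrices are all assumptions chosen for illustration. It only mirrors the general structure such a kernel follows: one output tile per block, explicit staging copies standing in for shared memory, and a per-tile accumulator standing in for registers.

```python
def tiled_gemm(A, B, block_m=2, block_n=2, block_k=2):
    """Compute C = A @ B one output tile at a time (pure-Python illustration)."""
    M, K = len(A), len(A[0])
    if len(B) != K:
        raise ValueError("inner dimensions of A and B must match")
    N = len(B[0])
    C = [[0.0] * N for _ in range(M)]

    # Outer two loops: one iteration per output tile (a thread block on a GPU).
    for m0 in range(0, M, block_m):
        for n0 in range(0, N, block_n):
            m1, n1 = min(m0 + block_m, M), min(n0 + block_n, N)
            # Per-tile accumulator, standing in for a register fragment.
            acc = [[0.0] * (n1 - n0) for _ in range(m1 - m0)]

            # Walk the K dimension in chunks. A pipelined kernel would overlap
            # the copies for the next chunk with the math on the current one.
            for k0 in range(0, K, block_k):
                k1 = min(k0 + block_k, K)
                # Explicit staging copies, standing in for shared-memory tiles.
                a_tile = [row[k0:k1] for row in A[m0:m1]]
                b_tile = [row[n0:n1] for row in B[k0:k1]]
                for i in range(m1 - m0):
                    for j in range(n1 - n0):
                        for k in range(k1 - k0):
                            acc[i][j] += a_tile[i][k] * b_tile[k][j]

            # Write the finished tile back to the output matrix.
            for i in range(m1 - m0):
                C[m0 + i][n0:n1] = acc[i]
    return C
```

Calling `tiled_gemm([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])` exercises a K dimension that the default tile size does not divide evenly, so the last chunk is a partial tile. On a real GPU the staging copies and the inner multiply-accumulate would be cooperative, hardware-accelerated operations rather than Python loops.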
Opening excerpt (first ~120 words)
Atlas Cloud
Posted on May 26
#deeplearning #llm #performance #python
If you write GPU kernels, you live somewhere on a spectrum. At one end is Triton: quick to write, but the compiler makes most of the layout and shared-memory decisions for you. At the other end is CUTLASS / CuTe: total control, at the cost of a lot of template machinery. TileLang sits in the middle.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to.