
A Case for Tracing Based DSL Kernel Languages

18 min read
#gpu #programming #dsl
⚡ TL;DR · AI summary

The article examines the architectural divide between parsing and tracing kernel DSLs for NVIDIA GPU programming. It notes the wave of Pythonic DSLs that followed Triton (CuTe-DSL, cuTile, Pallas, Gluon, Warp, TileLang), most of which lower tile-oriented programs to PTX or LLVM-IR, and contrasts how they embed into Python: Triton and CuTe-DSL parse the source AST, while Pallas traces the function under abstract values. The author makes the case for the tracing-based approach.

Original article: George's Blog
Opening excerpt (first ~120 words)

On the architectural divide between parsing and tracing kernel DSLs, and what tends to go wrong in each. The language for writing NVIDIA GPU kernels was always exclusively CUDA, but since Triton appeared, a wave of Pythonic DSLs has followed: CuTe-DSL, cuTile, Pallas, Gluon, Warp, and the more recent TileLang used in DeepSeek’s DeepGEMM. Most of these systems share the same goal of lowering a tile-oriented program into PTX or LLVM-IR, and are embedded in Python. The question is how to embed the DSL into Python. Triton and CuTe-DSL parse the source AST. Pallas runs the function under abstract values and traces the resulting operations.
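
The two embedding strategies named in the excerpt can be contrasted with a minimal, hypothetical sketch in plain Python. It assumes a toy kernel and made-up helpers (`parse_kernel`, `Tracer`, `trace_kernel`) that are not part of Triton, CuTe-DSL, or Pallas. The parsing front end inspects the source text and sees the `for` loop as a single `For` node; the tracing front end simply runs the function on abstract values, so Python control flow executes eagerly and the loop is unrolled into the recorded operations.

```python
import ast

# Hypothetical toy "kernel" written as plain Python; not tied to any real DSL API.
KERNEL_SRC = '''
def kernel(x, y):
    acc = x
    for _ in range(2):
        acc = acc * y + x
    return acc
'''

# Parsing approach (in the spirit of Triton / CuTe-DSL): inspect the source AST.
def parse_kernel(src):
    """Return the AST node kinds a parsing front end would have to lower."""
    tree = ast.parse(src)
    return sorted({type(node).__name__ for node in ast.walk(tree)
                   if isinstance(node, (ast.For, ast.BinOp, ast.Return))})

# Tracing approach (in the spirit of Pallas): run the function on abstract values.
class Tracer:
    """Abstract value that records each + and * applied to it into a shared graph."""
    def __init__(self, name, graph):
        self.name = name
        self.graph = graph

    def _emit(self, op, other):
        # Allocate a fresh SSA-style name and append one instruction to the graph.
        out = Tracer(f"v{len(self.graph)}", self.graph)
        self.graph.append((out.name, op, self.name, other.name))
        return out

    def __add__(self, other):
        return self._emit("add", other)

    def __mul__(self, other):
        return self._emit("mul", other)

def trace_kernel(fn, *arg_names):
    """Call fn with one Tracer per argument and return (instructions, result name)."""
    graph = []
    args = [Tracer(name, graph) for name in arg_names]
    result = fn(*args)
    return graph, result.name

# Materialize the kernel function from its source so both approaches see the same code.
namespace = {}
exec(KERNEL_SRC, namespace)
kernel = namespace["kernel"]

print(parse_kernel(KERNEL_SRC))
graph, out = trace_kernel(kernel, "x", "y")
for instr in graph:
    print(instr)
print("result:", out)
```

With these inputs, the parser reports `['BinOp', 'For', 'Return']`, while the trace holds four instructions (`mul`, `add`, `mul`, `add`) and no loop at all. In a tracer like this one, the front end never has to understand Python syntax, but any control flow that should survive into the generated kernel has to be expressed through the DSL's own constructs rather than plain Python statements.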



