Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data

Horace He · 8 min read
#technology #gpu #performance
⚡ TL;DR · AI summary

Matrix multiplications on GPUs do not always run at the same speed for a given shape: the values stored in the matrices can change the runtime. The article attributes this to the way a chip's power consumption depends on the data it processes. The finding challenges the common assumption that matmul performance depends only on matrix dimensions and data type.

Original article
Hacker News (Newest) · Horace He
Opening excerpt (first ~120 words)

Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short]
Great minds discuss flops per watt.
Horace He · Apr 29, 2024

It’s 2022. I check out this cool new project, CUTLASS, with very fast matmuls. I take a large matmul, 8192 x 8192 x 8192, and benchmark it in PyTorch, which calls CuBLAS.

    python mm_bench.py
    > CuBLAS: 258 Teraflops

Not bad, 83% flop utilization. Now let’s check out Cutlass’s performance using their profiler.

    ./cutlass_profiler --operation=Gemm --m=8192 --n=8192 --k=8192
    > CUTLASS: 288 Teraflops

!!! 10% higher perf? That’s incredible.
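The numbers quoted in the excerpt can be sanity-checked with a little arithmetic. The sketch below is not from the article: it assumes the usual convention that an M x K by K x N matmul costs 2*M*N*K floating-point operations, and the helper names (`matmul_flops`, `achieved_tflops`) are hypothetical. The excerpt does not state the GPU or its peak throughput, so the peak is only backed out from the quoted 258 Teraflops and 83% figures.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for an (m x k) @ (k x n) matmul, counting each multiply-add as 2.

    Assumption: the standard 2*M*N*K convention; the excerpt does not say
    how its Teraflops figures were computed.
    """
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in TFLOPS for one matmul that took `seconds` to run."""
    return matmul_flops(m, n, k) / seconds / 1e12


# Figures quoted in the excerpt.
M = N = K = 8192
CUBLAS_TFLOPS = 258.0
CUTLASS_TFLOPS = 288.0
UTILIZATION = 0.83

flops = matmul_flops(M, N, K)

# Time per matmul implied by the cuBLAS figure (roughly 4.26 ms).
cublas_seconds = flops / (CUBLAS_TFLOPS * 1e12)

# Peak throughput implied by "83% flop utilization" (roughly 311 TFLOPS).
implied_peak_tflops = CUBLAS_TFLOPS / UTILIZATION

# Relative gain of the CUTLASS profiler number over cuBLAS (roughly 11.6%).
relative_gain = CUTLASS_TFLOPS / CUBLAS_TFLOPS - 1.0

print(f"FLOPs per matmul:    {flops:,}")
print(f"Implied cuBLAS time: {cublas_seconds * 1e3:.2f} ms")
print(f"Implied peak:        {implied_peak_tflops:.1f} TFLOPS")
print(f"CUTLASS vs cuBLAS:   {relative_gain:.1%}")
```

Read this way, the "10% higher perf" in the excerpt is a rounded figure: the two quoted throughputs differ by a little under 12%.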

Excerpt limited to ~120 words for fair-use compliance. The full article is at Hacker News (Newest).
