Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data

Horace He · May 23, 2026 · 12:11 PM UTC · 8 min read

TL;DR · WeSearch summary

Matrix multiplication performance on GPUs varies with the contents of the input matrices, not only their shapes. The article attributes the effect to dynamic power consumption in semiconductors, which challenges the common assumption that a matmul of a given shape always takes the same time.

Key facts

▪ Matrix multiplications on GPUs can run faster when given predictable data.
▪ In the author's initial benchmarks, the CUTLASS profiler reported about 10% higher throughput than cuBLAS (288 vs. 258 teraflops), but results were inconsistent when CUTLASS was run from Python.
▪ Matrix multiplication performance is affected by dynamic power consumption in semiconductors.
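For readers unfamiliar with the term in the last bullet, dynamic (switching) power is usually described with a standard first-order CMOS model. The formula below is textbook background, not something quoted from the article, and the symbol names are the conventional ones rather than the author's:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{dd}^{2} \, f
```

Here $\alpha$ is the activity factor (the fraction of the circuit's capacitance that switches each cycle), $C$ is the switched capacitance, $V_{dd}$ is the supply voltage, and $f$ is the clock frequency. Under this model, inputs that cause fewer bit flips lower $\alpha$ and therefore draw less power at a given clock, which is one plausible way the data itself could affect how fast a power-limited chip runs.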

Original article

Hacker News (Newest) · Horace He


Opening excerpt (first ~120 words)

Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short]
Great minds discuss flops per watt.
Horace He · Apr 29, 2024

It’s 2022. I check out this cool new project, CUTLASS, with very fast matmuls. I take a large matmul, 8192 x 8192 x 8192, and benchmark it in PyTorch, which calls CuBLAS.

    python mm_bench.py
    > CuBLAS: 258 Teraflops

Not bad, 83% flop utilization. Now let’s check out Cutlass’s performance using their profiler.

    ./cutlass_profiler --operation=Gemm --m=8192 --n=8192 --k=8192
    > CUTLASS: 288 Teraflops

!!! 10% higher perf? That’s incredible.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at Hacker News (Newest).
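The figures quoted in the excerpt can be sanity-checked with a little arithmetic. The sketch below is not the author's `mm_bench.py` (which the excerpt does not show); the function names are made up for illustration. It also assumes a peak of 312 teraflops, the published dense FP16/BF16 tensor-core figure for an NVIDIA A100, since the excerpt does not name the GPU.

```python
# Back-of-the-envelope check of the numbers quoted in the excerpt.
# ASSUMPTION: 312 TFLOPS peak (NVIDIA A100 dense FP16/BF16 tensor-core spec).
# The excerpt does not say which GPU was used.
ASSUMED_PEAK_TFLOPS = 312.0


def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for an (m x k) @ (k x n) matmul: one multiply and one add per term."""
    return 2 * m * n * k


def tflops(flops: int, seconds: float) -> float:
    """Convert a FLOP count and a wall-clock time into teraflops per second."""
    return flops / seconds / 1e12


def utilization(achieved_tflops: float,
                peak_tflops: float = ASSUMED_PEAK_TFLOPS) -> float:
    """Fraction of the assumed peak throughput that was achieved."""
    return achieved_tflops / peak_tflops


if __name__ == "__main__":
    flops = matmul_flops(8192, 8192, 8192)
    print(f"FLOPs per matmul: {flops:.3e}")
    for name, rate in [("CuBLAS", 258.0), ("CUTLASS", 288.0)]:
        # Time one matmul would take at the quoted throughput.
        millis = flops / (rate * 1e12) * 1e3
        print(f"{name}: {millis:.2f} ms per matmul, "
              f"{utilization(rate):.0%} of assumed peak")
    print(f"CUTLASS / CuBLAS throughput ratio: {288.0 / 258.0:.3f}")
```

With the assumed peak, 258 teraflops works out to roughly the 83% utilization the excerpt mentions, and the ratio of the two quoted figures is slightly above the "10%" the author rounds to.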
