Leaving Performance on the Table
The article discusses optimizing workloads using LLVM and compiler flags. It highlights the importance of providing compilers with information about likely execution paths to improve performance. Two primary optimization methods are described: instrumented and statistical profiling, with a focus on the benefits of each approach.
- Optimizing binaries can significantly enhance performance beyond basic compiler flags like -O3.
- Instrumented profiling involves running a workload with an instrumented binary to capture execution paths for optimization.
- Using LLVM's BOLT as a post-link optimizer can further improve performance, achieving a nearly 1.5x reduction in workload time.
Opening excerpt (first ~120 words)
I have been working with LLVM at $DAYJOB$, and I have become familiar with the benefits of optimizing your workloads. I tend to think of optimizing my binaries as checking whether I have attached -O3 to my compiler flags; maybe if I’m particularly advanced that day I’ll sprinkle in some -flto (link-time optimization) and call it a day. Turns out, though, that’s leaving lots of performance on the table. Compilers work under the assumption that every branch is equally likely to be taken, unless you add hints like [[likely]] (ref). If we can feed the compiler more information about the paths our workloads usually take, it can produce much more performant code. There are two primary ways to optimize a binary: instrumented or statistical.
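To make the branch-hint idea concrete, here is a minimal sketch of the C++20 `[[likely]]` and `[[unlikely]]` attributes. The function `sum_valid` and its "negative values are rare errors" workload are hypothetical, invented only for illustration; they are not from the article. Whether the hint changes the generated code depends on the compiler and optimization level.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical workload (an assumption for illustration): sum the
// non-negative entries and count the negative ones as rare errors.
std::int64_t sum_valid(const std::vector<int>& values, int& errors) {
    std::int64_t total = 0;
    errors = 0;
    for (int v : values) {
        // Hint to the compiler that the common case is a valid value,
        // so it can lay out the hot path without a taken branch.
        if (v >= 0) [[likely]] {
            total += v;
        } else [[unlikely]] {
            ++errors;
        }
    }
    return total;
}
```

Hints like these are written by hand and reflect what the programmer believes about the workload. Profile-based optimization aims to give the compiler the same kind of information from measured runs instead.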
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Farid Zakaria’s Blog.