FlashLib: Bringing Flash Magic to Classical Machine Learning Operators
FlashLib is a new GPU library for classical machine learning operators, optimized for modern hardware and workloads. It reports large speedups over existing solutions, reaching up to 208 times faster execution for TruncatedSVD. The library aims to make AI systems more efficient by integrating classical ML operators into real-time processing workflows.
- FlashLib offers substantial speed improvements over cuML on Hopper GPUs, with KMeans running up to 26 times faster.
- The library features a predictive API that estimates runtime and memory usage for workloads in approximately 5 microseconds.
- FlashLib supports multi-GPU execution and employs heuristic kernel selection to minimize autotuning delays.
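To make the last two points concrete, here is a minimal sketch of how a closed-form cost model can estimate runtime and memory without launching a kernel, and how a shape-based heuristic can pick a kernel variant without autotuning. This is not FlashLib's API: the function names, the roofline-style model, the hardware constants, and the thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    seconds: float      # predicted time for one assignment pass
    bytes_needed: int   # predicted memory footprint in bytes


def estimate_kmeans_iteration(n_points, n_dims, n_clusters,
                              peak_flops=6.0e13, peak_bandwidth=3.0e12,
                              bytes_per_value=4):
    """Roofline-style estimate for one KMeans assignment pass.

    Hypothetical model: peak_flops and peak_bandwidth are placeholder
    values, not measured figures for any specific GPU.
    """
    # Distance computation costs roughly 2 * n * k * d floating-point operations.
    flops = 2.0 * n_points * n_clusters * n_dims
    # Footprint: the points, the centroids, and one int32 label per point.
    bytes_needed = (n_points * n_dims + n_clusters * n_dims) * bytes_per_value
    bytes_needed += n_points * 4
    # The pass is bounded by whichever resource saturates first.
    seconds = max(flops / peak_flops, bytes_needed / peak_bandwidth)
    return Estimate(seconds=seconds, bytes_needed=bytes_needed)


def choose_kernel(n_points, n_dims, n_clusters):
    """Pick a kernel variant from the problem shape alone (illustrative thresholds)."""
    if n_clusters * n_dims <= 4096:
        return "centroids_in_shared_memory"
    if n_dims >= 64:
        return "tiled_gemm"
    return "generic"


if __name__ == "__main__":
    est = estimate_kmeans_iteration(1_000_000, 128, 1024)
    print(f"{est.seconds * 1e3:.3f} ms, {est.bytes_needed / 1e6:.1f} MB")
    print(choose_kernel(1_000_000, 128, 1024))
```

Because both functions are plain arithmetic on the problem shape, they return in well under a millisecond even in Python, which is the property that lets a scheduler query such a predictor before committing to a workload.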
Opening excerpt (first ~120 words):
FlashLib: Bringing Flash Magic to Classical Machine Learning Operators
Authors: Shuo Yang (1), Haocheng Xi (1), Yilong Zhao (1), Qiuyang Mang (1), Zhe Wang (2), Shanlin Sun (2), Kurt Keutzer (1), Joseph E. Gonzalez (1), Song Han (3), Chenfeng Xu (4, *), Ion Stoica (1, *)
Affiliations: (1) UC Berkeley, (2) UC Irvine, (3) MIT, (4) UT Austin; * co-advising
Code: github.com/FlashML-org/flashlib
Headline speedups: 26× KMeans, 19× KNN, 208× TruncatedSVD, 47× PCA, 7× UMAP, 40× HDBSCAN, 147× t-SNE (exact), 49× MultinomialNB
Introducing FlashLib, a GPU library for classical machine learning operators on modern hardware, rebuilt for today's ML workloads and emerging agentic AI systems.
…