Scalable Packed Layouts for Vector-Length-Agnostic ML Code Generation
The paper presents an approach to vector-length-agnostic (VLA) code generation in machine learning compilation. It targets scalable vector instruction sets such as Arm SVE, where a single implementation adapts to hardware with different vector lengths, so tiling and data layout decisions can no longer be fixed at compile time. The reported results show a speedup of up to 1.45× over existing NEON-based code generation.
- The approach integrates vector-length-aware packed data layouts into the MLIR/IREE compilation pipeline.
- It achieves a speedup of up to 1.45× over existing NEON-based code generation.
- The generated code demonstrates performance portability across different hardware configurations.
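To make the idea of a vector-length-aware packed layout concrete, here is a minimal NumPy sketch. It is not the paper's implementation and does not use any MLIR or IREE API. The names `pack_rhs`, `matmul_packed`, `vscale`, and `base_lanes` are hypothetical, and the only assumption is that the tile width is a runtime multiple of a fixed minimum lane count (for example, four 32-bit lanes per 128-bit granule on SVE).

```python
import numpy as np


def pack_rhs(b: np.ndarray, vscale: int, base_lanes: int = 4) -> np.ndarray:
    """Pack a K x N matrix into tiles of shape [n_tiles, K, tile_n].

    tile_n = base_lanes * vscale is only known at run time (hypothetical
    parameters), so the packed shape cannot be fixed when the code is compiled.
    """
    k, n = b.shape
    tile_n = base_lanes * vscale
    n_tiles = -(-n // tile_n)  # ceiling division
    # Zero-pad N up to a whole number of tiles.
    padded = np.zeros((k, n_tiles * tile_n), dtype=b.dtype)
    padded[:, :n] = b
    # Split N into (tile index, lane) and move the tile index outermost.
    return padded.reshape(k, n_tiles, tile_n).transpose(1, 0, 2).copy()


def matmul_packed(a: np.ndarray, packed: np.ndarray, n: int) -> np.ndarray:
    """Multiply A (M x K) by a packed right-hand side, one tile at a time."""
    m, _ = a.shape
    n_tiles, _, tile_n = packed.shape
    out = np.zeros((m, n_tiles * tile_n), dtype=a.dtype)
    for t in range(n_tiles):
        # Each tile is contiguous, so one "vector" covers a full row of it.
        out[:, t * tile_n:(t + 1) * tile_n] = a @ packed[t]
    return out[:, :n]  # drop the padding columns


a = np.arange(6).reshape(2, 3)
b = np.arange(30).reshape(3, 10)
for vscale in (1, 2, 4):  # same code, three different "vector lengths"
    packed = pack_rhs(b, vscale)
    assert np.array_equal(matmul_packed(a, packed, b.shape[1]), a @ b)
```

The same two functions produce the correct product for every `vscale`; only the packed shape changes. That is the property a VLA compiler has to preserve when the tile size is a symbolic multiple rather than a constant.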
Opening excerpt (first ~120 words)
Computer Science > Performance
arXiv:2605.12445 (cs)
[Submitted on 12 May 2026 (v1), last revised 18 May 2026 (this version, v2)]
Title: Scalable Packed Layouts for Vector-Length-Agnostic ML Code Generation
Authors: Ege Beysel, Maximilian Bartel, Jan Moritz Joseph
Abstract: Scalable vector instruction sets such as Arm SVE enable vector-length-agnostic (VLA) execution, allowing a single implementation to adapt across hardware with different vector lengths. However, they complicate compiler code generation, as tiling and data layout decisions can no longer be fixed at compile time.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv.org.