Characterization of machine learning compilers for LLM inference on NVIDIA GPUs
The article evaluates machine learning compilers for LLM inference on NVIDIA GPUs, focusing on the trade-offs between performance, productivity, and portability. It analyzes four prominent machine learning compiler (MLC) tools and how effectively they accelerate PyTorch-based models. The findings indicate that architecture-specific tools can deliver higher performance but do not support every model, so the compiler should be chosen according to the model and the deployment requirements.
- The P3 problem in AI inference involves balancing performance, developer productivity, and device portability.
- The study assesses the deployment trade-offs of PyTorch-based LLMs using tools such as torch.compile, TensorRT, XLA, and ONNX Runtime.
- Results show that ahead-of-time (AOT) compilation requires architecture-specific tools to reach peak performance, while just-in-time (JIT) solutions offer flexibility but deliver inconsistent acceleration.
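The AOT/JIT contrast in the last bullet can be sketched with a toy model. This is an illustration under stated assumptions, not how any of the four tools is implemented: `specialize`, `AOTModule`, and `JITModule` are hypothetical names, "compilation" is reduced to building a kernel specialized on one input shape, and a ReLU stands in for the model. It loosely mirrors how an AOT engine is built before deployment for a fixed set of shapes, while a JIT compiler such as torch.compile compiles on first call and may recompile when it encounters a new input shape.

```python
# Toy model (hypothetical names): contrasts ahead-of-time and just-in-time
# compilation of a function specialized on input shape. Not the internals
# of torch.compile, TensorRT, XLA, or ONNX Runtime.
from typing import Callable, Dict, Iterable, List, Tuple

Shape = Tuple[int, ...]
Kernel = Callable[[List[float]], List[float]]


def specialize(shape: Shape) -> Kernel:
    """Stand-in for a compiler backend: returns a kernel valid only for `shape`."""
    n = 1
    for d in shape:
        n *= d

    def kernel(xs: List[float]) -> List[float]:
        if len(xs) != n:
            raise ValueError("kernel was specialized for a different shape")
        return [max(0.0, x) for x in xs]  # ReLU plays the role of the model

    return kernel


class AOTModule:
    """Compiled once, before serving, for a fixed set of shapes."""

    def __init__(self, shapes: Iterable[Shape]) -> None:
        self.kernels: Dict[Shape, Kernel] = {s: specialize(s) for s in shapes}
        self.compilations = len(self.kernels)

    def __call__(self, xs: List[float], shape: Shape) -> List[float]:
        if shape not in self.kernels:
            # No compiler at serving time: unseen shapes are rejected.
            raise KeyError(f"shape {shape} was not compiled ahead of time")
        return self.kernels[shape](xs)


class JITModule:
    """Compiles lazily on the first call for each shape, then reuses the cache."""

    def __init__(self) -> None:
        self.cache: Dict[Shape, Kernel] = {}
        self.compilations = 0

    def __call__(self, xs: List[float], shape: Shape) -> List[float]:
        kernel = self.cache.get(shape)
        if kernel is None:
            # First call with this shape pays the compilation cost.
            kernel = specialize(shape)
            self.cache[shape] = kernel
            self.compilations += 1
        return kernel(xs)


if __name__ == "__main__":
    x = [-1.0, 2.0, -3.0, 4.0]
    aot = AOTModule([(1, 4)])
    jit = JITModule()
    print(aot(x, (1, 4)))  # [0.0, 2.0, 0.0, 4.0]
    jit(x, (1, 4))
    jit(x, (1, 4))         # cache hit, no recompilation
    jit(x, (2, 2))         # new shape triggers a second compilation
    print(jit.compilations)  # 2
    try:
        aot(x, (2, 2))
    except KeyError:
        print("AOT module rejected an unseen shape")
```

In this sketch the JIT cache buys flexibility by paying compilation latency on the first call for each new shape, which is one plausible reason measured speedups can look inconsistent; the AOT module has no warm-up cost at serving time but cannot handle shapes it was not built for.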
Opening excerpt (first ~120 words):
Characterization of machine learning compilers for LLM inference on NVIDIA GPUs. The Journal of Supercomputing, Volume 82, article number 420 (2026). Open access. Published: 15 May 2026.
Authors: Alejandro Carmona-Martínez, Gregorio Bernabé & José M. García
Abstract: AI inference is conflicted between Performance, developer Productivity, and device Portability: the P3 problem.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Springer.