Characterization of machine learning compilers for LLM inference on NVIDIA GPUs

May 24, 2026 · 1:59 AM UTC · 43 min read

TL;DR · WeSearch summary

The article evaluates machine learning compilers (MLCs) for LLM inference on NVIDIA GPUs, focusing on the trade-offs between performance, developer productivity, and device portability. It analyzes four prominent MLC tools and how well they work with PyTorch-based models. The findings indicate that architecture-specific tools can improve performance but may not be compatible with all models, so the choice of compiler should follow from the needs of the deployment.

Key facts

▪ The P3 problem in AI inference is the tension between performance, developer productivity, and device portability.
▪ The study assesses the deployment trade-offs of PyTorch-based LLMs using four tools: torch.compile, TensorRT, XLA, and ONNX Runtime.
▪ Results show that Ahead-Of-Time (AOT) compilation requires architecture-specific tools to reach peak performance, while Just-In-Time (JIT) solutions offer flexibility but accelerate inconsistently.
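The JIT trade-off in the last point can be made concrete with a small timing harness. This is a minimal sketch and not the study's methodology. The helper names `measure_latency` and `break_even_calls` are hypothetical, and it assumes that a JIT compiler pays its compilation cost on the first call, so that cost has to be repaid by faster later calls.

```python
import statistics
import time
from typing import Callable, Tuple


def measure_latency(fn: Callable[[], object], iters: int = 20) -> Tuple[float, float]:
    """Return (first_call_seconds, median_steady_state_seconds) for fn."""
    if iters < 1:
        raise ValueError("iters must be at least 1")

    # First call: a JIT compiler would typically trace and compile here.
    start = time.perf_counter()
    fn()
    first_call = time.perf_counter() - start

    # Later calls: reuse whatever was compiled during the first call.
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return first_call, statistics.median(samples)


def break_even_calls(first_call: float, steady: float, baseline: float) -> float:
    """Number of calls needed before the one-time compile cost is repaid.

    baseline is the per-call latency without compilation (hypothetical input).
    Returns float('inf') when the compiled path is not faster than baseline.
    """
    saving_per_call = baseline - steady
    if saving_per_call <= 0:
        return float("inf")
    compile_overhead = max(first_call - steady, 0.0)
    return compile_overhead / saving_per_call
```

To use it, pass `measure_latency` a zero-argument callable that wraps one inference step, once for the uncompiled model and once for the compiled one. On a GPU, the callable should synchronize the device before returning (for example with `torch.cuda.synchronize()`), otherwise the timer only captures the kernel launch. An infinite break-even count corresponds to the case where JIT compilation gives no speedup for that model.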

Original article

Springer

Opening excerpt (first ~120 words)

The Journal of Supercomputing · Open access · Published: 15 May 2026 · Volume 82, article number 420 (2026)

Characterization of machine learning compilers for LLM inference on NVIDIA GPUs

Alejandro Carmona-Martínez, Gregorio Bernabé & José M. García

Abstract: AI inference is conflicted between Performance, developer Productivity, and device Portability, the P3 problem.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at Springer.
