Running Large-Scale GPU Workloads on Kubernetes with Slurm

May 24, 2026 · 2:25 PM UTC · 10 min read

#gpu #kubernetes #slurm #cloud #technology


TL;DR · WeSearch summary

Slinky, developed by SchedMD (now part of NVIDIA), runs and manages Slurm clusters natively on Kubernetes. Teams can therefore run large-scale GPU workloads under Slurm while managing the cluster's lifecycle through Kubernetes, with support for advanced NVIDIA architectures such as GB200 NVL72. Production deployments have shown that Slinky can scale to over 8,000 GPUs while maintaining performance parity with traditional Slurm clusters.

Key facts

▪Slinky enables native Slurm cluster management on Kubernetes by representing Slurm daemons as Kubernetes Custom Resource Definitions (CRDs).
▪It supports automated GPU management and topology-aware scheduling for advanced NVIDIA architectures such as GB200 NVL72.
▪Production deployments at NVIDIA have demonstrated Slinky's ability to scale to over 8,000 GPUs.
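Topology-aware scheduling is easier to picture with a toy model. The Python sketch below assumes each node carries a hypothetical NVLink-domain label (for instance, one label per GB200 NVL72 rack) and a free-GPU count. It prefers placing a multinode job entirely inside one domain and only spills across domains when no single domain fits. This illustrates the general idea only; it is not Slinky's or Slurm's actual placement logic, and the names `Node`, `place_job`, and `DEMO_NODES` are invented for this example.

```python
"""Toy topology-aware placement for a multinode GPU job (illustrative only)."""
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    domain: str     # hypothetical NVLink-domain label, e.g. one per NVL72 rack
    free_gpus: int


def place_job(nodes: list[Node], gpus_per_node: int, num_nodes: int) -> list[str] | None:
    """Return the node names chosen for the job, or None if it cannot fit."""
    # Only nodes with enough free GPUs can host one rank of the job.
    eligible = [n for n in nodes if n.free_gpus >= gpus_per_node]
    if len(eligible) < num_nodes:
        return None

    # Group eligible nodes by their (hypothetical) NVLink domain.
    by_domain: dict[str, list[Node]] = defaultdict(list)
    for n in eligible:
        by_domain[n.domain].append(n)

    # 1) Best fit: the smallest single domain that holds the whole job,
    #    leaving larger domains free for larger jobs.
    fitting = [d for d, members in by_domain.items() if len(members) >= num_nodes]
    if fitting:
        best = min(fitting, key=lambda d: (len(by_domain[d]), d))
        return sorted(n.name for n in by_domain[best])[:num_nodes]

    # 2) No single domain fits: span as few domains as possible by
    #    filling the largest domains first.
    chosen: list[str] = []
    for d in sorted(by_domain, key=lambda d: (-len(by_domain[d]), d)):
        for n in sorted(by_domain[d], key=lambda n: n.name):
            chosen.append(n.name)
            if len(chosen) == num_nodes:
                return chosen
    return None  # not reached: the eligibility check above guarantees a fit


# A tiny demo cluster with two hypothetical domains; a3 has no free GPUs.
DEMO_NODES = [
    Node("a1", "rackA", 4), Node("a2", "rackA", 4), Node("a3", "rackA", 0),
    Node("b1", "rackB", 4), Node("b2", "rackB", 4), Node("b3", "rackB", 4),
]

if __name__ == "__main__":
    for size in (2, 3, 4, 6):
        print(size, place_job(DEMO_NODES, gpus_per_node=4, num_nodes=size))
```

Choosing the smallest domain that fits is a simple best-fit heuristic that keeps larger domains free for larger jobs. A production scheduler also weighs priorities, reservations, and fragmentation.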

Original article

NVIDIA Technical Blog


Opening excerpt (first ~120 words)

Running Large-Scale GPU Workloads on Kubernetes with Slurm · Apr 09, 2026 · By Anton Polyakov, Fagani Hajizada, Marlow Warnicke and Skyler Malinowski

AI-Generated Summary: Slinky, developed by SchedMD (now part of NVIDIA), enables native Slurm cluster management on Kubernetes by representing all Slurm daemons as Kubernetes Custom Resource Definitions, supporting full Slurm lifecycle orchestration and high availability without relying on Slurm's native HA. Integration with the NVIDIA GPU Operator and DRA/ComputeDomains allows automated GPU management, topology-aware multinode scheduling, and per-job GPU monitoring, supporting advanced NVIDIA architectures like GB200 NVL72 with dynamic Internode Memory Exchange and topology…

Excerpt limited to ~120 words for fair-use compliance. The full article is at NVIDIA Technical Blog.
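Representing each Slurm daemon as a Kubernetes custom resource follows the general operator pattern: desired state is declared as resources, and a controller repeatedly reconciles the running pods against it. The Python sketch below models only that reconcile step. The resource kinds (`Controller`, `NodeSet`, `Accounting`, `RestApi`), the fields, the pod naming, and the helpers `desired_pods` and `reconcile` are assumptions for illustration, not Slinky's actual CRD schema or code; only the daemon names (`slurmctld`, `slurmd`, `slurmdbd`, `slurmrestd`) are real Slurm components.

```python
"""Toy reconcile step in the style of a Kubernetes operator (illustrative only)."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SlurmDaemonResource:
    kind: str       # hypothetical custom-resource kind, not Slinky's real schema
    name: str       # resource name, also used as the pod-name prefix here
    daemon: str     # the real Slurm daemon the pods would run
    replicas: int


def desired_pods(resources: list[SlurmDaemonResource]) -> set[str]:
    """Expand every declared resource into the pod names it should own."""
    return {f"{r.name}-{i}" for r in resources for i in range(r.replicas)}


def reconcile(resources: list[SlurmDaemonResource],
              running_pods: set[str]) -> tuple[list[str], list[str]]:
    """Compare declared and observed state; return (to_create, to_delete)."""
    want = desired_pods(resources)
    to_create = sorted(want - running_pods)   # missing or crashed pods
    to_delete = sorted(running_pods - want)   # pods no longer declared
    return to_create, to_delete


# Hypothetical declared state: one resource per Slurm daemon type.
CLUSTER = [
    SlurmDaemonResource("Controller", "slurmctld", "slurmctld", 1),
    SlurmDaemonResource("NodeSet", "gpu-workers", "slurmd", 3),
    SlurmDaemonResource("Accounting", "slurmdbd", "slurmdbd", 1),
    SlurmDaemonResource("RestApi", "slurmrestd", "slurmrestd", 1),
]

if __name__ == "__main__":
    # Observed state: the controller pod crashed, and a worker from an
    # earlier, larger scale-out is still running.
    running = {"gpu-workers-0", "gpu-workers-1", "gpu-workers-2",
               "gpu-workers-3", "slurmdbd-0", "slurmrestd-0"}
    create, delete = reconcile(CLUSTER, running)
    print("create:", create)   # ['slurmctld-0']
    print("delete:", delete)   # ['gpu-workers-3']
```

In this model, a crashed `slurmctld` pod appears as a missing pod on the next pass and is recreated. This is one way Kubernetes-level orchestration can keep the controller available without relying on Slurm's built-in backup-controller failover.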
