PyTorch 2.12 Release
The release of PyTorch 2.12 introduces significant performance improvements and new features. Key enhancements include a batched linalg.eigh on CUDA that is up to 100x faster and a new device-agnostic Graph API for unified graph capture and replay. This version continues to evolve PyTorch into a versatile platform for production training and inference across various hardware.
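As a rough illustration of what a batched linalg.eigh call looks like, the sketch below decomposes a stack of symmetric matrices in one call. The batch size, matrix size, and variable names are arbitrary choices for this example, not values from the release. It runs on CPU when no GPU is present and does not measure or demonstrate the reported speedup.

```python
import torch

torch.manual_seed(0)
# Use CUDA when available; the example also works on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# A batch of 64 symmetric 8x8 matrices (sizes are illustrative).
batch, n = 64, 8
a = torch.randn(batch, n, n, dtype=torch.float64, device=device)
sym = (a + a.transpose(-2, -1)) / 2

# One call decomposes every matrix in the batch.
eigenvalues, eigenvectors = torch.linalg.eigh(sym)

# Sanity check: V diag(w) V^T should reproduce the input.
reconstructed = (
    eigenvectors @ torch.diag_embed(eigenvalues) @ eigenvectors.transpose(-2, -1)
)
max_error = (reconstructed - sym).abs().max().item()
print(eigenvalues.shape, max_error)
```

The leading dimension is treated as the batch, so no Python loop over matrices is needed.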
- PyTorch 2.12 features a batched linalg.eigh on CUDA that is up to 100x faster due to an updated cuSolver backend.
- The new torch.accelerator.Graph API unifies graph capture and replay across multiple backends.
- Adagrad now supports a fused variant, reducing kernel launch overhead and improving performance.
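A minimal sketch of opting in to the fused Adagrad variant might look like the following. The fused=True keyword comes from the release announcement; the make_adagrad helper, the toy model, and the fallback behaviour are hypothetical additions for this example, on the assumption that some builds or devices may reject the flag.

```python
import torch

def make_adagrad(params, lr=0.1):
    """Hypothetical helper: try fused Adagrad, else use the default path."""
    params = list(params)
    try:
        return torch.optim.Adagrad(params, lr=lr, fused=True), True
    except (TypeError, RuntimeError, ValueError):
        # Assumption: older builds or unsupported devices reject fused=True.
        return torch.optim.Adagrad(params, lr=lr), False

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
x = torch.randn(32, 4)
y = x.sum(dim=1, keepdim=True)  # toy regression target

optimizer, used_fused = make_adagrad(model.parameters())
initial_loss = torch.nn.functional.mse_loss(model(x), y).item()

for _ in range(50):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

# Loss computed just before the last optimizer step.
final_loss = loss.item()
print(f"fused={used_fused} loss {initial_loss:.4f} -> {final_loss:.4f}")
```

The training loop is identical either way; only the optimizer's constructor changes, which is why the fallback can be isolated in one helper.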
Opening excerpt (first ~120 words)
We are excited to announce the release of PyTorch® 2.12 (release notes)! The PyTorch 2.12 release features the following changes:
- Batched linalg.eigh on CUDA is up to 100x faster due to updated cuSolver backend selection
- New torch.accelerator.Graph API unifies graph capture and replay across CUDA, XPU, and out-of-tree backends
- torch.export.save now supports Microscaling (MX) quantization formats, enabling full export of aggressively compressed models
- Adagrad now supports fused=True, joining Adam, AdamW, and SGD with a single-kernel optimizer implementation
- torch.cond control flow can now be captured and replayed inside CUDA Graphs
- ROCm users gain expandable memory segments, rocSHMEM symmetric memory collectives, and FlexAttention pipelining
This release is composed of…
Excerpt limited to ~120 words for fair-use compliance. The full article is at PyTorch.