WeSearch

Running PyTorch Models on Apple Silicon GPUs with the ExecuTorch MLX Delegate

⚡ TL;DR · AI summary

The ExecuTorch MLX delegate is a new backend that enables optimized, GPU-accelerated inference for PyTorch models on Apple Silicon Macs. It integrates with the PyTorch 2 export stack and supports a range of quantization options. The delegate is currently experimental; it is reported to improve performance significantly for generative AI workloads compared with previous ExecuTorch options.

Original article: PyTorch
Opening excerpt (first ~120 words)

TL;DR: Introducing the ExecuTorch MLX Delegate

The new MLX delegate enables optimized, GPU-accelerated inference for PyTorch models on Apple Silicon Macs, using Apple’s MLX framework. The delegate seamlessly integrates with the PyTorch 2 export stack and supports a wide range of quantization options (BF16, FP16, FP32, 2/4/8-bit affine, NVFP4). It supports various models, including dense transformers (Llama, Qwen, Gemma), sparse Mixture-of-Experts, and speech-to-text models (Whisper, Voxtral, Parakeet) for both offline and real-time transcription.

Note: The MLX delegate is currently experimental.

Apple Silicon has become a popular platform for running large language models locally.

Excerpt limited to ~120 words for fair-use compliance. The full article is at PyTorch.
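
The excerpt lists "2/4/8-bit affine" among the supported quantization options without saying what that means. As a rough illustration only, the sketch below shows the general idea of affine quantization: floats are mapped to small integers through a scale and an offset, then mapped back approximately. The function names `affine_quantize` and `affine_dequantize` are hypothetical, the code uses plain Python lists rather than tensors, and nothing here is taken from the MLX delegate or the ExecuTorch API.

```python
from typing import List, Tuple


def affine_quantize(values: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 2**bits - 1] using a scale and an offset.

    Hypothetical helper for illustration; not the delegate's implementation.
    """
    if not values:
        raise ValueError("values must be non-empty")
    levels = (1 << bits) - 1          # highest integer code, e.g. 15 for 4 bits
    lo, hi = min(values), max(values)
    # A constant group has zero range; fall back to scale 1.0 to avoid dividing by zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def affine_dequantize(codes: List[int], scale: float, offset: float) -> List[float]:
    """Recover approximate floats: value is roughly code * scale + offset."""
    return [c * scale + offset for c in codes]


weights = [-0.5, -0.1, 0.0, 0.25, 1.0]
codes, scale, offset = affine_quantize(weights, bits=4)
restored = affine_dequantize(codes, scale, offset)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

With fewer bits there are fewer levels, so the scale grows and the worst-case rounding error (about half the scale) grows with it. Production schemes typically quantize weights in small groups, each with its own scale and offset, to keep that error down.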
