From SGD to Muon: Adaptive Optimization via Schatten-p Norms

May 20, 2026 · 4:00 AM UTC · 3 min read

#artificial intelligence #optimization #deep learning

⚡ TL;DR · AI summary

The article introduces an adaptive optimization framework for deep neural networks built on Schatten-p norms. Existing optimizers fix the update geometry in advance. This framework instead selects a proxy-optimal geometry from data observed during training. The authors report performance competitive with established optimizers such as Muon and AdamW across a range of training scenarios.

Key facts

▪Modern optimizers impose matrix-wise geometry constraints on updates.
▪The new framework selects proxy-optimal update geometries dynamically during training.
▪It adds only 3% runtime overhead compared with highly optimized baselines.
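The standard definitions behind these facts are written out below in the usual Linear Minimization Oracle (LMO) notation. This is a textbook derivation, not taken from the paper, and the paper's own parameterization may differ. U, Σ, V come from the compact SVD of the gradient G, and q is the dual exponent of p.

```latex
% Schatten-p norm of W \in \mathbb{R}^{m \times n} with singular values \sigma_1 \ge \dots \ge \sigma_r
\|W\|_{S_p} = \Big(\sum_{i=1}^{r} \sigma_i^{p}\Big)^{1/p}, \qquad \|W\|_{S_\infty} = \sigma_1

% LMO over the unit Schatten-p ball, for a gradient G = U \Sigma V^\top
\operatorname{lmo}_{p}(G) = \arg\min_{\|D\|_{S_p} \le 1} \langle G, D \rangle
  = -\,U \operatorname{diag}\!\Big(\frac{\sigma_i^{\,q-1}}{\|\sigma\|_q^{\,q-1}}\Big) V^\top,
  \qquad \tfrac{1}{p} + \tfrac{1}{q} = 1

% Special cases
p = 2 \ (q = 2):\quad \operatorname{lmo}_2(G) = -\,G / \|G\|_F \quad \text{(normalized SGD)}
p = \infty \ (q = 1):\quad \operatorname{lmo}_\infty(G) = -\,U V^\top \quad \text{(Muon-style orthogonalized update)}
p = 1 \ (q = \infty):\quad \operatorname{lmo}_1(G) = -\,u_1 v_1^\top \quad \text{(top singular direction only)}
```

Varying p therefore traces a path from normalized SGD (p = 2) to a Muon-style update (p = ∞), which matches the paper's title. Muon itself applies the orthogonalization to the momentum and approximates U V^T with Newton-Schulz iterations. The excerpt does not specify how the framework picks p during training.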

Original article

arXiv cs.AI

Read full at arXiv cs.AI →

Opening excerpt (first ~120 words)

arXiv:2605.19781 (cs), submitted on 19 May 2026

Title: From SGD to Muon: Adaptive Optimization via Schatten-p Norms

Authors: Thomas Massena (IRIT, DTIPG - SNCF, UT3), Corentin Friedrich, Mathieu Serrurier (IRIT), … (listed on arXiv as Thomas Massena and 4 other authors)

Abstract: Modern optimizers, like Muon, impose matrix-wise geometry constraints on their updates. These matrix-wise constraints can be unified under Linear Minimization Oracle (LMO) theory. However, all current methods impose fixed LMO geometries for the update rules, chosen by design or empirically, which are not necessarily optimal according to the problem's geometry.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
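The abstract frames optimizer updates as LMOs, which minimize ⟨G, D⟩ over a unit norm ball. The NumPy sketch below computes that LMO for the Schatten-p ball, with several assumptions. The update is taken as the exact minimizer computed from a full SVD. The names `schatten_lmo` and `lmo_step` are hypothetical. The SVD is a simplification: Muon itself avoids it by using Newton-Schulz iterations. Here p is fixed by the caller, whereas the paper selects the geometry adaptively by a method the excerpt does not describe.

```python
import numpy as np


def schatten_lmo(grad: np.ndarray, p: float) -> np.ndarray:
    """Hypothetical helper: argmin of <grad, D> over the unit Schatten-p ball."""
    if p < 1:
        raise ValueError("Schatten-p is only a norm for p >= 1")
    # Compact SVD: grad = U diag(s) Vt, singular values in descending order.
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    if s[0] == 0.0:
        return np.zeros_like(grad)  # zero gradient: every feasible D is optimal
    sn = s / s[0]  # rescale for numerical stability; the weights below are scale-invariant in s
    if np.isinf(p):
        # Dual exponent q = 1: unit weight on every (numerically) nonzero direction -> -U V^T.
        tol = max(grad.shape) * np.finfo(s.dtype).eps
        d = (sn > tol).astype(s.dtype)
    elif p == 1:
        # Dual exponent q = inf: all weight on the top singular direction -> -u_1 v_1^T.
        d = np.zeros_like(s)
        d[0] = 1.0
    else:
        q = p / (p - 1.0)  # dual exponent, 1/p + 1/q = 1
        d = sn ** (q - 1.0) / np.linalg.norm(sn, ord=q) ** (q - 1.0)
    # Scale column i of U by d[i], then recombine: -U diag(d) V^T.
    return -(u * d) @ vt


def lmo_step(weight: np.ndarray, grad: np.ndarray, lr: float, p: float) -> np.ndarray:
    """Hypothetical update W <- W + lr * lmo_p(G) with a fixed geometry p."""
    return weight + lr * schatten_lmo(grad, p)
```

For example, take `G = np.random.default_rng(0).standard_normal((6, 4))`. Then `schatten_lmo(G, 2.0)` equals `-G / np.linalg.norm(G)`, and `schatten_lmo(G, np.inf)` has orthonormal columns. These two results reproduce the SGD and Muon endpoints of the family. Intermediate values such as p = 3 give updates with unit Schatten-3 norm that fall between the two.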
