Insights on muon from simple quadratics.arXiv preprint arXiv:2602.11948, 2026

Antoine Gonon, Andreea-Alexandra Mu¸ sat, Nicolas Boumal · 2026 · arXiv 2602.11948

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Phases of Muon: When Muon Eclipses SignSGD

math.OC · 2026-05-10 · unverdicted · novelty 7.0

On power-law covariance least squares problems, SignSVD (Muon) and SignSGD (Adam proxy) show three phases of relative performance depending on data exponent α and target exponent β.

Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory

cs.LG · 2026-03-27 · unverdicted · novelty 7.0

Muon achieves higher storage capacity than SGD and matches Newton's method in one-step recovery rates for associative memory under power-law distributions, while saturating at larger critical batch sizes and showing faster initial multi-step dynamics.

Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise

math.OC · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Establishes matching Ω and O(min{m,n} ε^-(3p-2)/(p-1)) bounds for scale-invariant spectral-norm methods under heavy-tailed noise, plus an improved O(min{m,n} ε^-(5p-3)/(2p-2)) rate via transported Scion under Hessian Lipschitz continuity.

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

math.OC · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Proposes equivariant optimizer updates matched to layer symmetries for embeddings, SwiGLU MLPs, and MoE routers, with reported gains in validation loss and training stability on several language model architectures.

Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning

cs.LG · 2026-06-12 · unverdicted · novelty 5.0

Zeta applies coordinate whitening followed by spectral whitening in a fixed order to reduce orthogonalization error in matrix optimization for neural networks.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory cs.LG · 2026-03-27 · unverdicted · none · ref 16
Muon achieves higher storage capacity than SGD and matches Newton's method in one-step recovery rates for associative memory under power-law distributions, while saturating at larger critical batch sizes and showing faster initial multi-step dynamics.
Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning cs.LG · 2026-06-12 · unverdicted · none · ref 15
Zeta applies coordinate whitening followed by spectral whitening in a fixed order to reduce orthogonalization error in matrix optimization for neural networks.

Insights on muon from simple quadratics.arXiv preprint arXiv:2602.11948, 2026

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer