Regularized Muon induces a damped Hamiltonian flow on probability measures over matrix parameters, yielding exponential convergence under gradient dominance assumptions.
arXiv preprint arXiv:2202.07052 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 5roles
background 2polarities
background 2representative citing papers
Momentum in Muon functions as a spectral filter on signal-plus-perturbation gradients, enlarging the gap to stabilize singular subspaces before orthogonalization and outperforming the reverse order.
Proposes equivariant optimizer updates matched to layer symmetries for embeddings, SwiGLU MLPs, and MoE routers, with reported gains in validation loss and training stability on several language model architectures.
Pion is an optimizer that preserves the singular values of weight matrices in LLM training by applying orthogonal equivalence transformations.
Proposes low-rank orthogonalization and derives low-rank Muon and MSGD variants that outperform standard Muon on GPT-2 and LLaMA pretraining while providing iteration complexity bounds.
citing papers explorer
-
Move on Muon : A Hamiltonian probability gradient flow perspective of Muon optimizer
Regularized Muon induces a damped Hamiltonian flow on probability measures over matrix parameters, yielding exponential convergence under gradient dominance assumptions.
-
Denoise First, Orthogonalize Later: Understanding Momentum in Muon via Spectral Filtering
Momentum in Muon functions as a spectral filter on signal-plus-perturbation gradients, enlarging the gap to stabilize singular subspaces before orthogonalization and outperforming the reverse order.
-
Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
Proposes equivariant optimizer updates matched to layer symmetries for embeddings, SwiGLU MLPs, and MoE routers, with reported gains in validation loss and training stability on several language model architectures.
-
Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation
Pion is an optimizer that preserves the singular values of weight matrices in LLM training by applying orthogonal equivalence transformations.
-
Low-rank Orthogonalization for Large-scale Matrix Optimization with Applications to Foundation Model Training
Proposes low-rank orthogonalization and derives low-rank Muon and MSGD variants that outperform standard Muon on GPT-2 and LLaMA pretraining while providing iteration complexity bounds.