arXiv preprint arXiv:2202.07052 , year=

· 2022 · arXiv 2202.07052

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Move on Muon : A Hamiltonian probability gradient flow perspective of Muon optimizer

stat.ML · 2026-05-22 · unverdicted · novelty 7.0

Regularized Muon induces a damped Hamiltonian flow on probability measures over matrix parameters, yielding exponential convergence under gradient dominance assumptions.

Denoise First, Orthogonalize Later: Understanding Momentum in Muon via Spectral Filtering

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

Momentum in Muon functions as a spectral filter on signal-plus-perturbation gradients, enlarging the gap to stabilize singular subspaces before orthogonalization and outperforming the reverse order.

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

math.OC · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Proposes equivariant optimizer updates matched to layer symmetries for embeddings, SwiGLU MLPs, and MoE routers, with reported gains in validation loss and training stability on several language model architectures.

Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

Pion is an optimizer that preserves the singular values of weight matrices in LLM training by applying orthogonal equivalence transformations.

Low-rank Orthogonalization for Large-scale Matrix Optimization with Applications to Foundation Model Training

cs.LG · 2025-09-15 · unverdicted · novelty 5.0

Proposes low-rank orthogonalization and derives low-rank Muon and MSGD variants that outperform standard Muon on GPT-2 and LLaMA pretraining while providing iteration complexity bounds.

citing papers explorer

Showing 5 of 5 citing papers.

Move on Muon : A Hamiltonian probability gradient flow perspective of Muon optimizer stat.ML · 2026-05-22 · unverdicted · none · ref 16
Regularized Muon induces a damped Hamiltonian flow on probability measures over matrix parameters, yielding exponential convergence under gradient dominance assumptions.
Denoise First, Orthogonalize Later: Understanding Momentum in Muon via Spectral Filtering cs.LG · 2026-06-02 · unverdicted · none · ref 55
Momentum in Muon functions as a spectral filter on signal-plus-perturbation gradients, enlarging the gap to stabilize singular subspaces before orthogonalization and outperforming the reverse order.
Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers math.OC · 2026-05-18 · unverdicted · none · ref 148 · 2 links
Proposes equivariant optimizer updates matched to layer symmetries for embeddings, SwiGLU MLPs, and MoE routers, with reported gains in validation loss and training stability on several language model architectures.
Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation cs.LG · 2026-05-12 · unverdicted · none · ref 77
Pion is an optimizer that preserves the singular values of weight matrices in LLM training by applying orthogonal equivalence transformations.
Low-rank Orthogonalization for Large-scale Matrix Optimization with Applications to Foundation Model Training cs.LG · 2025-09-15 · unverdicted · none · ref 51
Proposes low-rank orthogonalization and derives low-rank Muon and MSGD variants that outperform standard Muon on GPT-2 and LLaMA pretraining while providing iteration complexity bounds.

arXiv preprint arXiv:2202.07052 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer