Adaptive gradient methods at the edge of stability
5 Pith papers cite this work. Polarity classification is still indexing.
Representative citing papers (2026)
-
Phases of Muon: When Muon Eclipses SignSGD
On power-law covariance least squares problems, SignSVD (Muon) and SignSGD (Adam proxy) show three phases of relative performance depending on data exponent α and target exponent β.
-
Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer
A two-level DMFT predicts width-consistent outlier escape and hyperparameter transfer under μP in deep networks, with bulk restructuring dominating for tasks with many outputs.
-
A Rod Flow Model for Adam at the Edge of Stability
Rod flow models for Adam and related optimizers track discrete iterates at the edge of stability more accurately than standard stable flows across tested ML architectures.
-
Zeroth-Order Optimization at the Edge of Stability
Zeroth-order methods achieve mean-square stability when the step size satisfies a condition involving the entire Hessian spectrum, with full-batch ZO optimizers operating at the edge of stability and large steps regularizing the Hessian trace.
-
Momentum Further Constrains Sharpness at the Edge of Stochastic Stability
Momentum SGD exhibits two distinct EoSS regimes for batch sharpness, stabilizing at 2(1-β)/η for small batches and 2(1+β)/η for large batches, aligning with linear stability thresholds.
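The SignSVD-vs-SignSGD comparison in the first entry contrasts two update rules: an elementwise sign of the gradient versus a matrix sign that orthogonalizes the gradient. A minimal NumPy sketch of the two updates, assuming a plain matrix parameter and a reduced SVD for the matrix sign (function names and shapes are illustrative, not taken from the cited paper):

```python
import numpy as np

def signsgd_update(W, G, lr):
    # SignSGD (Adam proxy): step in the elementwise sign of the gradient.
    return W - lr * np.sign(G)

def signsvd_update(W, G, lr):
    # SignSVD (Muon): step in the matrix sign of the gradient,
    # i.e. set every singular value to 1 via a reduced SVD.
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return W - lr * (U @ Vt)
```

Both steps have the same per-entry or per-direction magnitude; the difference the paper studies is how that interacts with power-law spectra of the data and target.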
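The two stabilization levels quoted in the last entry are simple functions of the learning rate η and the momentum coefficient β. A quick sketch of the arithmetic, purely illustrative of the quoted formulas:

```python
def eoss_sharpness_levels(eta, beta):
    # Batch-sharpness stabilization levels for momentum SGD as quoted
    # in the summary: 2(1-beta)/eta (small batches), 2(1+beta)/eta (large).
    small_batch = 2 * (1 - beta) / eta
    large_batch = 2 * (1 + beta) / eta
    return small_batch, large_batch
```

For a typical setting like η = 0.01, β = 0.9, the two levels differ by a factor of (1+β)/(1-β) = 19, which is why the two regimes are easy to tell apart empirically.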