Adaptive gradient methods at the edge of stability

Jeremy M Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, Michal Badura, Daniel Suo, David Cardoze, Zachary Nado, George E Dahl, et al. · 2022 · arXiv 2207.14484

5 Pith papers cite this work.

representative citing papers

Phases of Muon: When Muon Eclipses SignSGD

math.OC · 2026-05-10 · unverdicted · novelty 7.0

On power-law covariance least squares problems, SignSVD (Muon) and SignSGD (Adam proxy) show three phases of relative performance depending on data exponent α and target exponent β.

Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer

cond-mat.dis-nn · 2026-05-08 · unverdicted · novelty 7.0

A two-level DMFT predicts width-consistent outlier escape and hyperparameter transfer under μP in deep networks, with bulk restructuring dominating for tasks with many outputs.

A Rod Flow Model for Adam at the Edge of Stability

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

Rod flow models for Adam and related optimizers track discrete iterates at the edge of stability more accurately than standard stable flows across tested ML architectures.

Zeroth-Order Optimization at the Edge of Stability

cs.LG · 2026-04-16 · unverdicted · novelty 7.0

Zeroth-order methods achieve mean-square stability when the step size satisfies a condition involving the entire Hessian spectrum, with full-batch ZO optimizers operating at the edge of stability and large steps regularizing the Hessian trace.

Momentum Further Constrains Sharpness at the Edge of Stochastic Stability

cs.LG · 2026-04-15 · unverdicted · novelty 7.0

Momentum SGD exhibits two distinct EoSS regimes for batch sharpness, stabilizing at 2(1-β)/η for small batches and 2(1+β)/η for large batches, aligning with linear stability thresholds.

citing papers explorer

Phases of Muon: When Muon Eclipses SignSGD

math.OC · 2026-05-10 · unverdicted · none · ref 16

Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer

cond-mat.dis-nn · 2026-05-08 · unverdicted · none · ref 26

A Rod Flow Model for Adam at the Edge of Stability

cs.LG · 2026-05-07 · unverdicted · none · ref 10

Zeroth-Order Optimization at the Edge of Stability

cs.LG · 2026-04-16 · unverdicted · none · ref 2

Momentum Further Constrains Sharpness at the Edge of Stochastic Stability

cs.LG · 2026-04-15 · unverdicted · none · ref 7
