Momentum Further Constrains Sharpness at the Edge of Stochastic Stability

2026 · cs.LG · arXiv 2604.14108

2 Pith papers cite this work. Polarity classification is still indexing.



abstract

Recent work suggests that (stochastic) gradient descent self-organizes near an instability boundary, shaping both optimization and the solutions found. Momentum and mini-batch gradients are widely used in practical deep learning optimization, but it remains unclear whether they operate in a comparable regime of instability. We demonstrate that SGD with momentum exhibits an Edge of Stochastic Stability (EoSS)-like regime with batch-size-dependent behavior that cannot be explained by a single momentum-adjusted stability threshold. Batch Sharpness (the expected directional mini-batch curvature) stabilizes in two distinct regimes: at small batch sizes it converges to a lower plateau $2(1-\beta)/\eta$, reflecting amplification of stochastic fluctuations by momentum and favoring flatter regions than vanilla SGD; at large batch sizes it converges to a higher plateau $2(1+\beta)/\eta$, where momentum recovers its classical stabilizing effect and favors sharper regions consistent with full-batch dynamics. We further show that this aligns with linear stability thresholds and discuss the implications for hyperparameter tuning and coupling.
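The two plateaus quoted in the abstract can be evaluated for concrete hyperparameters, and the larger one can be checked against the classical full-batch stability bound. The sketch below is illustrative only and is not taken from the paper. It assumes the heavy-ball update $x_{t+1} = x_t - \eta \lambda x_t + \beta (x_t - x_{t-1})$ on a one-dimensional quadratic with curvature $\lambda$, and the function names `plateaus` and `heavy_ball_spectral_radius` are hypothetical. It does not derive the small-batch plateau $2(1-\beta)/\eta$, which is the paper's claim about Batch Sharpness under mini-batch noise.

```python
import cmath


def plateaus(eta, beta):
    """Return the (small-batch, large-batch) plateaus quoted in the abstract."""
    return 2 * (1 - beta) / eta, 2 * (1 + beta) / eta


def heavy_ball_spectral_radius(curvature, eta, beta):
    """Spectral radius of full-batch heavy ball on a 1-D quadratic.

    Assumed update: x_{t+1} = (1 + beta - eta * curvature) * x_t - beta * x_{t-1}.
    Its characteristic polynomial is z^2 - a z + beta with a = 1 + beta - eta * curvature.
    """
    a = 1 + beta - eta * curvature
    disc = cmath.sqrt(a * a - 4 * beta)
    roots = ((a + disc) / 2, (a - disc) / 2)
    return max(abs(r) for r in roots)


eta, beta = 0.01, 0.9  # example values, chosen only for illustration
low, high = plateaus(eta, beta)
print("small-batch plateau 2(1-beta)/eta:", low)
print("large-batch plateau 2(1+beta)/eta:", high)

# Just below the larger plateau the iteration contracts; just above it diverges.
print("radius at 0.99 * high:", heavy_ball_spectral_radius(0.99 * high, eta, beta))
print("radius at 1.01 * high:", heavy_ball_spectral_radius(1.01 * high, eta, beta))
```

With $\beta = 0$ both expressions reduce to $2/\eta$, the familiar gradient-descent threshold, so the gap between the two plateaus grows with the momentum coefficient.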

representative citing papers

Edge of Stability Selectively Shapes Learning Across the Data Distribution

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

Edge of stability acts as a selective mechanism that amplifies learning on data groups with aligned persistent gradients while suppressing others.

Does Weight Decay Enhance Training Stability?

cs.LG · 2026-05-15 · conditional · novelty 6.0

Weight decay slows progressive sharpening at the edge of stability, inducing damped oscillations in CNNs and a phase transition to sub-2/η sharpness in MLPs driven by parameter-sharpness gradient alignment, yielding more stable NTK dynamics.

citing papers explorer

Showing 2 of 2 citing papers.

Edge of Stability Selectively Shapes Learning Across the Data Distribution
cs.LG · 2026-06-02 · unverdicted · none · ref 10 · internal anchor

Does Weight Decay Enhance Training Stability?
cs.LG · 2026-05-15 · conditional · none · ref 21 · internal anchor
