pith. sign in

Understanding the Acceleration Phenomenon via High-Resolution Differential Equations

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it
abstract

Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms---Nesterov's accelerated gradient method for strongly convex functions (NAG-SC) and Polyak's heavy-ball method---we study an alternative limiting process that yields high-resolution ODEs. We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak's heavy-ball method, but they allow the identification of a term that we refer to as "gradient correction" that is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods. We also use the high-resolution ODE framework to study Nesterov's accelerated gradient method for (non-strongly) convex functions, uncovering a hitherto unknown result---that NAG-C minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG-C for smooth convex functions.

fields

stat.ML 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

citing papers explorer

Showing 1 of 1 citing paper.

  • Dynamics of Stochastic Momentum with Sparse Updates in High Dimensions stat.ML · 2026-05-27 · unverdicted · none · ref 4 · internal anchor

    Characterizes high-dimensional phase structure of momentum under sparse updates via closed-form second-moment dynamics, with regimes matching SGD, unstable, or heavy-ball depending on retention-to-learning timescale ratio.