Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.
Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations
4 Pith papers cite this work. Polarity classification is still indexing.
abstract
We develop the mathematical foundations of the stochastic modified equations (SME) framework for analyzing the dynamics of stochastic gradient algorithms, where the latter is approximated by a class of stochastic differential equations with small noise parameters. We prove that this approximation can be understood mathematically as an weak approximation, which leads to a number of precise and useful results on the approximations of stochastic gradient descent (SGD), momentum SGD and stochastic Nesterov's accelerated gradient method in the general setting of stochastic objectives. We also demonstrate through explicit calculations that this continuous-time approach can uncover important analytical insights into the stochastic gradient algorithms under consideration that may not be easy to obtain in a purely discrete-time setting.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
Derives ODE deterministic equivalents and an adversarial homogenized SDE for SGD iterates in high-dim ℓ2-adversarial training, showing no constant learning rate ensures monotone descent for single-class adversarial least squares and equivalence to adaptive regularized standard SGD.
Four characterizations of irreversibility in training algorithms are equivalent to leading order in step size and produce an emergent force that breaks reparametrization symmetries while favoring minimum entropy production trajectories.
At the critical step-size scaling for SGD in high-dimensional single-layer networks, effective dynamics gain a diffusive correction term that changes the phase diagram and reduces to an Ornstein-Uhlenbeck process near fixed points, with the information exponent governing sample complexity.
citing papers explorer
-
Adam-HNAG: A Convergent Reformulation of Adam with Accelerated Rate
Adam-HNAG is a splitting-based reformulation of Adam that yields the first convergence proof for Adam-type methods, including accelerated rates, in convex smooth optimization.
-
Homogenization of $\ell_2$-Adversarial Training in High-Dimensions: Exact Dynamics under Stochastic Gradient Descent
Derives ODE deterministic equivalents and an adversarial homogenized SDE for SGD iterates in high-dim ℓ2-adversarial training, showing no constant learning rate ensures monotone descent for single-class adversarial least squares and equivalence to adaptive regularized standard SGD.
-
Thermodynamic Irreversibility of Training Algorithms
Four characterizations of irreversibility in training algorithms are equivalent to leading order in step size and produce an emergent force that breaks reparametrization symmetries while favoring minimum entropy production trajectories.
-
Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
At the critical step-size scaling for SGD in high-dimensional single-layer networks, effective dynamics gain a diffusive correction term that changes the phase diagram and reduces to an Ornstein-Uhlenbeck process near fixed points, with the information exponent governing sample complexity.