Establishes that no positive stepsize schedule achieves better than o(n^{-1.334}) anytime convergence for function values or o(n^{-1}) for squared gradient norms in smooth convex optimization.
Altschuler
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
Efficient computation of min-max problems is a central question in optimization, learning, games, and control. Arguably the most natural algorithm is gradient-descent-ascent (GDA). However, since the 1970s, conventional wisdom has argued that GDA fails to converge even on simple problems. This failure spurred an extensive literature on modifying GDA with additional building blocks such as extragradients, optimism, momentum, anchoring, etc. In contrast, we show that GDA converges in its original form by simply using a judicious choice of stepsizes. The key innovation is the proposal of unconventional stepsize schedules (dubbed slingshot stepsize schedules) that are time-varying, asymmetric, and periodically negative. We show that all three properties are necessary for convergence, and that altogether this enables GDA to converge on the classical counterexamples (e.g., unconstrained convex-concave problems). The core algorithmic intuition is that although negative stepsizes make backward progress, they de-synchronize the min and max variables (overcoming the cycling issue of GDA), and lead to a slingshot phenomenon in which the forward progress in the other iterations is overwhelmingly larger. This results in fast overall convergence. Geometrically, the slingshot dynamics leverage the non-reversibility of gradient flow: positive/negative steps cancel to first order, yielding a second-order net movement in a new direction that leads to convergence and is otherwise impossible for GDA to move in. We interpret this as a second-order finite-differencing algorithm and show that, intriguingly, it approximately implements consensus optimization, an empirically popular algorithm for min-max problems involving deep neural networks (e.g., training GANs).
fields
math.OC 3years
2026 3verdicts
UNVERDICTED 3representative citing papers
Negative momentum enables global convergence in convex-concave min-max optimization and accelerated rates in the strongly-convex-strongly-concave setting.
This expository article introduces stepsize hedging as a way to accelerate gradient descent without additional terms like momentum.
citing papers explorer
-
Lower Bounds for Anytime Acceleration of Gradient Descent
Establishes that no positive stepsize schedule achieves better than o(n^{-1.334}) anytime convergence for function values or o(n^{-1}) for squared gradient norms in smooth convex optimization.
-
Negative Momentum for Convex-Concave Optimization
Negative momentum enables global convergence in convex-concave min-max optimization and accelerated rates in the strongly-convex-strongly-concave setting.
-
Stepsize Hedging: an Alternative Mechanism for Accelerating Gradient Descent
This expository article introduces stepsize hedging as a way to accelerate gradient descent without additional terms like momentum.