Henry Shugart, Jason M. Altschuler · 2025 · math.OC · arXiv 2505.01423

3 Pith papers cite this work.

abstract

Efficient computation of min-max problems is a central question in optimization, learning, games, and control. Arguably the most natural algorithm is gradient-descent-ascent (GDA). However, since the 1970s, conventional wisdom has argued that GDA fails to converge even on simple problems. This failure spurred an extensive literature on modifying GDA with additional building blocks such as extragradients, optimism, momentum, anchoring, etc. In contrast, we show that GDA converges in its original form by simply using a judicious choice of stepsizes. The key innovation is the proposal of unconventional stepsize schedules (dubbed slingshot stepsize schedules) that are time-varying, asymmetric, and periodically negative. We show that all three properties are necessary for convergence, and that altogether this enables GDA to converge on the classical counterexamples (e.g., unconstrained convex-concave problems). The core algorithmic intuition is that although negative stepsizes make backward progress, they de-synchronize the min and max variables (overcoming the cycling issue of GDA), and lead to a slingshot phenomenon in which the forward progress in the other iterations is overwhelmingly larger. This results in fast overall convergence. Geometrically, the slingshot dynamics leverage the non-reversibility of gradient flow: positive/negative steps cancel to first order, yielding a second-order net movement in a new direction that leads to convergence and is otherwise impossible for GDA to move in. We interpret this as a second-order finite-differencing algorithm and show that, intriguingly, it approximately implements consensus optimization, an empirically popular algorithm for min-max problems involving deep neural networks (e.g., training GANs).
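The abstract's core mechanism can be checked numerically on the classical bilinear counterexample f(x, y) = x·y (minimize over x, maximize over y; saddle point at the origin). The Python sketch below is an illustration under stated assumptions, not the authors' slingshot schedule or code. For this f, one simultaneous GDA step with stepsizes (a_k, b_k) is the linear map [[1, -a_k], [b_k, 1]], whose determinant is 1 + a_k·b_k. With positive stepsizes every step therefore expands area, and the constant choice a_k = b_k = a multiplies the distance to the saddle by sqrt(1 + a²) at every step, which is the classical divergence. Contraction requires steps with a_k·b_k < 0, i.e. stepsizes that are asymmetric and partly negative. The two-step schedule in `toy_negative_schedule` is hypothetical, picked only because it is easy to verify by hand: its period map has determinant 0.25 and complex eigenvalues of modulus 0.5. The `consensus` function is a plain consensus-optimization update included for comparison with the abstract's final claim. All function names and parameter values (stepsizes, eta, gamma, step count) are assumptions made for this example.

```python
import math


def grad(x, y):
    """Partial derivatives (df/dx, df/dy) of the toy bilinear f(x, y) = x * y."""
    return y, x


def gda(x, y, schedule, num_steps):
    """Simultaneous GDA on f(x, y) = x * y.

    schedule(k) returns the stepsize pair (a_k, b_k) for iteration k.
    x takes a descent step with a_k and y an ascent step with b_k;
    both updates use the gradient at the current (old) point.
    """
    for k in range(num_steps):
        a, b = schedule(k)
        gx, gy = grad(x, y)
        x, y = x - a * gx, y + b * gy
    return x, y


def constant_schedule(k):
    # Conventional choice: the same positive stepsize for both players.
    return 0.5, 0.5


def toy_negative_schedule(k):
    # HYPOTHETICAL 2-periodic schedule for illustration, NOT the paper's
    # slingshot schedule. It is time-varying, asymmetric (a_k != b_k),
    # and each player's stepsize is negative on every other iteration.
    return (1.0, -0.5) if k % 2 == 0 else (-0.5, 1.0)


def consensus(x, y, eta, gamma, num_steps):
    """Consensus-optimization-style update on f(x, y) = x * y.

    Steps along -(v + gamma * grad(0.5 * ||v||^2)), where
    v = (df/dx, -df/dy) is the simultaneous GDA vector field.
    """
    for _ in range(num_steps):
        gx, gy = grad(x, y)
        vx, vy = gx, -gy  # v = (y, -x) for this f
        # Here 0.5 * ||v||^2 = 0.5 * (x^2 + y^2), whose gradient is (x, y).
        x, y = x - eta * (vx + gamma * x), y - eta * (vy + gamma * y)
    return x, y


def distance(x, y):
    """Euclidean distance to the saddle point (0, 0)."""
    return math.hypot(x, y)


x0, y0, steps = 1.0, 1.0, 40
runs = {
    "constant GDA": gda(x0, y0, constant_schedule, steps),
    "toy negative GDA": gda(x0, y0, toy_negative_schedule, steps),
    "consensus": consensus(x0, y0, eta=0.1, gamma=1.0, num_steps=steps),
}
for name, (x, y) in runs.items():
    print(f"{name:>16}: distance to saddle after {steps} steps = {distance(x, y):.3e}")
```

Running it should print a distance above 100 for constant GDA (the iterates spiral outward) and small distances for the other two runs. The updates are simultaneous, with both players reading the old iterate, which is the setting in which plain GDA cycles or diverges on this problem.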

representative citing papers

Lower Bounds for Anytime Acceleration of Gradient Descent

math.OC · 2026-07-02 · unverdicted · novelty 7.0

Establishes that no positive stepsize schedule achieves better than o(n^{-1.334}) anytime convergence for function values or o(n^{-1}) for squared gradient norms in smooth convex optimization.

Negative Momentum for Convex-Concave Optimization

math.OC · 2026-04-18 · unverdicted · novelty 7.0

Negative momentum enables global convergence in convex-concave min-max optimization and accelerated rates in the strongly-convex-strongly-concave setting.

Stepsize Hedging: an Alternative Mechanism for Accelerating Gradient Descent

math.OC · 2026-05-29 · unverdicted · novelty 2.0

This expository article introduces stepsize hedging as a way to accelerate gradient descent without additional terms like momentum.

citing papers explorer

Lower Bounds for Anytime Acceleration of Gradient Descent · math.OC · 2026-07-02 · unverdicted · none · ref 28
Negative Momentum for Convex-Concave Optimization · math.OC · 2026-04-18 · unverdicted · none · ref 39
Stepsize Hedging: an Alternative Mechanism for Accelerating Gradient Descent · math.OC · 2026-05-29 · unverdicted · none · ref 41
