pith. sign in

arxiv: 2604.13393 · v1 · submitted 2026-04-15 · 🧮 math.OC · cs.LG· stat.ML

A short proof of near-linear convergence of adaptive gradient descent under fourth-order growth and convexity

Pith reviewed 2026-05-10 13:31 UTC · model grok-4.3

classification 🧮 math.OC cs.LGstat.ML
keywords adaptive gradient descentnear-linear convergencefourth-order growthconvex optimizationLyapunov analysislocal convergenceravine manifold
0
0 comments X

The pith

Adaptive gradient descent converges near-linearly under convexity and fourth-order growth via a direct Lyapunov argument.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that gradient descent with an adaptive stepsize achieves near-linear local convergence for smooth functions that grow at least quartically away from their minimizers, provided the objective is also convex and has a unique minimizer. It replaces an earlier intricate analysis that tracks iterates relative to a slow-growth ravine manifold with a simpler, direct Lyapunov function that controls both function values and gradient norms. This yields a more adaptive variant of the algorithm whose numerical behavior is encouraging. The approach makes the convergence result more transparent and potentially easier to generalize while preserving the core rate guarantee.

Core claim

Davis, Drusvyatskiy, and Jiang showed that gradient descent with an adaptive stepsize converges locally at a nearly-linear rate for smooth functions that grow at least quartically away from their minimizers. The argument is intricate, relying on monitoring the performance of the algorithm relative to a certain manifold of slow growth -- called the ravine. In this work, we provide a direct Lyapunov-based argument that bypasses these difficulties when the objective is in addition convex and has a unique minimizer. As a byproduct of the argument, we obtain a more adaptive variant than the original algorithm with encouraging numerical performance.

What carries the argument

A Lyapunov function that decreases by a fixed fraction of itself at each step, combining the objective gap with a term involving the squared gradient norm under the adaptive stepsize rule.

If this is right

  • The algorithm converges locally at a near-linear rate around the minimizer.
  • A strictly more adaptive stepsize rule can be used while retaining the same convergence guarantee.
  • Numerical tests confirm practical stability and speed of the new variant on test problems.
  • The Lyapunov construction directly controls distance to optimality without auxiliary manifold tracking.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same Lyapunov template could simplify rate proofs for other adaptive or accelerated methods under quartic growth.
  • Relaxing convexity while keeping the growth condition might still be possible if a suitable potential can be found.
  • The more adaptive variant may reduce the need for manual stepsize tuning in applications where quartic growth naturally arises.

Load-bearing premise

The objective must be convex, have a unique minimizer, and satisfy fourth-order growth away from that minimizer so the Lyapunov function can bound the iterates.

What would settle it

Construct or numerically identify a convex function with a unique minimizer and quartic growth for which the adaptive iterates fail to satisfy the Lyapunov decrease inequality or exhibit slower than near-linear local convergence.

Figures

Figures reproduced from arXiv: 2604.13393 by Damek Davis, Dmitriy Drusvyatskiy.

Figure 1
Figure 1. Figure 1: The function f(v, u) = 1 2 (v +u 4 ) 2 +u 4 near the origin. Left: surface plot. Right: level sets. The solid black curve is the ravine v = −u 4 ; the dashed green line is the null space Q = {v = 0}; the dashed red line is the range P = {u = 0}. descent, we show that the squared projected gradient G(x) := ∥P ∇f(x)∥ 2 contracts at a linear rate 1−Θ(η) per step, until the iterate enters a region where the fu… view at source ↗
Figure 2
Figure 2. Figure 2: The function g(v, u) = 1 2 (v + u 2 ) 2 + u 4 near the origin. Left: surface plot. Right: level sets. The solid black curve is the ravine v = −u 2 ; the dashed green line is Q = {v = 0}; the dashed red line is P = {u = 0}. Compare with [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Algorithm 1 on the two quartics from Figures 1 and 2. Top: convex f with (η, τ ) = (1, 0.15); 66 iterations. Bottom: nonconvex g with (η, τ ) = (1, 0.12); 80 iterations. GDPolyak (stepsize 1, block length 1, i.e., alternating GD/Polyak every iteration): 84 iterations on both. GD stepsize 1. Distance threshold 10−6 . the epoch-based schedule in [1]. Key applications of fourth-order growth arise in rank-over… view at source ↗
Figure 4
Figure 4. Figure 4: DDJ fourth-order benchmarks. Rows: Rosenbrock (distance [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗
read the original abstract

Davis, Drusvyatskiy, and Jiang showed that gradient descent with an adaptive stepsize converges locally at a nearly-linear rate for smooth functions that grow at least quartically away from their minimizers. The argument is intricate, relying on monitoring the performance of the algorithm relative to a certain manifold of slow growth -- called the ravine. In this work, we provide a direct Lyapunov-based argument that bypasses these difficulties when the objective is in addition convex and a has a unique minimizer. As a byproduct of the argument, we obtain a more adaptive variant than the original algorithm with encouraging numerical performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper claims to offer a short, direct proof using a Lyapunov function that establishes near-linear convergence for adaptive gradient descent on smooth convex functions with a unique minimizer satisfying fourth-order growth away from the minimizer. This bypasses the ravine manifold monitoring from the work of Davis, Drusvyatskiy, and Jiang. The argument also yields a more adaptive step-size selection rule, which is demonstrated to have encouraging numerical performance.

Significance. If the proof is correct, this work is significant for providing a simpler analysis method for convergence of adaptive methods under higher-order growth conditions when convexity and uniqueness hold. It avoids the intricate construction of the prior paper. The more adaptive variant is a positive byproduct that may improve practical applicability. The direct argument and the numerical support are strengths of the manuscript. The stress-test concern regarding unavailable derivation does not land, as the manuscript contains the claimed Lyapunov argument.

minor comments (1)
  1. The phrase 'convex and a has a unique minimizer' in the abstract contains a typographical error and should read 'convex and has a unique minimizer'.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and the recommendation for minor revision. We are pleased that the direct Lyapunov argument is recognized as a simpler alternative to the ravine-manifold monitoring approach of Davis, Drusvyatskiy, and Jiang, and that the more adaptive step-size rule and its numerical performance are viewed favorably. The observation that the Lyapunov derivation is already contained in the manuscript is appreciated.

Circularity Check

0 steps flagged

No significant circularity; direct Lyapunov argument is self-contained

full rationale

The paper derives near-linear convergence via a new direct Lyapunov function that exploits convexity, uniqueness of the minimizer, and fourth-order growth to control iterates without tracking the ravine manifold from prior work. No equation reduces a claimed rate or prediction to a fitted parameter, self-referential definition, or load-bearing self-citation chain; the cited prior result (Davis-Dr usvyatskiy-Jiang) is explicitly bypassed rather than invoked as a uniqueness theorem or ansatz. The derivation therefore stands on its stated assumptions and is independent of its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the standard definition of convexity, uniqueness of the minimizer, and the fourth-order growth condition; these are domain assumptions rather than derived quantities.

axioms (1)
  • domain assumption The objective function is convex, has a unique minimizer, and satisfies fourth-order growth away from the minimizer.
    Explicitly required for the Lyapunov argument to hold, as stated in the abstract.

pith-pipeline@v0.9.0 · 5403 in / 1031 out tokens · 81348 ms · 2026-05-10T13:31:49.653698+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

  1. [1]

    Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth.Math

    Damek Davis, Dmitriy Drusvyatskiy, and Liwei Jiang. Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth.Math. Program., 2025. doi:10.1007/s10107-025-02290-5

  2. [2]

    B. T. Polyak. Gradient methods for minimizing functionals.Zh. Vychisl. Mat. i Mat. Fiz., 3(4):643–653, 1963

  3. [3]

    Luo and P

    Z.-Q. Luo and P. Tseng. Error bounds and convergence analysis of feasible descent methods: a general approach.Ann. Oper. Res., 46(1):157–178, 1993. 9

  4. [4]

    Drusvyatskiy and A

    D. Drusvyatskiy and A. S. Lewis. Error bounds, quadratic growth, and linear convergence of proximal methods.Math. Oper. Res., 43(3):919–948, 2018

  5. [5]

    Karimi, J

    H. Karimi, J. Nutini, and M. Schmidt. Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. InProceedings of ECML PKDD, pages 795–811. Springer, 2016

  6. [6]

    Necoara, Yu

    I. Necoara, Yu. Nesterov, and F. Glineur. Linear convergence of first order methods for non-strongly convex optimization.Math. Program., 175:69–107, 2019

  7. [7]

    B. T. Polyak.Introduction to Optimization. Optimization Software, Inc., New York, 1987

  8. [8]

    Hazan and S

    E. Hazan and S. Kakade. Revisiting the Polyak step size, 2019. arXiv:1905.00313

  9. [9]

    Attouch, J

    H. Attouch, J. Bolte, and B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods.Math. Program., 137(1):91–129, 2013

  10. [10]

    I. M. Gelfand and M. L. Tsetlin. The principle of non-local search in automatic optimization systems.Dokl. Akad. Nauk SSSR, 137:295–298, 1961

  11. [11]

    B. T. Polyak. Some methods of speeding up the convergence of iteration methods.USSR Comput. Math. Math. Phys., 4(5):1–17, 1964

  12. [12]

    Nesterov

    Yu. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k2).Dokl. Akad. Nauk SSSR, 269(3):543–547, 1983

  13. [13]

    Attouch and J

    H. Attouch and J. Fadili. From the ravine method to the Nesterov method and vice versa: a dynamical system perspective.SIAM J. Optim., 32(3):2074–2101, 2022

  14. [14]

    A. S. Lewis. Active sets, nonsmoothness, and sensitivity.SIAM J. Optim., 13(3):702–725, 2002

  15. [15]

    Drusvyatskiy and A

    D. Drusvyatskiy and A. S. Lewis. Optimality, identifiability, and sensitivity.Math. Program., 147(1):467–498, 2014

  16. [16]

    Davis and L

    D. Davis and L. Jiang. A local nearly linearly convergent first-order method for nonsmooth functions with quadratic growth.Found. Comput. Math., pages 1–82, 2024

  17. [17]

    D. Young. On Richardson’s method for solving linear systems with positive definite matrices. J. Math. Phys., 32(1–4):243–255, 1953

  18. [18]

    Big-step-little-step: Efficient gradient methods for objectives with multiple scales

    Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, and Honglin Yuan. Big-step-little-step: Efficient gradient methods for objectives with multiple scales. In Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 ofProc. Mach. Learn. Res., pages 2431–2540. PMLR, 2022

  19. [19]

    B. Grimmer. Provably faster gradient descent via long steps.SIAM J. Optim., 34(3):2588–2608, 2024

  20. [20]

    Altschuler and Pablo A

    Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging I: Multi-step descent and the silver stepsize schedule.J. ACM, 72(2):12:1–12:38, 2025

  21. [21]

    Altschuler and Pablo A

    Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging II: Silver stepsize schedule for smooth convex optimization.Math. Program., 213(1–2):1105–1118, 2025. 10

  22. [22]

    Burer and R

    S. Burer and R. D. C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization.Math. Program., 95(2):329–357, 2003

  23. [23]

    Burer and R

    S. Burer and R. D. C. Monteiro. Local minima and convergence in low-rank semidefinite programming.Math. Program., 103(3):427–444, 2005

  24. [24]

    Y. Chi, Y. M. Lu, and Y. Chen. Nonconvex optimization meets low-rank matrix factorization: An overview.IEEE Trans. Signal Process., 67(20):5239–5269, 2019

  25. [25]

    J. Zhuo, J. Kwon, N. Ho, and C. Caramanis. On the computational and statistical complexity of over-parameterized matrix sensing.J. Mach. Learn. Res., 25(169):1–47, 2024

  26. [26]

    Xu and S

    W. Xu and S. S. Du. Over-parameterization exponentially slows down gradient descent for learning a single neuron. InProceedings of Thirty Sixth Conference on Learning Theory, volume 195 ofProc. Mach. Learn. Res., pages 1155–1198. PMLR, 2023

  27. [27]

    Low-rank solutions of linear matrix equations via Procrustes flow

    Stephen Tu, Ross Boczar, Max Simchowitz, Mahdi Soltanolkotabi, and Ben Recht. Low-rank solutions of linear matrix equations via Procrustes flow. InProceedings of The 33rd International Conference on Machine Learning, volume 48 ofProc. Mach. Learn. Res., pages 964–973. PMLR, 2016

  28. [28]

    No spurious local minima in nonconvex low rank problems: A unified geometric analysis

    Rong Ge, Chi Jin, and Yi Zheng. No spurious local minima in nonconvex low rank problems: A unified geometric analysis. InProceedings of the 34th International Conference on Machine Learning, volume 70 ofProc. Mach. Learn. Res., pages 1233–1242. PMLR, 2017

  29. [29]

    Z. Zhu, Q. Li, G. Tang, and M. B. Wakin. Global optimality in low-rank matrix optimization. IEEE Trans. Signal Process., 66(13):3614–3628, 2018. 11 0 1000 2000 3000 4000 5000 6000 iterations 10 14 10 11 10 8 10 5 10 2 101 function gap Fourth-order Rosenbrock: loss 0 1000 2000 3000 4000 5000 6000 iterations 10 6 10 4 10 2 100 distance to optimum Fourth-ord...