NEON+: Accelerated Gradient Methods for Extracting Negative Curvature for Non-Convex Optimization

Rong Jin; Tianbao Yang; Yi Xu

arxiv: 1712.01033 · v2 · pith:LWYU37JTnew · submitted 2017-12-04 · 🧮 math.OC · stat.ML

keywords: optimization, methods, gradient, method, non-convex, epsilon, accelerated, curvature

Accelerated gradient (AG) methods are breakthroughs in convex optimization, improving the convergence rate of gradient descent for optimization with smooth functions. However, the analysis of AG methods for non-convex optimization is still limited. It remains an open question whether AG methods from convex optimization can accelerate the convergence of gradient descent for finding local minima of non-convex optimization problems. This paper provides an affirmative answer to this question. In particular, we analyze two renowned variants of AG methods (namely Polyak's Heavy Ball method and Nesterov's Accelerated Gradient method) for extracting negative curvature from random noise, which is central to escaping from saddle points. By leveraging the proposed AG methods for extracting negative curvature, we present a new AG algorithm with double loops for non-convex optimization, which converges to a second-order stationary point $x$ such that $\|\nabla f(x)\| \leq \epsilon$ and $\nabla^2 f(x) \succeq -\sqrt{\epsilon} I$ with $\widetilde{O}(1/\epsilon^{1.75})$ iteration complexity, improving that of gradient descent by a factor of $\epsilon^{-0.25}$ and matching the best iteration complexity of second-order Hessian-free methods for non-convex optimization.

Footnote: The double-loop design is in contrast to a single-loop AG algorithm proposed in a recent manuscript [AGNON], which directly analyzed Nesterov's AG method for non-convex optimization and appeared online on November 29, 2017. However, we emphasize that our work is independent; it is inspired by our earlier work [NEON17] and is based on a different, novel analysis.
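The idea of extracting negative curvature from random noise with a momentum method can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy saddle function, the update form (a heavy-ball step on gradient differences around the current point), the hypothetical helper names `extract_negative_curvature` and `curvature_along`, the parameter values, and the fixed iteration count used in place of a proper termination test.

```python
import math
import random


def grad(p):
    """Gradient of the toy saddle f(x, y) = 0.5*x**2 - 0.25*y**2 (assumed example)."""
    x, y = p
    return [x, -0.5 * y]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def extract_negative_curvature(grad_fn, x, eta=0.5, beta=0.5,
                               radius=1e-3, steps=60, seed=0):
    """Heavy-ball-style iteration on gradient differences, started from noise.

    Illustrative only: a fixed number of steps replaces any stopping rule.
    """
    rng = random.Random(seed)
    # Start from a small random perturbation of norm `radius`.
    u = [rng.gauss(0.0, 1.0) for _ in x]
    scale = radius / norm(u)
    u = [scale * c for c in u]
    u_prev = list(u)
    g_x = grad_fn(x)
    for _ in range(steps):
        g_u = grad_fn([a + b for a, b in zip(x, u)])
        # The gradient difference acts like a Hessian-vector product H u.
        diff = [a - b for a, b in zip(g_u, g_x)]
        # Gradient step on the difference plus a momentum term.
        u_next = [c - eta * d + beta * (c - p)
                  for c, d, p in zip(u, diff, u_prev)]
        u_prev, u = u, u_next
    n = norm(u)
    return [c / n for c in u]


def curvature_along(grad_fn, x, v, h=1e-4):
    """Finite-difference estimate of v^T (Hessian at x) v."""
    g_plus = grad_fn([a + h * b for a, b in zip(x, v)])
    g_x = grad_fn(x)
    return sum(b * (gp - g) / h for b, gp, g in zip(v, g_plus, g_x))


saddle = [0.0, 0.0]
v = extract_negative_curvature(grad, saddle)
curv = curvature_along(grad, saddle, v)

epsilon = 0.01
# First-order condition from the abstract: gradient norm at most epsilon.
is_stationary = norm(grad(saddle)) <= epsilon
# Curvature along v below -sqrt(epsilon) certifies the second-order condition fails.
has_escape_direction = curv < -math.sqrt(epsilon)
print(v, curv, is_stationary, has_escape_direction)
```

On this toy function the origin satisfies the gradient condition, yet the extracted direction aligns with the y-axis, where the curvature is about -0.5. Since that is below $-\sqrt{\epsilon} = -0.1$, the origin is a saddle rather than a second-order stationary point, and the direction could be used to escape it.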

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization
cs.LG 2026-06 unverdicted novelty 9.0

Establishes matching Ω(ε^{-7/4}) and Ω(ε^{-5/3}) lower bounds via a block-chain construction for deterministic first-order methods under higher-order smoothness.

Scalable First-Order Interior Point Trust Region Algorithms for Linearly Constrained Optimization
cs.DS 2026-04 unverdicted novelty 7.0

An approximate IPTR framework for linearly constrained optimization uses low-rank projector updates to cut per-iteration cost while preserving feasibility and convergence guarantees, with experiments showing 2.48x speedup.

A Restart-Free Accelerated Algorithm for Non-Convex Minimization: Continuous and Discrete Analysis
math.OC 2026-06 unverdicted novelty 6.0

Two restart-free accelerated first-order methods for nonconvex functions with Lipschitz gradients and Hessians achieve O(ε^{-7/4}) complexity by discretizing a new ODE model, with adaptive Lipschitz estimation in one variant.