A Generic Approach for Escaping Saddle points

Alexander J Smola; Barnabas Poczos; Francis Bach; Manzil Zaheer; Ruslan Salakhutdinov; Sashank J Reddi; Suvrit Sra

arxiv: 1709.01434 · v1 · pith:5UPBLBAInew · submitted 2017-09-05 · 💻 cs.LG · cs.AI

A Generic Approach for Escaping Saddle points

Sashank J Reddi , Manzil Zaheer , Suvrit Sra , Barnabas Poczos , Francis Bach , Ruslan Salakhutdinov , Alexander J Smola This is my paper

classification 💻 cs.LG cs.AI

keywords pointsmethodssaddlesecond-orderfirst-orderchallengecomputationsframework

0 comments

read the original abstract

A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from saddles one has to use second-order methods. However, most works on second-order methods rely extensively on expensive Hessian-based computations, making them impractical in large-scale settings. To tackle this challenge, we introduce a generic framework that minimizes Hessian based computations while at the same time provably converging to second-order critical points. Our framework carefully alternates between a first-order and a second-order subroutine, using the latter only close to saddle points, and yields convergence results competitive to the state-of-the-art. Empirical results suggest that our strategy also enjoys a good practical performance.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima
math.OC 2023-07 unverdicted novelty 7.0

Theoretical analysis of accelerated gradient methods showing almost-sure escape from strict saddles and linear exit times, plus a subclass achieving near-optimal convergence to local minima in convex neighborhoods of ...
Combining Stochastic Adaptive Cubic Regularization with Negative Curvature for Nonconvex Optimization
math.OC 2019-06 unverdicted novelty 7.0

Introduces the SANC algorithm combining negative curvature with stochastic adaptive cubic regularization for nonconvex optimization and claims it is the first such combination with consistent batch sizes for large-scale ML.
Dimension-Free Saddle-Point Escape in Muon
cs.LG 2026-05 unverdicted novelty 6.0

Muon achieves dimension-free saddle-point escape through non-linear spectral shaping, resolvent calculus, and structural incoherence, yielding an algebraically dimension-free escape bound.
Adaptive Federated Optimization
cs.LG 2020-02 unverdicted novelty 6.0

Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.