Stochastic gradient descent escapes saddle points efficiently
4 Pith papers cite this work.
citing papers explorer
- Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization
  Establishes matching Ω(ε^{-7/4}) and Ω(ε^{-5/3}) lower bounds via a block-chain construction for deterministic first-order methods under higher-order smoothness.
- Convergence of difference inclusions via a diameter criterion
  A diameter criterion tied to a potential function certifies convergence of difference inclusions, enabling discrete proofs for first-order optimization methods with diminishing steps.
- Distributed Learning in Non-Convex Environments -- Part II: Polynomial Escape from Saddle-Points
  Diffusion strategy for distributed learning escapes saddle points in O(1/μ) iterations and returns approximate second-order stationary points in polynomial iterations with less restrictive noise assumptions than centralized methods.
- Distributed Learning in Non-Convex Environments -- Part I: Agreement at a Linear Rate
  Diffusion learning achieves linear-rate agreement around the network centroid in stochastic non-convex distributed optimization.