pith. machine review for the scientific record. sign in

arxiv: 1503.02101 · v1 · submitted 2015-03-06 · 💻 cs.LG · math.OC· stat.ML

Recognition: unknown

Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition

Authors on Pith no claims yet
classification 💻 cs.LG math.OCstat.ML
keywords gradientsaddledecompositionnon-convexstochastictensordescentfunctions
0
0 comments X
read the original abstract

We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in saddle points. In this paper we identify strict saddle property for non-convex problem that allows for efficient optimization. Using this property we show that stochastic gradient descent converges to a local minimum in a polynomial number of iterations. To the best of our knowledge this is the first work that gives global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points. Our analysis can be applied to orthogonal tensor decomposition, which is widely used in learning a rich class of latent variable models. We propose a new optimization formulation for the tensor decomposition problem that has strict saddle property. As a result we get the first online algorithm for orthogonal tensor decomposition with global convergence guarantee.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Boundary Mass and the Soft-to-Hard Limit in Mixture-of-Experts

    cs.LG 2026-05 unverdicted novelty 7.0

    Boundary mass in MoE is linear in slab width under smoothness and transversality, so the zero-temperature limit is governed by a thin geometric layer around routing interfaces rather than the full input space.