arxiv: 1901.00451 · v1 · submitted 2019-01-02 · 💻 cs.LG · stat.ML

Recognition: unknown

SGD Converges to Global Minimum in Deep Learning via Star-convex Path

classification 💻 cs.LG stat.ML
keywords global, minimum, been, deep, training, converges, learning, networks

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding of how and why SGD can train these complex networks toward a global minimum. In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. Our argument exploits two important properties: 1) the training loss can (approximately) achieve zero value, which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper. In this context, our analysis shows that SGD, although long considered a randomized algorithm, converges in an intrinsically deterministic manner to a global minimum.
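The two properties named in the abstract can be illustrated on a toy problem. The sketch below is not the paper's experiment. It assumes the usual star-convexity inequality along the path, loss_i(x*) >= loss_i(x_k) + <grad loss_i(x_k), x* - x_k>, and uses an over-parameterized least-squares problem whose per-sample losses are convex, so the inequality holds automatically here. The function name `run_sgd`, the problem sizes, and the step-size choice are all hypothetical choices made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_loss(x, a, y):
    # Per-sample squared loss 0.5 * (a.x - y)^2.
    return 0.5 * (dot(a, x) - y) ** 2


def sample_grad(x, a, y):
    # Gradient of the per-sample loss with respect to x.
    r = dot(a, x) - y
    return [r * ai for ai in a]


def run_sgd(n_samples=5, dim=20, steps=500, seed=0):
    rng = random.Random(seed)
    # Over-parameterized setup (dim > n_samples). Labels come from x_star,
    # so x_star is one global minimizer with exactly zero training loss.
    x_star = [rng.gauss(0, 1) for _ in range(dim)]
    data = []
    for _ in range(n_samples):
        a = [rng.gauss(0, 1) for _ in range(dim)]
        data.append((a, dot(a, x_star)))

    def mean_loss(x):
        return sum(sample_loss(x, a, y) for a, y in data) / n_samples

    # Assumed step size: 1 / (largest per-sample smoothness constant).
    eta = 1.0 / max(dot(a, a) for a, _ in data)
    x = [0.0] * dim
    initial_loss = mean_loss(x)
    distances, gaps = [], []
    for _ in range(steps):
        a, y = data[rng.randrange(n_samples)]
        g = sample_grad(x, a, y)
        diff = [s - xi for s, xi in zip(x_star, x)]
        # Star-convexity gap at this iterate; nonnegative means the
        # inequality holds for the sampled loss.
        gaps.append(sample_loss(x_star, a, y) - sample_loss(x, a, y) - dot(g, diff))
        distances.append(math.sqrt(dot(diff, diff)))
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return distances, gaps, initial_loss, mean_loss(x)


if __name__ == "__main__":
    distances, gaps, initial_loss, final_loss = run_sgd()
    print("min star-convexity gap:", min(gaps))
    print("distance to x_star, first and last:", distances[0], distances[-1])
    print("mean loss, first and last:", initial_loss, final_loss)
```

In this toy setting the distance to `x_star` never increases from one step to the next, whichever sample is drawn, which gives a small-scale picture of the "intrinsically deterministic" behavior the abstract describes. The iterates approach some zero-loss point, not necessarily `x_star` itself, so the distance settles at a positive value while the loss shrinks.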

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Stochastic Trust-Region Methods for Over-parameterized Models

    math.OC 2026-04 unverdicted novelty 7.0

    Stochastic trust-region methods achieve O(ε^{-2} log(1/ε)) complexity for unconstrained problems and O(ε^{-4} log(1/ε)) for equality-constrained problems under the strong growth condition, with experiments showing sta...