arxiv: 1802.03487 · v4 · pith:MUURIRUQnew · submitted 2018-02-10 · cs.LG · math.OC · stat.ML

Small nonlinearities in activation functions create bad local minima in neural networks

classification: cs.LG · math.OC · stat.ML
keywords: networks, local, minima, results, linear, neural, spurious, deep

We investigate the loss surface of neural networks. We prove that even for one-hidden-layer networks with "slightest" nonlinearity, the empirical risks have spurious local minima in most cases. Our results thus indicate that in general "no spurious local minima" is a property limited to deep linear networks, and insights obtained from linear networks may not be robust. Specifically, for ReLU(-like) networks we constructively prove that for almost all practical datasets there exist infinitely many local minima. We also present a counterexample for more general activations (sigmoid, tanh, arctan, ReLU, etc.), for which there exists a bad local minimum. Our results make the least restrictive assumptions relative to existing results on spurious local optima in neural networks. We complete our discussion by presenting a comprehensive characterization of global optimality for deep linear networks, which unifies other results on this topic.
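
The ReLU claim in the abstract can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the paper's construction: the three-point dataset, the two-unit network, and all names (`risk`, `linear_point`, `better_point`) are hypothetical choices made for this example. At `linear_point` every hidden unit is active on every sample, so the network is an affine function of its input for all nearby parameters and cannot beat the best affine least-squares fit locally. A different parameter setting nevertheless reaches zero risk, so the first point is a spurious local minimum.

```python
import random

# Toy dataset (assumption for illustration): y = |x|, which no affine model fits exactly.
XS = [-1.0, 0.0, 1.0]
YS = [1.0, 0.0, 1.0]


def relu(z):
    return z if z > 0.0 else 0.0


def risk(p):
    """Sum of squared errors of a one-hidden-layer ReLU net with two hidden units.

    p = [w1a, w1b, b1a, b1b, w2a, w2b, b2] (flat parameter vector).
    """
    w1, b1, w2, b2 = p[0:2], p[2:4], p[4:6], p[6]
    total = 0.0
    for x, y in zip(XS, YS):
        hidden = [relu(w * x + b) for w, b in zip(w1, b1)]
        pred = sum(v * h for v, h in zip(w2, hidden)) + b2
        total += (pred - y) ** 2
    return total


# Point where the network outputs the best affine fit (the constant 2/3).
# Pre-activations are 1, 2, 3 on the data, so both units stay active nearby.
linear_point = [1.0, 1.0, 2.0, 2.0, 0.0, 0.0, 2.0 / 3.0]

# Point where the network computes relu(x) + relu(-x) = |x|.
better_point = [1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 0.0]

# Sanity check (not a proof): small random perturbations never lower the risk.
rng = random.Random(0)
scale = 0.05  # small enough that all pre-activations remain positive
min_perturbed = min(
    risk([v + rng.uniform(-scale, scale) for v in linear_point])
    for _ in range(2000)
)

print("risk at linear point:", risk(linear_point))
print("lowest risk among perturbations:", min_perturbed)
print("risk at better point:", risk(better_point))
```

The random perturbation loop is only a numerical sanity check; the local-minimum property here follows from the network being locally affine in its input, not from the sampling.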

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization

    cs.LG · 2019-07 · unverdicted · novelty 4.0

    Provides Hessian-based theoretical characterizations of SGD dynamics and a scale-invariant generalization bound for deep nets, backed by experiments on synthetic data, MNIST, and CIFAR-10.