Small nonlinearities in activation functions create bad local minima in neural networks

Ali Jadbabaie; Chulhee Yun; Suvrit Sra

arXiv: 1802.03487 · v4 · pith:MUURIRUQnew · submitted 2018-02-10 · cs.LG · math.OC · stat.ML

Chulhee Yun, Suvrit Sra, Ali Jadbabaie

classification cs.LG · math.OC · stat.ML

keywords networks · local · minima · results · linear · neural · spurious · deep

We investigate the loss surface of neural networks. We prove that even for one-hidden-layer networks with "slightest" nonlinearity, the empirical risks have spurious local minima in most cases. Our results thus indicate that in general "no spurious local minima" is a property limited to deep linear networks, and insights obtained from linear networks may not be robust. Specifically, for ReLU(-like) networks we constructively prove that for almost all practical datasets there exist infinitely many local minima. We also present a counterexample for more general activations (sigmoid, tanh, arctan, ReLU, etc.), for which there exists a bad local minimum. Our results make the least restrictive assumptions relative to existing results on spurious local optima in neural networks. We complete our discussion by presenting a comprehensive characterization of global optimality for deep linear networks, which unifies other results on this topic.
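The ReLU(-like) claim can be made concrete with a small numerical sketch. It is not the paper's proof, and it does not check local minimality. It only illustrates, on a hypothetical toy dataset, that a one-hidden-layer leaky-ReLU network can sit at a parameter setting that exactly reproduces the linear least-squares fit, while another setting of the same kind of network achieves strictly lower empirical risk. The function names, the slopes `s_pos` and `s_neg`, and the data are all assumptions made for illustration.

```python
import numpy as np

def leaky_relu(z, s_pos=1.0, s_neg=0.1):
    """ReLU-like activation: slope s_pos for z >= 0, slope s_neg otherwise."""
    return np.where(z >= 0, s_pos * z, s_neg * z)

def mse(pred, y):
    """Empirical risk under squared error."""
    return float(np.mean((pred - y) ** 2))

def linear_least_squares(X, y):
    """Best affine fit y ~ X w + b, returned as (w, b)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

# Hypothetical toy data: y = |x| is not linearly fittable.
X = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y = np.abs(X[:, 0])
s_pos, s_neg = 1.0, 0.1

# Point A: one hidden unit whose pre-activation is positive on every data
# point, so the activation acts linearly and the network equals the affine fit.
w, b = linear_least_squares(X, y)
shift = 1.0 - np.min(X @ w + b)           # push all pre-activations up to >= 1
pre = X @ w + b + shift                   # hidden pre-activations (all > 0)
out_a = leaky_relu(pre, s_pos, s_neg) / s_pos - shift
loss_linear_point = mse(out_a, y)

# Point B: two hidden units h(x) and h(-x); their sum is (s_pos - s_neg) * |x|.
hidden = leaky_relu(X[:, 0], s_pos, s_neg) + leaky_relu(-X[:, 0], s_pos, s_neg)
out_b = hidden / (s_pos - s_neg)
loss_two_units = mse(out_b, y)

print("risk at the linear-fit point:", loss_linear_point)
print("risk with two opposing units:", loss_two_units)
```

On this toy data the first risk equals the linear least-squares risk (about 0.56) and the second is zero up to rounding. The paper's result is the stronger statement that points of the first kind are genuine local minima for almost all practical datasets; the sketch only shows the gap in risk between the two points.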

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
cs.LG · 2019-07 · unverdicted · novelty 4.0

Provides Hessian-based theoretical characterizations of SGD dynamics and a scale-invariant generalization bound for deep nets, backed by experiments on synthetic data, MNIST, and CIFAR-10.