Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

Majid Janzamin, Hanie Sedghi, Anima Anandkumar

classification cs.LG cs.NE stat.ML

keywords training networks neural guaranteed method tensor bounds complexity

Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for learnability. Our training method is based on tensor decomposition, which provably converges to the global optimum under a set of mild non-degeneracy conditions. It consists of simple, embarrassingly parallel linear and multi-linear operations, and is competitive with standard stochastic gradient descent (SGD) in terms of computational complexity. Thus, we propose a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer.
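
The abstract does not spell out the decomposition procedure, so the following is only an illustrative sketch of the kind of multi-linear operation it mentions. It shows generic tensor power iteration with deflation on a symmetric third-order tensor, assuming the components are orthonormal and the weights positive. That assumption is ours, not a statement of the paper's conditions, and this is not the paper's full training algorithm. All function names (`build_tensor`, `contract`, `power_iteration`, `decompose`) are hypothetical.

```python
import math
import random


def build_tensor(weights, components):
    """Return T = sum_i w_i * a_i (x) a_i (x) a_i as a nested d x d x d list."""
    d = len(components[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, a in zip(weights, components):
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    T[i][j][k] += w * a[i] * a[j] * a[k]
    return T


def contract(T, u):
    """Multi-linear map T(I, u, u): contract the last two modes with u."""
    d = len(u)
    return [
        sum(T[i][j][k] * u[j] * u[k] for j in range(d) for k in range(d))
        for i in range(d)
    ]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(T, rng, iters=100):
    """Run u <- T(I, u, u) / ||T(I, u, u)|| from a random unit vector."""
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(len(T))])
    for _ in range(iters):
        u = normalize(contract(T, u))
    # The associated weight is T(u, u, u).
    weight = sum(x * y for x, y in zip(u, contract(T, u)))
    return weight, u


def decompose(T, rank, seed=0, restarts=10):
    """Recover (weight, component) pairs one at a time, deflating after each."""
    rng = random.Random(seed)
    d = len(T)
    T = [[row[:] for row in mat] for mat in T]  # work on a copy
    found = []
    for _ in range(rank):
        # Each restart is independent of the others, so they could run in parallel.
        candidates = [power_iteration(T, rng) for _ in range(restarts)]
        weight, u = max(candidates, key=lambda pair: pair[0])
        found.append((weight, u))
        # Deflation: subtract the recovered rank-one term from T.
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    T[i][j][k] -= weight * u[i] * u[j] * u[k]
    return found
```

A call such as `decompose(build_tensor(weights, components), rank=len(weights))` should return the weights and components up to ordering when the orthonormality assumption holds. In the non-orthogonal case a preprocessing step such as whitening is commonly used, which this sketch omits.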

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Equivalence of Coarse and Fine-Grained Models for Learning with Distribution Shift
cs.DS 2026-05 unverdicted novelty 7.0

PQ and TDS learning are equivalent in the distribution-free setting for Boolean classes, implying hardness for TDS halfspace learning but efficient algorithms with membership queries.

Tensor-based Multi-layer Decoupling
eess.SY 2026-04 unverdicted novelty 7.0

A new tensor framework for multi-layer decoupling of multivariate functions is proposed via ParaTuck decompositions and bilevel optimization.

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
cs.LG 2024-01 unverdicted novelty 6.0

SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on be...