Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

2015 · cs.LG · arXiv 1506.08473

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for learnability. Our training method is based on tensor decomposition, which provably converges to the global optimum, under a set of mild non-degeneracy conditions. It consists of simple embarrassingly parallel linear and multi-linear operations, and is competitive with standard stochastic gradient descent (SGD), in terms of computational complexity. Thus, we propose a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer.
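The abstract describes the training method only as tensor decomposition built from linear and multilinear operations. As a rough illustration of that kind of primitive, the sketch below runs a plain tensor power iteration with deflation on a small symmetric third-order tensor. It is not the paper's algorithm: it assumes the components are already orthonormal and the weights positive (real pipelines typically need a whitening step to reach that setting), it works on an exact tensor rather than one estimated from data, and every name in it (`build_tensor`, `contract`, `power_iteration`, `decompose`) is hypothetical.

```python
import math
import random


def build_tensor(weights, components):
    """Return T = sum_i w_i * a_i (x) a_i (x) a_i as a nested d x d x d list."""
    d = len(components[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, a in zip(weights, components):
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    T[i][j][k] += w * a[i] * a[j] * a[k]
    return T


def contract(T, v):
    """Multilinear map T(I, v, v): contract the last two modes with v."""
    d = len(v)
    return [
        sum(T[i][j][k] * v[j] * v[k] for j in range(d) for k in range(d))
        for i in range(d)
    ]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(T, rng, iters=50, restarts=10):
    """Run v <- T(I, v, v) / ||T(I, v, v)|| from several random starts.

    Returns the (weight, vector) pair with the largest value T(v, v, v).
    """
    d = len(T)
    best_lam, best_v = -math.inf, None
    for _ in range(restarts):
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            v = normalize(contract(T, v))
        lam = sum(vi * ui for vi, ui in zip(v, contract(T, v)))
        if lam > best_lam:
            best_lam, best_v = lam, v
    return best_lam, best_v


def deflate(T, lam, v):
    """Subtract the recovered rank-1 term lam * v (x) v (x) v from T."""
    d = len(v)
    return [
        [[T[i][j][k] - lam * v[i] * v[j] * v[k] for k in range(d)] for j in range(d)]
        for i in range(d)
    ]


def decompose(T, rank, seed=0):
    """Recover `rank` (weight, component) pairs by repeated power iteration."""
    rng = random.Random(seed)
    factors = []
    for _ in range(rank):
        lam, v = power_iteration(T, rng)
        factors.append((lam, v))
        T = deflate(T, lam, v)
    return factors


if __name__ == "__main__":
    # Toy example: three orthonormal components in R^3 (an assumption of this sketch).
    true_weights = [3.0, 2.0, 1.0]
    true_components = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8], [0.0, -0.8, 0.6]]
    T = build_tensor(true_weights, true_components)
    for lam, v in decompose(T, rank=3):
        print(round(lam, 6), [round(x, 6) for x in v])
```

Each contraction in `contract` is an independent sum per output coordinate, which is the sense in which such operations parallelize easily. The recovered weights should be close to the true ones, though the order in which components come out can vary with the random starts.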

representative citing papers

Equivalence of Coarse and Fine-Grained Models for Learning with Distribution Shift

cs.DS · 2026-05-07 · unverdicted · novelty 8.0 · 2 refs

An efficient black-box reduction from PQ to TDS learning for any Boolean concept class in the distribution-free setting implies hardness for TDS learning of halfspaces, while membership queries enable efficient PQ learning of halfspaces via iterative Forster transforms.

Tensor-based Multi-layer Decoupling

eess.SY · 2026-04-12 · unverdicted · novelty 7.0

A new tensor framework for multi-layer decoupling of multivariate functions is proposed via ParaTuck decompositions and bilevel optimization.

Synchronous and Asynchronous Parallelism Approaches for Generalized Canonical Polyadic Tensor Decomposition with GenTen

math.NA · 2026-05-19 · unverdicted · novelty 6.0

Presents new synchronous and asynchronous parallel approaches for GCP tensor decomposition and evaluates computational cost and accuracy on synthetic and real-world datasets.

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

cs.LG · 2024-01-02 · unverdicted · novelty 6.0

SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.

Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks

cs.LG · 2019-06-30 · unverdicted · novelty 6.0

Presents an active-sampling method that approximates the weight subspace from Hessian finite differences, recovers the rank-1 tensors by robust nonlinear programming, and attributes layers with gradient descent, yielding stable recovery under a-posteriori verifiable conditions.

citing papers explorer

Showing 5 of 5 citing papers.

Equivalence of Coarse and Fine-Grained Models for Learning with Distribution Shift

cs.DS · 2026-05-07 · unverdicted · none · ref 205 · 2 links · internal anchor

Tensor-based Multi-layer Decoupling

eess.SY · 2026-04-12 · unverdicted · none · ref 15

Synchronous and Asynchronous Parallelism Approaches for Generalized Canonical Polyadic Tensor Decomposition with GenTen

math.NA · 2026-05-19 · unverdicted · none · ref 15 · internal anchor

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

cs.LG · 2024-01-02 · unverdicted · none · ref 58 · internal anchor

Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks

cs.LG · 2019-06-30 · unverdicted · none · ref 32 · internal anchor
