Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates

2017 · cs.LG · arXiv 1708.07120

10 Pith papers cite this work.

abstract

In this paper, we describe a phenomenon, which we named "super-convergence", where neural networks can be trained an order of magnitude faster than with standard training methods. The existence of super-convergence is relevant to understanding why deep networks generalize well. One of the key elements of super-convergence is training with one learning rate cycle and a large maximum learning rate. A primary insight that allows super-convergence training is that large learning rates regularize the training, hence requiring a reduction of all other forms of regularization in order to preserve an optimal regularization balance. We also derive a simplification of the Hessian Free optimization method to compute an estimate of the optimal learning rate. Experiments demonstrate super-convergence for the CIFAR-10/100, MNIST, and ImageNet datasets, and for ResNet, Wide ResNet, DenseNet, and Inception architectures. In addition, we show that super-convergence provides a greater boost in performance relative to standard training when the amount of labeled training data is limited. The architectures and code to replicate the figures in this paper are available at github.com/lnsmith54/super-convergence. See http://www.fast.ai/2018/04/30/dawnbench-fastai/ for an application of super-convergence that won the DAWNBench challenge (https://dawn.cs.stanford.edu/benchmark/).
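The abstract names "one learning rate cycle and a large maximum learning rate" as a key ingredient. The sketch below illustrates what such a single-cycle schedule can look like, assuming a symmetric linear ramp from a minimum to a maximum learning rate and back, followed by a short linear anneal below the minimum for the last iterations. The function names, the 10% anneal fraction, and the `min_lr / 100` final value are hypothetical choices for illustration, not the paper's code (which is in the repository named in the abstract).

```python
from typing import List, Optional


def one_cycle_lr(step: int, total_steps: int, min_lr: float, max_lr: float,
                 anneal_frac: float = 0.1,
                 final_lr: Optional[float] = None) -> float:
    """Learning rate at `step` for one triangular cycle plus a short anneal.

    Assumed shape (hypothetical parameter names):
      1. linear ramp from min_lr up to max_lr over the first half of the cycle,
      2. linear ramp from max_lr back down to min_lr over the second half,
      3. linear anneal from min_lr to final_lr over the last `anneal_frac`
         of training.
    """
    if not 0.0 <= anneal_frac < 1.0:
        raise ValueError("anneal_frac must be in [0, 1)")
    if final_lr is None:
        final_lr = min_lr / 100.0  # assumed default, not taken from the paper

    anneal_steps = int(total_steps * anneal_frac)
    cycle_steps = total_steps - anneal_steps
    if cycle_steps < 2:
        raise ValueError("total_steps too small for one cycle")
    half = cycle_steps / 2.0

    step = max(0, min(step, total_steps - 1))  # clamp to a valid iteration
    if step < half:
        # Phase 1: ramp up towards the large maximum learning rate.
        return min_lr + (max_lr - min_lr) * step / half
    if step < cycle_steps:
        # Phase 2: ramp back down symmetrically.
        return max_lr - (max_lr - min_lr) * (step - half) / half
    # Phase 3: anneal below min_lr for the final iterations
    # (only reachable when anneal_steps >= 1, so the division is safe).
    frac = (step - cycle_steps) / anneal_steps
    return min_lr + (final_lr - min_lr) * frac


def one_cycle_schedule(total_steps: int, min_lr: float, max_lr: float,
                       **kwargs) -> List[float]:
    """Precompute the whole schedule, one value per training iteration."""
    return [one_cycle_lr(s, total_steps, min_lr, max_lr, **kwargs)
            for s in range(total_steps)]


if __name__ == "__main__":
    schedule = one_cycle_schedule(100, min_lr=0.1, max_lr=1.0)
    print(max(schedule))  # 1.0, reached at step 45 (mid-cycle)
```

In a training loop, `schedule[step]` would be written into the optimizer's learning rate before each update. Per the abstract, the large `max_lr` itself acts as a regularizer, so other forms of regularization (for example weight decay or dropout) would be reduced to keep the overall balance.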

citation-role summary

background: 2

citation-polarity summary

background: 1 · unclear: 1

representative citing papers

Generative models on phase space

hep-ph · 2026-04-02 · unverdicted · novelty 8.0

Generative diffusion and flow models are constructed to remain exactly on the Lorentz-invariant massless N-particle phase space manifold during sampling for particle physics applications.

OmniMol: Transferring Particle Physics Knowledge to Molecular Dynamics with Point-Edge Transformers

physics.chem-ph · 2026-01-15 · unverdicted · novelty 7.0

OmniMol transfers a billion-jet pre-trained PET foundation model from HEP to molecular dynamics via an interaction-matrix attention bias, delivering strong performance on the oMol dataset with minimal fine-tuning and fast inference.

Data-Driven Calibration of Large Liquid Detectors with Unsupervised Learning

physics.ins-det · 2025-12-19 · conditional · novelty 7.0

Unsupervised deep learning with a simplified optical photon transport model in the loss function extracts three PMT calibration constants per tube from background events in the SNO+ detector.

Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

cs.LG · 2026-03-30 · unverdicted · novelty 6.0

A momentum schedule from critical damping speeds convergence and yields an optimizer-invariant diagnostic for locating and correcting specific underperforming layers in trained networks.

Bayesian Modeling and Prediction of Generalized Contact Matrices

stat.ME · 2026-05-07 · unverdicted · novelty 6.0

A Bayesian model for multi-feature contact matrices that uses tensor structures and contingency table theory to satisfy structural constraints and impute missing contact features, validated on simulations and US/German survey data.

From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments

cs.CV · 2026-05-03 · unverdicted · novelty 5.0 · 2 refs

Gaussian and related cropping strategies for point cloud subclouds improve 3D neural network performance over spherical cropping on large outdoor scenes.

Now You See That: Learning End-to-End Humanoid Locomotion from Raw Pixels

cs.RO · 2026-02-06 · unverdicted · novelty 5.0

An end-to-end policy learns robust humanoid locomotion directly from noisy depth images via high-fidelity sensor simulation, vision-aware distillation from privileged maps, and terrain-specific multi-critic reward shaping.

Learning Minimal Representations of Many-Body Physics from Snapshots of a Quantum Simulator

quant-ph · 2025-09-17 · unverdicted · novelty 5.0

A VAE learns a minimal latent representation from noisy quantum simulator snapshots that correlates with the sine-Gordon equilibrium parameter and detects anomalous post-quench dynamics including frozen-in solitons.

Staged Factorial Screening for Budget-Constrained Micro-Pretraining

cs.LG · 2026-04-27 · unverdicted · novelty 3.0

Staged factorial screening recovers stable early penalties from total batch, depth, and width in 2-10 minute pretraining runs and supports a bridge-centered recommendation through 24-hour continuations on two hosts.

Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics

cs.LG · 2022-12-18 · unverdicted · novelty 2.0

A comprehensive review of deep learning techniques for computational mechanics, including LSTM for constitutive modeling, PINNs for PDE solving, optimizers, and kernel methods.

citing papers explorer

Generative models on phase space · hep-ph · 2026-04-02 · unverdicted · none · ref 77
OmniMol: Transferring Particle Physics Knowledge to Molecular Dynamics with Point-Edge Transformers · physics.chem-ph · 2026-01-15 · unverdicted · none · ref 36
Data-Driven Calibration of Large Liquid Detectors with Unsupervised Learning · physics.ins-det · 2025-12-19 · conditional · none · ref 15
Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training · cs.LG · 2026-03-30 · unverdicted · none · ref 16
Bayesian Modeling and Prediction of Generalized Contact Matrices · stat.ME · 2026-05-07 · unverdicted · none · ref 87
From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments · cs.CV · 2026-05-03 · unverdicted · none · ref 75
Now You See That: Learning End-to-End Humanoid Locomotion from Raw Pixels · cs.RO · 2026-02-06 · unverdicted · none · ref 45
Learning Minimal Representations of Many-Body Physics from Snapshots of a Quantum Simulator · quant-ph · 2025-09-17 · unverdicted · none · ref 72
Staged Factorial Screening for Budget-Constrained Micro-Pretraining · cs.LG · 2026-04-27 · unverdicted · none · ref 17
Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics · cs.LG · 2022-12-18 · unverdicted · none · ref 202
