Cyclical Learning Rates for Training Neural Networks

Leslie N. Smith · 2015 · cs.CV · arXiv 1506.01186

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

It is known that the learning rate is the most important hyper-parameter to tune for training deep neural networks. This paper describes a new method for setting the learning rate, named cyclical learning rates, which practically eliminates the need to experimentally find the best values and schedule for the global learning rates. Instead of monotonically decreasing the learning rate, this method lets the learning rate cyclically vary between reasonable boundary values. Training with cyclical learning rates instead of fixed values achieves improved classification accuracy without a need to tune and often in fewer iterations. This paper also describes a simple way to estimate "reasonable bounds" -- linearly increasing the learning rate of the network for a few epochs. In addition, cyclical learning rates are demonstrated on the CIFAR-10 and CIFAR-100 datasets with ResNets, Stochastic Depth networks, and DenseNets, and the ImageNet dataset with the AlexNet and GoogLeNet architectures. These are practical tools for everyone who trains neural networks.
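The two ideas in the abstract, cycling the learning rate between boundary values and finding those boundaries with a short linearly increasing run, can be sketched in a few lines. This is a minimal illustration, not the paper's code: the abstract does not specify the waveform, so a triangular wave is assumed here, and the names `triangular_lr`, `range_test_lr`, `base_lr`, `max_lr`, and `step_size` are hypothetical. How the bounds are read off the range-test curve is not covered by the abstract and is left out.

```python
import math


def triangular_lr(iteration: int, base_lr: float, max_lr: float, step_size: int) -> float:
    """Learning rate at `iteration` under an assumed triangular cyclical schedule.

    `step_size` is the number of iterations in half a cycle, so the rate
    climbs from base_lr to max_lr over step_size iterations and then
    descends back to base_lr over the next step_size iterations.
    """
    # 1-based index of the cycle this iteration falls in.
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the cycle's peak, scaled to [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


def range_test_lr(iteration: int, min_lr: float, max_lr: float, total_iterations: int) -> float:
    """Linearly increasing learning rate for a short bound-finding run."""
    # Fraction of the run completed, clamped so late iterations stay at max_lr.
    fraction = min(iteration / max(total_iterations - 1, 1), 1.0)
    return min_lr + (max_lr - min_lr) * fraction


if __name__ == "__main__":
    # One full cycle with a half-cycle length of 4 iterations.
    for it in range(9):
        print(it, round(triangular_lr(it, base_lr=0.001, max_lr=0.006, step_size=4), 6))
```

In a training loop, the value returned for the current iteration would be assigned to the optimizer's learning rate before each update; the range test would be run once beforehand while recording accuracy or loss against the learning rate.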

representative citing papers

Two-stage Convolutional Neural Network for pseudo six-dimensional phase space reconstruction

hep-ex · 2026-03-03 · unverdicted · novelty 7.0

A two-stage CNN reconstructs pseudo 6D phase space from 16 x-y images taken at varying rotation angles in the KEK-ATF injector.

Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

cs.LG · 2026-03-30 · unverdicted · novelty 6.0

A momentum schedule from critical damping speeds convergence and yields an optimizer-invariant diagnostic for locating and correcting specific underperforming layers in trained networks.

SGDR: Stochastic Gradient Descent with Warm Restarts

cs.LG · 2016-08-13 · accept · novelty 6.0

SGDR uses periodic warm restarts of the learning rate in SGD to reach new state-of-the-art error rates of 3.14% on CIFAR-10 and 16.21% on CIFAR-100.

Learning the Universe with the 2nd Generation of CAMELS: Varying 35 parameters of the IllustrisTNG model in (50Mpc/h)^3 boxes

astro-ph.CO · 2026-06-08 · unverdicted · novelty 4.0

New CAMELS simulations in larger (50 Mpc/h)^3 boxes with 35 varied parameters produce tighter neural-network constraints on model parameters than prior smaller-volume runs, with public data release.

Staged Factorial Screening for Budget-Constrained Micro-Pretraining

cs.LG · 2026-04-27 · unverdicted · novelty 3.0

Staged factorial screening recovers stable early penalties from total batch, depth, and width in 2-10 minute pretraining runs and supports a bridge-centered recommendation through 24-hour continuations on two hosts.

citing papers explorer

Showing 5 of 5 citing papers.

Two-stage Convolutional Neural Network for pseudo six-dimensional phase space reconstruction

hep-ex · 2026-03-03 · unverdicted · none · ref 46 · internal anchor

Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

cs.LG · 2026-03-30 · unverdicted · none · ref 15 · internal anchor

SGDR: Stochastic Gradient Descent with Warm Restarts

cs.LG · 2016-08-13 · accept · none · ref 15

Learning the Universe with the 2nd Generation of CAMELS: Varying 35 parameters of the IllustrisTNG model in (50Mpc/h)^3 boxes

astro-ph.CO · 2026-06-08 · unverdicted · none · ref 72 · internal anchor

Staged Factorial Screening for Budget-Constrained Micro-Pretraining

cs.LG · 2026-04-27 · unverdicted · none · ref 16 · internal anchor
