pith. machine review for the scientific record.

arxiv: 1506.01186 · v6 · submitted 2015-06-03 · 💻 cs.CV · cs.LG · cs.NE

Recognition: unknown

Cyclical Learning Rates for Training Neural Networks

Authors on Pith: no claims yet
classification 💻 cs.CV cs.LG cs.NE
keywords learning rate · rates · cyclical · networks · neural · training · values
0 comments
Original abstract

It is known that the learning rate is the most important hyper-parameter to tune for training deep neural networks. This paper describes a new method for setting the learning rate, named cyclical learning rates, which practically eliminates the need to experimentally find the best values and schedule for the global learning rates. Instead of monotonically decreasing the learning rate, this method lets the learning rate cyclically vary between reasonable boundary values. Training with cyclical learning rates instead of fixed values achieves improved classification accuracy without a need to tune and often in fewer iterations. This paper also describes a simple way to estimate "reasonable bounds" -- linearly increasing the learning rate of the network for a few epochs. In addition, cyclical learning rates are demonstrated on the CIFAR-10 and CIFAR-100 datasets with ResNets, Stochastic Depth networks, and DenseNets, and the ImageNet dataset with the AlexNet and GoogLeNet architectures. These are practical tools for everyone who trains neural networks.
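The abstract names two ingredients: a learning rate that cycles between lower and upper bounds, and a short run with a linearly increasing rate to estimate those bounds. Below is a minimal Python sketch of both, following the triangular form the paper describes; the function names, default values, and the standalone helpers are illustrative rather than taken from the paper's released code.

```python
import math

def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate.

    The rate ramps linearly from base_lr to max_lr over `step_size`
    iterations, then back down to base_lr, and repeats.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

def lr_range_test_schedule(iteration, total_iterations, min_lr=1e-5, max_lr=1.0):
    """Linearly increase the learning rate over a short run (the "LR range test").

    Plot accuracy (or loss) against the rate afterwards: the rate at which
    accuracy first starts improving is a reasonable lower bound, and the rate
    at which it stalls or degrades is a reasonable upper bound.
    """
    frac = iteration / max(1, total_iterations - 1)
    return min_lr + (max_lr - min_lr) * frac
```

In use, `triangular_lr` would be evaluated once per training iteration and the result assigned to the optimizer's learning rate; `step_size` is typically set to a few epochs' worth of iterations.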

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

cs.LG · 2026-03 · unverdicted · novelty 6.0

    A momentum schedule from critical damping speeds convergence and yields an optimizer-invariant diagnostic for locating and correcting specific underperforming layers in trained networks.

  2. SGDR: Stochastic Gradient Descent with Warm Restarts

cs.LG · 2016-08 · accept · novelty 6.0

    SGDR uses periodic warm restarts of the learning rate in SGD to reach new state-of-the-art error rates of 3.14% on CIFAR-10 and 16.21% on CIFAR-100.
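For comparison with the cyclical schedule sketched above, the warm-restart schedule SGDR describes anneals the rate with a cosine from a maximum to a minimum and then restarts it. A minimal sketch, with illustrative variable names and default values:

```python
import math

def sgdr_lr(t_cur, t_i, eta_min=0.0, eta_max=0.1):
    """Cosine-annealed learning rate within one SGDR restart period.

    t_cur: iterations (or epochs) since the last warm restart.
    t_i:   length of the current restart period; SGDR typically grows
           it by a constant factor after each restart.
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))
```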