Seesaw: Accelerating training by balancing learning rate and batch size scheduling. arXiv preprint arXiv:2510.14717
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cond-mat.dis-nn (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model
In a random feature model, optimal SGD learning-rate schedules are polynomial decay in the easy phase and warmup-stable-decay in the hard phase, outperforming constant or simple power-law schedules and transferring differently across training horizons.