Deep double descent: Where bigger models and more data hurt

Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever · 1912 · arXiv 1912.02292

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

cs.LG · 2022-01-06 · unverdicted · novelty 8.0

Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

citing papers explorer

Showing 3 of 3 citing papers.

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets cs.LG · 2022-01-06 · unverdicted · none · ref 10
Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.
Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 94
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
A General Language Assistant as a Laboratory for Alignment cs.CL · 2021-12-01 · conditional · none · ref 39
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

Deep double descent: Where bigger models and more data hurt

fields

years

verdicts

representative citing papers

citing papers explorer