Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.
Reconciling modern machine-learning practice and the classical bias– variance trade-off
5 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
Lecture notes that treat statistical physics as probability theory and connect Ising models, spin glasses, and renormalization group ideas to Hopfield networks, restricted Boltzmann machines, and large language models.
citing papers explorer
-
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.
-
Language Models (Mostly) Know What They Know
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
-
A General Language Assistant as a Laboratory for Alignment
Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
-
Lecture Notes on Statistical Physics and Neural Networks
Lecture notes that treat statistical physics as probability theory and connect Ising models, spin glasses, and renormalization group ideas to Hopfield networks, restricted Boltzmann machines, and large language models.
- Scaling Laws for Neural Language Models