Dropout: a simple way to prevent neural networks from overfitting
3 Pith papers cite this work. Polarity classification of these citations is still being indexed.
Fields: cs.LG (3)
citing papers explorer
- Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
  Neural networks exhibit grokking on small algorithmic datasets, achieving perfect generalization well after overfitting.
- Sharpness-Aware Minimization for Efficiently Improving Generalization
  SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.
- Benchmarking Batch Deep Reinforcement Learning Algorithms
  Many batch RL algorithms underperform both online DQN and the behavioral policy on Atari; an adapted discrete-action BCQ outperforms the others tested.