In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks. We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.
Forward citations
Cited by 7 Pith papers
- Understanding deep learning requires rethinking generalization
  State-of-the-art convolutional networks easily memorize random labels and unstructured noise images, indicating that generalization in deep learning cannot be explained by traditional capacity or regularization arguments.
- Estimating Implicit Regularization in Deep Learning
  Gradient matching empirically recovers implicit regularization effects such as l2 penalties from early stopping and dropout in neural networks.
- Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning
  Evolving Parameter Isolation (EPI) periodically updates parameter isolation masks using online gradient signals during supervised fine-tuning to protect emerging task-critical parameters and reduce interference and fo...
- Nexus: Same Pretraining Loss, Better Downstream Generalization via Common Minima
  Nexus optimizer improves LLM downstream performance by converging to common minima across data sources despite identical pretraining loss.
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
  SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on be...
- Probing the Impact of Scale on Data-Efficient, Generalist Transformer World Models for Atari
  Transformer world models on Atari exhibit game-specific scaling regimes, but joint training on 26 environments produces consistent monotonic gains that improve downstream control policies to a median normalized score ...
- (How) Learning Rates Regulate Catastrophic Overtraining
  Learning rate decay during SFT increases pretrained model sharpness, which exacerbates catastrophic forgetting and causes overtraining in LLMs.