Exploring Generalization in Deep Learning

Behnam Neyshabur; David McAllester; Nathan Srebro; Srinadh Bhojanapalli

arxiv: 1706.08947 · v2 · pith:GL4E3BRYnew · submitted 2017-06-27 · 💻 cs.LG

classification 💻 cs.LG

keywords generalization, deep, measures, sharpness, connection, consider, control, different

With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness. We study how these measures can ensure generalization, highlighting the importance of scale normalization, and making a connection between sharpness and PAC-Bayes theory. We then investigate how well the measures explain different observed phenomena.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sample Complexity of Scientific Discovery: PAC Learnability of Compositional Function Trees
cs.LG 2026-06 unverdicted novelty 7.0

Proves that Rademacher complexity of depth-d compositional trees over finite operator vocabulary is controlled by (K b L)^{d} / sqrt(n) under Lipschitz conditions on operators.
A Sharper Picture of Generalization in Transformers
cs.LG 2026-05 unverdicted novelty 6.0

Sparse low-degree Fourier spectra allow flat minima in transformers for boolean functions up to context-length sparsity, enabling non-vacuous PAC-Bayes generalization bounds via an idealized low-sharpness learner.
A Sharper Picture of Generalization in Transformers
cs.LG 2026-05 unverdicted novelty 6.0

PAC-Bayes applied to low-sharpness flat minima yields non-vacuous generalization bounds for boolean functions whose Fourier spectra are sparse and low-degree, with parameters estimable by property testing.
Feature Starvation as Geometric Instability in Sparse Autoencoders
cs.LG 2026-05 unverdicted novelty 6.0

Adaptive elastic net SAEs (AEN-SAEs) mitigate feature starvation in SAEs by combining ℓ2 structural stability with adaptive ℓ1 reweighting, producing a Lipschitz-continuous sparse coding map that recovers global featu...
Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
cs.LG 2026-05 unverdicted novelty 5.0

Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
cs.LG 2026-05 unverdicted novelty 4.0

Develops a margin-adaptive learned confidence estimator for LLMs with generalization guarantees to improve agreement rates with human judgments over heuristic baselines.