Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior

2017 · cs.LG · arXiv 1710.09553

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

We describe an approach to understand the peculiar and counterintuitive generalization properties of deep neural networks. The approach involves going beyond worst-case theoretical capacity control frameworks that have been popular in machine learning in recent years to revisit old ideas in the statistical mechanics of neural networks. Within this approach, we present a prototypical Very Simple Deep Learning (VSDL) model, whose behavior is controlled by two control parameters, one describing an effective amount of data, or load, on the network (that decreases when noise is added to the input), and one with an effective temperature interpretation (that increases when algorithms are early stopped). Using this model, we describe how a very simple application of ideas from the statistical mechanics theory of generalization provides a strong qualitative description of recently-observed empirical results regarding the inability of deep neural networks not to overfit training data, discontinuous learning and sharp transitions in the generalization properties of learning algorithms, etc.

representative citing papers

AlphaQ: Calibration-Free Bit Allocation for Mixture-of-Experts Quantization

cs.LG · 2026-06-03 · unverdicted · novelty 6.0

AlphaQ performs calibration-free mixed-precision quantization of MoE models by allocating higher bits to experts whose weight spectra exhibit stronger heavy-tailed structure according to HT-SR theory, outperforming calibration-based methods and reaching near full-precision accuracy at 3.5 average bi

Unveiling Multi-regime Patterns in SciML: Distinct Failure Modes and Regime-specific Optimization

cs.LG · 2026-05-27 · unverdicted · novelty 5.0

SciML models show a consistent three-regime structure with regime-specific optimization effectiveness and fine-grained failure modes.

citing papers explorer

Showing 2 of 2 citing papers.

AlphaQ: Calibration-Free Bit Allocation for Mixture-of-Experts Quantization

cs.LG · 2026-06-03 · unverdicted · none · ref 17 · internal anchor

Unveiling Multi-regime Patterns in SciML: Distinct Failure Modes and Regime-specific Optimization

cs.LG · 2026-05-27 · unverdicted · none · ref 4 · internal anchor
