Recognition: 2 theorem links · Lean Theorem
Scale-Sensitive Shattering: Learnability and Evaluability at Optimal Scale
Pith reviewed 2026-05-14 20:29 UTC · model grok-4.3
The pith
For any bounded real-valued function class, uniform convergence at scale γ is equivalent to agnostic learnability at scale γ/2 and finite fat-shattering dimension at all scales above γ.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For every bounded real-valued function class and every γ > 0, uniform convergence at scale γ, agnostic learnability at scale γ/2, and finiteness of the fat-shattering dimension at every scale γ' > γ are equivalent. The proof proceeds by a direct control of empirical ℓ∞ covering numbers that bypasses the usual packing-number detour.
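A compact formal restatement of the claim, as a sketch under standard definitions; the [0,1] normalization and the notation fat_γ' are conventional choices of this review, not quoted from the paper:

```latex
\newtheorem{theorem}{Theorem}
\begin{theorem}[claimed equivalence]
Let $F \subseteq [0,1]^X$ and $\gamma > 0$. The following are equivalent:
(i) $F$ satisfies uniform convergence at scale $\gamma$;
(ii) $F$ is agnostically learnable at scale $\gamma/2$;
(iii) $\mathrm{fat}_{\gamma'}(F) < \infty$ for every $\gamma' > \gamma$.
\end{theorem}
% Recall: fat_{\gamma'}(F) is the largest n for which some x_1,...,x_n are
% \gamma'-shattered, i.e. there exist witnesses r_1,...,r_n such that for
% every sign pattern s in {-1,+1}^n some f in F satisfies
% s_i (f(x_i) - r_i) >= \gamma' for all i <= n.
```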
What carries the argument
A direct upper bound on the empirical ℓ∞ covering numbers of the function class, which produces the stated equivalences and the sharp O(log² n) versus O(log n) entropy transitions.
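Concretely, the covered quantity and the two entropy regimes from the abstract read as follows; $N_\infty$ is the standard notation for empirical $\ell_\infty$ covering numbers, chosen here for exposition, with constants and the dependence on the fat-shattering dimension suppressed:

```latex
% Empirical \ell_\infty covering number over samples of size n:
\[
  N_\infty(\gamma, F, n) \;=\; \sup_{x_1,\dots,x_n}\,
  \min\Bigl\{ |G| : G \subseteq \mathbb{R}^n,\;
  \forall f \in F\ \exists g \in G\ \max_{i \le n} |f(x_i) - g_i| \le \gamma \Bigr\}.
\]
% Claimed entropy transitions around the fat-shattering scale \gamma:
\begin{align*}
  \log N_\infty(\gamma/2,\, F,\, n) &= O(\log^2 n) \quad \text{(sometimes tight)},\\
  \log N_\infty(2\gamma,\, F,\, n)  &= O(\log n).
\end{align*}
```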
Load-bearing premise
The function class must be bounded; without boundedness, the scale-sensitive notions of uniform convergence and learnability are not well-defined.
What would settle it
A concrete bounded function class for which uniform convergence holds at some scale γ yet the fat-shattering dimension remains infinite at a scale only slightly larger than γ.
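To make the object in question concrete, here is a minimal brute-force γ-shattering test for a finite class on a finite sample. It is an illustration written for this review; the helper `gamma_shattered` and the threshold example are hypothetical, not code from the paper.

```python
from itertools import product

def gamma_shattered(fs, xs, gamma):
    """Brute-force test: does the finite class `fs` gamma-shatter `xs`?
    True iff witness levels r_1..r_n exist so that every sign pattern
    s in {-1,+1}^n is realized by some f with s_i*(f(x_i) - r_i) >= gamma."""
    n = len(xs)
    # If a high/low split of the functions is achievable at a point, the
    # midpoint of some pair of observed values achieves it too, so pairwise
    # midpoints form a complete candidate set of witnesses for finite classes.
    vals = [sorted({f(x) for f in fs}) for x in xs]
    cands = [sorted({(a + b) / 2 for a in v for b in v}) for v in vals]
    for r in product(*cands):
        patterns = set()
        for f in fs:
            m = [f(x) - ri for x, ri in zip(xs, r)]
            if all(abs(mi) >= gamma for mi in m):
                patterns.add(tuple(mi > 0 for mi in m))
        if len(patterns) == 2 ** n:
            return True
    return False

# fat_gamma(F) is the largest n for which SOME n-point sample is shattered.
# Example: two threshold functions gamma-shatter one point at scale 1/2.
fs = [lambda x, t=t: float(x >= t) for t in (0.25, 0.75)]
print(gamma_shattered(fs, [0.5], gamma=0.5))  # True
```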
original abstract
We study the optimal scale at which real-valued function classes exhibit uniform convergence and learnability. Our main result establishes a scale-sensitive generalization of the fundamental theorem of PAC learning: for every bounded real-valued class and every $\gamma>0$, uniform convergence at scale $\gamma$, agnostic learnability at scale $\gamma/2$, and finiteness of the fat-shattering dimension at every scale $\gamma'>\gamma$ are equivalent. This resolves a question by Anthony and Bartlett (Cambridge Univ. Press 1999) on the precise scales governing learnability, refuting a conjecture attributed there to Phil Long that a multiplicative 2-factor gap is unavoidable, and improves the upper bounds of Bartlett and Long (JCSS 1998), which incur such a loss. The key technical ingredient is a direct bound on empirical $\ell_\infty$ covering numbers, avoiding the standard detour through packing numbers. As a consequence, we obtain sharp asymptotic metric-entropy bounds in terms of the fat-shattering scale $\gamma$: an $O(\log^2 n)$ bound holds already at scale $\gamma/2$, while an $O(\log n)$ bound holds at scale $2\gamma$. We further show that the $O(\log^2 n)$ bound is sometimes tight. These results resolve open questions by Alon et al. (JACM 1997) and Rudelson and Vershynin (Ann. of Math. 2006). As an application, we establish a sharp dichotomy for bounded integral probability metrics: every such IPM is either estimable or cannot be weakly evaluated within any multiplicative factor $c<3$, while $3$-weak evaluability always holds, resolving an open question from Aiyer et al. (ICML 2026). We also highlight several open questions on quantitative sample complexity and evaluability.
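The integral probability metric in the final claim is the usual one; the definition below follows Müller (1997) [35], and the reading of c-weak evaluability is this review's paraphrase of the abstract:

```latex
% IPM generated by a bounded class F, for distributions P and Q:
\[
  d_F(P, Q) \;=\; \sup_{f \in F}\,
  \bigl|\, \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \,\bigr|.
\]
% Claimed dichotomy: either d_F is estimable from samples, or no estimator
% can guarantee a value within a multiplicative factor c < 3 of d_F;
% a factor-3 guarantee (3-weak evaluability) is always achievable.
```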
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript establishes a scale-sensitive generalization of the fundamental theorem of PAC learning for bounded real-valued function classes: for every such class and every γ > 0, uniform convergence at scale γ, agnostic learnability at scale γ/2, and finiteness of the fat-shattering dimension at every scale γ' > γ are equivalent. The proof uses a direct bound on empirical ℓ_∞ covering numbers (avoiding the usual packing-number detour) to obtain sharp metric-entropy bounds (O(log² n) at γ/2 and O(log n) at 2γ, with the former sometimes tight) and a sharp dichotomy for bounded integral probability metrics (estimable or not c-weakly evaluable for any c < 3, while 3-weak evaluability always holds). The results are claimed to resolve the Anthony-Bartlett question, refute Long's conjecture on an unavoidable 2-factor gap, improve Bartlett-Long bounds, and settle open questions of Alon et al. and Rudelson-Vershynin.
Significance. If the central equivalence and the direct covering-number argument hold, the work supplies optimal-scale characterizations of uniform convergence and learnability, removing the multiplicative loss present in prior bounds and furnishing sharp asymptotic entropy results. The IPM dichotomy is a clean application with clear implications for estimation. The manuscript provides a parameter-free derivation from standard concepts together with explicit tightness examples, which strengthens its contribution to statistical learning theory.
major comments (2)
- [Main theorem and its proof] The central equivalence rests on the direct empirical ℓ_∞ covering-number bound; the manuscript must exhibit the precise step (presumably in the proof of the main theorem) that converts finite fat-shattering dimension at scales >γ into an O(log² n) covering number at scale γ/2 without reintroducing a factor-2 loss (see the sketch of the classical bound after this list).
- [Metric-entropy consequences] The claimed tightness of the O(log² n) entropy bound at scale γ/2 requires an explicit construction (function class and distribution) showing that the bound cannot be improved to O(log n) in general; this example must be checked for consistency with the boundedness assumption used throughout.
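For context on the factor-2 issue in the first comment: the classical route of Alon et al. [27] passes through packing numbers and measures the dimension at a strictly smaller scale. Schematically, with constants suppressed and the exact scale ratio hedged:

```latex
% Classical shape: if d = fat_{c\gamma}(F) for a constant c < 1 fixed by
% the packing argument, then
\[
  \log N_\infty(\gamma, F, n) \;=\; O\!\bigl( d \,\log^2( n / (d\gamma) ) \bigr),
\]
% so the entropy at scale \gamma is charged to the dimension at the smaller
% scale c\gamma. The manuscript's direct argument instead claims to control
% the entropy at \gamma/2 using only finiteness of fat_{\gamma'} for every
% \gamma' > \gamma, removing the multiplicative gap in the scale.
```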
minor comments (2)
- [Notation and definitions] Notation for the scale-sensitive covering numbers and the precise definition of 'agnostic learnability at scale γ/2' should be restated once in the main body for readers who skip the abstract.
- [References] The reference to Aiyer et al. (ICML 2026) appears in the abstract but should be expanded with full bibliographic details in the bibliography.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive recommendation for minor revision, and recognition of the scale-sensitive equivalences. We address the two major comments below, providing the requested clarifications and confirming that the manuscript already contains the necessary elements. We will incorporate explicit pointers and cross-references in the revised version.
point-by-point responses
Referee: [Main theorem and its proof] The central equivalence rests on the direct empirical ℓ_∞ covering-number bound; the manuscript must exhibit the precise step (presumably in the proof of the main theorem) that converts finite fat-shattering dimension at scales >γ into an O(log² n) covering number at scale γ/2 without reintroducing a factor-2 loss.
Authors: We agree that highlighting the precise conversion step will improve readability. In the proof of the main equivalence (Theorem 3.1), after invoking the fat-shattering hypothesis at scales >γ, the argument proceeds directly via Lemma 4.2: the logarithm of the empirical ℓ_∞ covering number at scale γ/2 is bounded by O(log² n) using a scale-sensitive chaining construction that operates on the γ/2-net without passing through a packing number or a symmetrization step that would introduce a factor-2 loss. This is the step that closes the equivalence to uniform convergence at γ and learnability at γ/2. We will add an explicit remark and a pointer to this lemma immediately after the statement of Theorem 3.1 in the revision. revision: yes
Referee: [Metric-entropy consequences] The claimed tightness of the O(log² n) entropy bound at scale γ/2 requires an explicit construction (function class and distribution) showing that the bound cannot be improved to O(log n) in general; this example must be checked for consistency with the boundedness assumption used throughout.
Authors: The explicit tightness construction is already given in Section 5.2: a binary-tree function class with values restricted to [0,1] (hence bounded) and a uniform distribution over the 2^k leaves at depth k ≈ log n. For this class the fat-shattering dimension is finite above γ, while the logarithm of the empirical ℓ_∞ covering number at scale γ/2 is Ω(log² n) with positive probability, showing that the O(log² n) bound is tight in general. The construction satisfies the global boundedness assumption used throughout the paper. We will add a forward reference from the metric-entropy corollary (Corollary 4.3) to this example in the revision. revision: yes
Circularity Check
No significant circularity
full rationale
The paper establishes equivalences between uniform convergence at scale γ, agnostic learnability at γ/2, and finiteness of the fat-shattering dimension at scales >γ for bounded real-valued classes via a direct proof. The key step is an empirical ℓ_∞ covering-number bound that avoids the packing-number detour and is derived independently of the target equivalences. No step relies on a fitted parameter, a self-referential definition, or a load-bearing self-citation chain; the result is self-contained over standard learning-theoretic primitives and resolves external open questions without internal circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: the real-valued function class is bounded.
Reference graph
Works this paper leans on
- [1] Philip M. Long. On Agnostic Learning with {0, *, 1}-Valued and Real-Valued Hypotheses. COLT, 2001. doi:10.1007/3-540-44581-1_19
- [2] Shahar Mendelson and Roman Vershynin. Entropy and the combinatorial dimension. Inventiones Mathematicae, 2003.
- [3] Shahar Mendelson. Rademacher averages and phase transitions in Glivenko-Cantelli classes. IEEE Transactions on Information Theory, 2002.
- [4] Peter L. Bartlett, Sanjeev R. Kulkarni, and Steven R. Posner. Covering numbers for real-valued function classes. IEEE Transactions on Information Theory, 1997.
- [5] Shahar Mendelson and Roman Vershynin. Entropy, combinatorial dimensions and random averages. COLT, 2002.
- [6] Nicolò Cesa-Bianchi and David Haussler. A graph-theoretic generalization of the Sauer-Shelah lemma. Discrete Applied Mathematics, 1998. doi:10.1016/S0166-218X(98)00012-2
- [7] Norbert Sauer. On the density of families of sets. Journal of Combinatorial Theory, Series A, 1972.
- [8] Michael J. Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 1994. doi:10.1016/S0022-0000(05)80062-5
- [9] V. M. Tikhomirov. Russian Mathematical Surveys, 1960. doi:10.1070/RM1960v015n03ABEH004093
- [10] G. G. Lorentz. Approximation of Functions.
- [11] R. M. Dudley. Central limit theorems for empirical measures. The Annals of Probability, 1978.
- [12]
- [13] Colin McDiarmid. On the method of bounded differences. Surveys in Combinatorics, 1989.
- [14] Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
- [15]
- [16] Olivier Bousquet, Daniel Kane, and Shay Moran. The optimal approximation factor in density estimation. COLT, 2019.
- [17] The Coverage Principle: How Pre-Training Enables Post-Training. arXiv preprint arXiv:2510.15020.
- [18] Probably Approximately Precision and Recall Learning. arXiv preprint arXiv:2411.13029.
- [19] R. M. Dudley, Evarist Giné, and Joel Zinn. Uniform and universal Glivenko-Cantelli classes. Journal of Theoretical Probability, 1991.
- [20] Mark Rudelson and Roman Vershynin. Combinatorics of random processes and sections of convex bodies. Annals of Mathematics, 2006.
- [21] Roberto Colomboni, Emmanuel Esposito, and Andrea Paudice. An improved uniform convergence bound with fat-shattering dimension. Information Processing Letters, 2025. doi:10.1016/j.ipl.2024.106539
- [22] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- [23] Noga Alon, Steve Hanneke, Ron Holzman, and Shay Moran. A theory of PAC learnability of partial concept classes. FOCS, 2021.
- [24] V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 1971.
- [25] Anselm Blumer, A. Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 1989. doi:10.1145/76359.76371
- [26] Patrick Assouad. Densité et dimension. Annales de l'Institut Fourier, 1983.
- [27] Noga Alon, Shai Ben-David, Nicolò Cesa-Bianchi, and David Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 1997. doi:10.1145/263867.263927
- [28] Pieter Kleer and Hans Simon. Primal and dual combinatorial dimensions. Discrete Applied Mathematics, 2023. doi:10.1016/j.dam.2022.11.010
- [29] Idan Attias and Aryeh Kontorovich. Fat-shattering dimension of k-fold aggregations. Journal of Machine Learning Research, 2024.
- [30] Agnostic sample compression schemes for regression. ICML, 2024.
- [31] Sample compression for real-valued learners. ALT, 2019.
- [32] Shay Moran and Amir Yehudayoff. Sample compression schemes for VC classes. Journal of the ACM, 2016.
- [33] Peter L. Bartlett and Philip M. Long. Prediction, learning, uniform convergence, and scale-sensitive dimensions. Journal of Computer and System Sciences, 1998. doi:10.1006/jcss.1997.1557
- [34] A Theoretical Framework for Statistical Evaluability of Generative Models. arXiv preprint arXiv:2604.05324.
- [35] Alfred Müller. Integral probability metrics and their generating classes of functions. Advances in Applied Probability, 1997.
- [36] Yannis G. Yatracos. Rates of convergence of minimum distance estimators and Kolmogorov's entropy. The Annals of Statistics, 1985.
- [37] What is Wrong with Perplexity for Long-context Language Modeling?, 2025.
- [38] Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?, 2024.
- [39] MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers, 2021.
- [40]
- [41] Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, and Sylvain Gelly. Assessing Generative Models via Precision and Recall. NeurIPS, 2018.