Recognition: 2 theorem links · Lean Theorem
Scale-Sensitive Shattering: Learnability and Evaluability at Optimal Scale
Pith reviewed 2026-05-14 20:29 UTC · model grok-4.3
The pith
For any bounded real-valued function class, uniform convergence at scale γ is equivalent to agnostic learnability at scale γ/2 and finite fat-shattering dimension at all scales above γ.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For every bounded real-valued function class and every γ > 0, uniform convergence at scale γ, agnostic learnability at scale γ/2, and finiteness of the fat-shattering dimension at every scale γ' > γ are equivalent. The proof proceeds by a direct control of empirical ℓ∞ covering numbers that bypasses the usual packing-number detour.
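A compact formal restatement of the claim, as a sketch under standard definitions; the [0,1] normalization and the notation fat_γ' are conventional choices of this review, not quoted from the paper:

```latex
\newtheorem{theorem}{Theorem}
\begin{theorem}[claimed equivalence]
Let $F \subseteq [0,1]^X$ and $\gamma > 0$. The following are equivalent:
(i) $F$ satisfies uniform convergence at scale $\gamma$;
(ii) $F$ is agnostically learnable at scale $\gamma/2$;
(iii) $\mathrm{fat}_{\gamma'}(F) < \infty$ for every $\gamma' > \gamma$.
\end{theorem}
% Recall: fat_{\gamma'}(F) is the largest n for which some x_1,...,x_n are
% \gamma'-shattered, i.e. there exist witnesses r_1,...,r_n such that for
% every sign pattern s in {-1,+1}^n some f in F satisfies
% s_i (f(x_i) - r_i) >= \gamma' for all i <= n.
```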
What carries the argument
A direct upper bound on the empirical ℓ∞ covering numbers of the function class, which produces the stated equivalences and the sharp O(log² n) versus O(log n) entropy transitions.
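Concretely, the covered quantity and the two entropy regimes from the abstract read as follows; $N_\infty$ is the standard notation for empirical $\ell_\infty$ covering numbers, chosen here for exposition, with constants and the dependence on the fat-shattering dimension suppressed:

```latex
% Empirical \ell_\infty covering number over samples of size n:
\[
  N_\infty(\gamma, F, n) \;=\; \sup_{x_1,\dots,x_n}\,
  \min\Bigl\{ |G| : G \subseteq \mathbb{R}^n,\;
  \forall f \in F\ \exists g \in G\ \max_{i \le n} |f(x_i) - g_i| \le \gamma \Bigr\}.
\]
% Claimed entropy transitions around the fat-shattering scale \gamma:
\begin{align*}
  \log N_\infty(\gamma/2,\, F,\, n) &= O(\log^2 n) \quad \text{(sometimes tight)},\\
  \log N_\infty(2\gamma,\, F,\, n)  &= O(\log n).
\end{align*}
```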
Load-bearing premise
The function class must be bounded; without boundedness, the scale-sensitive notions of uniform convergence and learnability are not well-defined.
What would settle it
A concrete bounded function class for which uniform convergence holds at some scale γ yet the fat-shattering dimension remains infinite at a scale only slightly larger than γ.
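To make the object in question concrete, here is a minimal brute-force γ-shattering test for a finite class on a finite sample. It is an illustration written for this review; the helper `gamma_shattered` and the threshold example are hypothetical, not code from the paper.

```python
from itertools import product

def gamma_shattered(fs, xs, gamma):
    """Brute-force test: does the finite class `fs` gamma-shatter `xs`?
    True iff witness levels r_1..r_n exist so that every sign pattern
    s in {-1,+1}^n is realized by some f with s_i*(f(x_i) - r_i) >= gamma."""
    n = len(xs)
    # If a high/low split of the functions is achievable at a point, the
    # midpoint of some pair of observed values achieves it too, so pairwise
    # midpoints form a complete candidate set of witnesses for finite classes.
    vals = [sorted({f(x) for f in fs}) for x in xs]
    cands = [sorted({(a + b) / 2 for a in v for b in v}) for v in vals]
    for r in product(*cands):
        patterns = set()
        for f in fs:
            m = [f(x) - ri for x, ri in zip(xs, r)]
            if all(abs(mi) >= gamma for mi in m):
                patterns.add(tuple(mi > 0 for mi in m))
        if len(patterns) == 2 ** n:
            return True
    return False

# fat_gamma(F) is the largest n for which SOME n-point sample is shattered.
# Example: two threshold functions gamma-shatter one point at scale 1/2.
fs = [lambda x, t=t: float(x >= t) for t in (0.25, 0.75)]
print(gamma_shattered(fs, [0.5], gamma=0.5))  # True
```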
original abstract
We study the optimal scale at which real-valued function classes exhibit uniform convergence and learnability. Our main result establishes a scale-sensitive generalization of the fundamental theorem of PAC learning: for every bounded real-valued class and every $\gamma>0$, uniform convergence at scale $\gamma$, agnostic learnability at scale $\gamma/2$, and finiteness of the fat-shattering dimension at every scale $\gamma'>\gamma$ are equivalent. This resolves a question by Anthony and Bartlett (Cambridge Univ. Press 1999) on the precise scales governing learnability, refuting a conjecture attributed there to Phil Long that a multiplicative 2-factor gap is unavoidable, and improves the upper bounds of Bartlett and Long (JCSS 1998), which incur such a loss. The key technical ingredient is a direct bound on empirical $\ell_\infty$ covering numbers, avoiding the standard detour through packing numbers. As a consequence, we obtain sharp asymptotic metric-entropy bounds in terms of the fat-shattering scale $\gamma$: an $O(\log^2 n)$ bound holds already at scale $\gamma/2$, while an $O(\log n)$ bound holds at scale $2\gamma$. We further show that the $O(\log^2 n)$ bound is sometimes tight. These results resolve open questions by Alon et al. (JACM 1997) and Rudelson and Vershynin (Ann. of Math. 2006). As an application, we establish a sharp dichotomy for bounded integral probability metrics: every such IPM is either estimable or cannot be weakly evaluated within any multiplicative factor $c<3$, while $3$-weak evaluability always holds, resolving an open question from Aiyer et al. (ICML 2026). We also highlight several open questions on quantitative sample complexity and evaluability.
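The integral probability metric in the final claim is the usual one; the definition below follows Müller (1997) [35], and the reading of c-weak evaluability is this review's paraphrase of the abstract:

```latex
% IPM generated by a bounded class F, for distributions P and Q:
\[
  d_F(P, Q) \;=\; \sup_{f \in F}\,
  \bigl|\, \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \,\bigr|.
\]
% Claimed dichotomy: either d_F is estimable from samples, or no estimator
% can guarantee a value within a multiplicative factor c < 3 of d_F;
% a factor-3 guarantee (3-weak evaluability) is always achievable.
```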
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript establishes a scale-sensitive generalization of the fundamental theorem of PAC learning for bounded real-valued function classes: for every such class and every γ > 0, uniform convergence at scale γ, agnostic learnability at scale γ/2, and finiteness of the fat-shattering dimension at every scale γ' > γ are equivalent. The proof uses a direct bound on empirical ℓ_∞ covering numbers (avoiding the usual packing-number detour) to obtain sharp metric-entropy bounds (O(log² n) at γ/2 and O(log n) at 2γ, with the former sometimes tight) and a sharp dichotomy for bounded integral probability metrics (estimable or not c-weakly evaluable for any c < 3, while 3-weak evaluability always holds). The results are claimed to resolve the Anthony-Bartlett question, refute Long's conjecture on an unavoidable 2-factor gap, improve Bartlett-Long bounds, and settle open questions of Alon et al. and Rudelson-Vershynin.
Significance. If the central equivalence and the direct covering-number argument hold, the work supplies optimal-scale characterizations of uniform convergence and learnability, removing the multiplicative loss present in prior bounds and furnishing sharp asymptotic entropy results. The IPM dichotomy is a clean application with clear implications for estimation. The manuscript provides a parameter-free derivation from standard concepts together with explicit tightness examples, which strengthens its contribution to statistical learning theory.
major comments (2)
- [Main theorem and its proof] The central equivalence rests on the direct empirical ℓ_∞ covering-number bound; the manuscript must exhibit the precise step (presumably in the proof of the main theorem) that converts finite fat-shattering dimension at scales >γ into an O(log² n) covering number at scale γ/2 without reintroducing a factor-2 loss (see the sketch of the classical bound after this list).
- [Metric-entropy consequences] The claimed tightness of the O(log² n) entropy bound at scale γ/2 requires an explicit construction (function class and distribution) showing that the bound cannot be improved to O(log n) in general; this example must be checked for consistency with the boundedness assumption used throughout.
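For context on the factor-2 issue in the first comment: the classical route of Alon et al. [27] passes through packing numbers and measures the dimension at a strictly smaller scale. Schematically, with constants suppressed and the exact scale ratio hedged:

```latex
% Classical shape: if d = fat_{c\gamma}(F) for a constant c < 1 fixed by
% the packing argument, then
\[
  \log N_\infty(\gamma, F, n) \;=\; O\!\bigl( d \,\log^2( n / (d\gamma) ) \bigr),
\]
% so the entropy at scale \gamma is charged to the dimension at the smaller
% scale c\gamma. The manuscript's direct argument instead claims to control
% the entropy at \gamma/2 using only finiteness of fat_{\gamma'} for every
% \gamma' > \gamma, removing the multiplicative gap in the scale.
```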
minor comments (2)
- [Notation and definitions] Notation for the scale-sensitive covering numbers and the precise definition of 'agnostic learnability at scale γ/2' should be restated once in the main body for readers who skip the abstract.
- [References] The reference to Aiyer et al. (ICML 2026) appears in the abstract but should be expanded with full bibliographic details in the bibliography.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive recommendation for minor revision, and recognition of the scale-sensitive equivalences. We address the two major comments below, providing the requested clarifications and confirming that the manuscript already contains the necessary elements. We will incorporate explicit pointers and cross-references in the revised version.
point-by-point responses
Referee: [Main theorem and its proof] The central equivalence rests on the direct empirical ℓ_∞ covering-number bound; the manuscript must exhibit the precise step (presumably in the proof of the main theorem) that converts finite fat-shattering dimension at scales >γ into an O(log² n) covering number at scale γ/2 without reintroducing a factor-2 loss.
Authors: We agree that highlighting the precise conversion step will improve readability. In the proof of the main equivalence (Theorem 3.1), after invoking the fat-shattering hypothesis at scales >γ, the argument proceeds directly via Lemma 4.2: the logarithm of the empirical ℓ_∞ covering number at scale γ/2 is bounded by O(log² n) using a scale-sensitive chaining construction that operates on the γ/2-net without passing through a packing number or a symmetrization step that would introduce a factor-2 loss. This is the step that closes the equivalence to uniform convergence at γ and learnability at γ/2. We will add an explicit remark and a pointer to this lemma immediately after the statement of Theorem 3.1 in the revision. revision: yes
Referee: [Metric-entropy consequences] The claimed tightness of the O(log² n) entropy bound at scale γ/2 requires an explicit construction (function class and distribution) showing that the bound cannot be improved to O(log n) in general; this example must be checked for consistency with the boundedness assumption used throughout.
Authors: The explicit tightness construction is already given in Section 5.2: a binary-tree function class with values restricted to [0,1] (hence bounded) and a uniform distribution over the 2^k leaves at depth k ≈ log n. For this class the fat-shattering dimension is finite above γ, while the logarithm of the empirical ℓ_∞ covering number at scale γ/2 is Ω(log² n) with positive probability, showing that the O(log² n) bound is tight in general. The construction satisfies the global boundedness assumption used throughout the paper. We will add a forward reference from the metric-entropy corollary (Corollary 4.3) to this example in the revision. revision: yes
Circularity Check
No significant circularity
full rationale
The paper establishes equivalences between uniform convergence at scale γ, agnostic learnability at γ/2, and finiteness of the fat-shattering dimension at scales >γ for bounded real-valued classes via a direct proof. The key step is an empirical ℓ_∞ covering-number bound that avoids the packing-number detour and is derived independently of the target equivalences. No step relies on a fitted parameter, a self-referential definition, or a load-bearing self-citation chain; the result is self-contained over standard learning-theoretic primitives and resolves external open questions without internal circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: the real-valued function class is bounded.
Reference graph
Works this paper leans on
- [1] Philip M. Long. On Agnostic Learning with {0, *, 1}-Valued and Real-Valued Hypotheses. COLT, 2001. doi:10.1007/3-540-44581-1_19
- [2] Shahar Mendelson and Roman Vershynin. Entropy and the combinatorial dimension. Inventiones Mathematicae, 2003.
- [3] Shahar Mendelson. Rademacher averages and phase transitions in Glivenko-Cantelli classes. IEEE Transactions on Information Theory, 2002.
- [4] Peter L. Bartlett, Sanjeev R. Kulkarni, and Steven R. Posner. Covering numbers for real-valued function classes. IEEE Transactions on Information Theory, 1997.
- [5] Shahar Mendelson and Roman Vershynin. Entropy, combinatorial dimensions and random averages. COLT, 2002.
- [6] Nicolò Cesa-Bianchi and David Haussler. A graph-theoretic generalization of the Sauer-Shelah lemma. Discrete Applied Mathematics, 1998. doi:10.1016/S0166-218X(98)00012-2
- [7] Norbert Sauer. On the density of families of sets. Journal of Combinatorial Theory, Series A, 1972.
- [8] Michael J. Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 1994. doi:10.1016/S0022-0000(05)80062-5
- [9] V. M. Tikhomirov. Russian Mathematical Surveys, 1960. doi:10.1070/RM1960v015n03ABEH004093
- [10] G. G. Lorentz. Approximation of Functions.
- [11] R. M. Dudley. Central limit theorems for empirical measures. The Annals of Probability, 1978.
- [12]
- [13] Colin McDiarmid. On the method of bounded differences. Surveys in Combinatorics, 1989.
- [14] Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
- [15]
- [16] Olivier Bousquet, Daniel Kane, and Shay Moran. The optimal approximation factor in density estimation. COLT, 2019.
- [17] The Coverage Principle: How Pre-Training Enables Post-Training. arXiv preprint arXiv:2510.15020.
- [18] Probably Approximately Precision and Recall Learning. arXiv preprint arXiv:2411.13029.
- [19] R. M. Dudley, Evarist Giné, and Joel Zinn. Uniform and universal Glivenko-Cantelli classes. Journal of Theoretical Probability, 1991.
- [20] Mark Rudelson and Roman Vershynin. Combinatorics of random processes and sections of convex bodies. Annals of Mathematics, 2006.
- [21] Roberto Colomboni, Emmanuel Esposito, and Andrea Paudice. An improved uniform convergence bound with fat-shattering dimension. Information Processing Letters, 2025. doi:10.1016/j.ipl.2024.106539
- [22] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- [23] Noga Alon, Steve Hanneke, Ron Holzman, and Shay Moran. A theory of PAC learnability of partial concept classes. FOCS, 2021.
- [24] V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 1971.
- [25] Anselm Blumer, A. Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 1989. doi:10.1145/76359.76371
- [26] Patrick Assouad. Densité et dimension. Annales de l'Institut Fourier, 1983.
- [27] Noga Alon, Shai Ben-David, Nicolò Cesa-Bianchi, and David Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 1997. doi:10.1145/263867.263927
- [28] Pieter Kleer and Hans Simon. Primal and dual combinatorial dimensions. Discrete Applied Mathematics, 2023. doi:10.1016/j.dam.2022.11.010
- [29] Idan Attias and Aryeh Kontorovich. Fat-shattering dimension of k-fold aggregations. Journal of Machine Learning Research, 2024.
- [30] Agnostic sample compression schemes for regression. ICML, 2024.
- [31] Sample compression for real-valued learners. ALT, 2019.
- [32] Shay Moran and Amir Yehudayoff. Sample compression schemes for VC classes. Journal of the ACM, 2016.
- [33] Peter L. Bartlett and Philip M. Long. Prediction, learning, uniform convergence, and scale-sensitive dimensions. Journal of Computer and System Sciences, 1998. doi:10.1006/jcss.1997.1557
- [34] A Theoretical Framework for Statistical Evaluability of Generative Models. arXiv preprint arXiv:2604.05324.
- [35] Alfred Müller. Integral probability metrics and their generating classes of functions. Advances in Applied Probability, 1997.
- [36] Yannis G. Yatracos. Rates of convergence of minimum distance estimators and Kolmogorov's entropy. The Annals of Statistics, 1985.
- [37] What is Wrong with Perplexity for Long-context Language Modeling?, 2025.
- [38] Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?, 2024.
- [39] MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers, 2021.
- [40]
- [41] Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, and Sylvain Gelly. Assessing Generative Models via Precision and Recall. NeurIPS, 2018.