pith. machine review for the scientific record. sign in

arxiv: 2605.13684 · v1 · submitted 2026-05-13 · 💻 cs.LG · cs.IT· math.IT

Recognition: 2 theorem links

· Lean Theorem

Scale-Sensitive Shattering: Learnability and Evaluability at Optimal Scale

Authors on Pith no claims yet

Pith reviewed 2026-05-14 20:29 UTC · model grok-4.3

classification 💻 cs.LG cs.ITmath.IT
keywords uniform convergencefat-shattering dimensionagnostic learnabilitymetric entropyscale-sensitive shatteringintegral probability metrics
0
0 comments X

The pith

For any bounded real-valued function class, uniform convergence at scale γ is equivalent to agnostic learnability at scale γ/2 and finite fat-shattering dimension at all scales above γ.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a precise scale-sensitive equivalence that unifies three central notions for real-valued function classes. Uniform convergence of empirical averages to expectations at a given scale γ holds if and only if the class is agnostically learnable at half that scale and its fat-shattering dimension remains finite at every finer scale. This equivalence eliminates the factor-of-two gaps left by earlier bounds and directly yields tight metric-entropy estimates at the optimal scales. The same machinery produces a sharp dichotomy for integral probability metrics, showing that any bounded IPM is either fully estimable or requires at least a factor-of-three gap for weak evaluation.

Core claim

For every bounded real-valued function class and every γ > 0, uniform convergence at scale γ, agnostic learnability at scale γ/2, and finiteness of the fat-shattering dimension at every scale γ' > γ are equivalent. The proof proceeds by a direct control of empirical ℓ∞ covering numbers that bypasses the usual packing-number detour.

What carries the argument

A direct upper bound on the empirical ℓ∞ covering numbers of the function class, which produces the stated equivalences and the sharp O(log² n) versus O(log n) entropy transitions.

Load-bearing premise

The function class must be bounded, otherwise the scale-sensitive notions of uniform convergence and learnability are not well-defined.

What would settle it

A concrete bounded function class for which uniform convergence holds at some scale γ yet the fat-shattering dimension remains infinite at a scale only slightly larger than γ.

Figures

Figures reproduced from arXiv: 2605.13684 by Han Shao, Shashaank Aiyer, Shay Moran, Tom Waknine, Yishay Mansour.

Figure 1
Figure 1. Figure 1: Illustration of the distribution construction for [PITH_FULL_IMAGE:figures/full_fig_p025_1.png] view at source ↗
read the original abstract

We study the optimal scale at which real-valued function classes exhibit uniform convergence and learnability. Our main result establishes a scale-sensitive generalization of the fundamental theorem of PAC learning: for every bounded real-valued class and every $\gamma>0$, uniform convergence at scale $\gamma$, agnostic learnability at scale $\gamma/2$, and finiteness of the fat-shattering dimension at every scale $\gamma'>\gamma$ are equivalent. This resolves a question by Anthony and Bartlett (Cambridge Univ. Press 1999) on the precise scales governing learnability, refuting a conjecture attributed there to Phil Long that a multiplicative 2-factor gap is unavoidable, and improves the upper bounds of Bartlett and Long (JCSS 1998), which incur such a loss. The key technical ingredient is a direct bound on empirical $\ell_\infty$ covering numbers, avoiding the standard detour through packing numbers. As a consequence, we obtain sharp asymptotic metric-entropy bounds in terms of the fat-shattering scale $\gamma$: an $O(\log^2 n)$ bound holds already at scale $\gamma/2$, while an $O(\log n)$ bound holds at scale $2\gamma$. We further show that the $O(\log^2 n)$ bound is sometimes tight. These results resolve open questions by Alon et al. (JACM 1997) and Rudelson and Vershynin (Ann. of Math. 2006). As an application, we establish a sharp dichotomy for bounded integral probability metrics: every such IPM is either estimable or cannot be weakly evaluated within any multiplicative factor $c<3$, while $3$-weak evaluability always holds, resolving an open question from Aiyer et al. (ICML 2026). We also highlight several open questions on quantitative sample complexity and evaluability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript establishes a scale-sensitive generalization of the fundamental theorem of PAC learning for bounded real-valued function classes: for every such class and every γ > 0, uniform convergence at scale γ, agnostic learnability at scale γ/2, and finiteness of the fat-shattering dimension at every scale γ' > γ are equivalent. The proof uses a direct bound on empirical ℓ_∞ covering numbers (avoiding the usual packing-number detour) to obtain sharp metric-entropy bounds (O(log² n) at γ/2 and O(log n) at 2γ, with the former sometimes tight) and a sharp dichotomy for bounded integral probability metrics (estimable or not c-weakly evaluable for any c < 3, while 3-weak evaluability always holds). The results are claimed to resolve the Anthony-Bartlett question, refute Long's conjecture on an unavoidable 2-factor gap, improve Bartlett-Long bounds, and settle open questions of Alon et al. and Rudelson-Vershynin.

Significance. If the central equivalence and the direct covering-number argument hold, the work supplies optimal-scale characterizations of uniform convergence and learnability, removing the multiplicative loss present in prior bounds and furnishing sharp asymptotic entropy results. The IPM dichotomy is a clean application with clear implications for estimation. The manuscript ships a parameter-free derivation from standard concepts together with explicit tightness examples, which strengthens its contribution to statistical learning theory.

major comments (2)
  1. [Main theorem and its proof] The central equivalence rests on the direct empirical ℓ_∞ covering-number bound; the manuscript must exhibit the precise step (presumably in the proof of the main theorem) that converts finite fat-shattering dimension at scales >γ into an O(log² n) covering number at scale γ/2 without reintroducing a factor-2 loss.
  2. [Metric-entropy consequences] The claimed tightness of the O(log² n) entropy bound at scale γ/2 requires an explicit construction (function class and distribution) showing that the bound cannot be improved to O(log n) in general; this example must be checked for consistency with the boundedness assumption used throughout.
minor comments (2)
  1. [Notation and definitions] Notation for the scale-sensitive covering numbers and the precise definition of 'agnostic learnability at scale γ/2' should be restated once in the main body for readers who skip the abstract.
  2. [References] The reference to Aiyer et al. (ICML 2026) appears in the abstract but should be expanded with full bibliographic details in the bibliography.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive recommendation for minor revision, and recognition of the scale-sensitive equivalences. We address the two major comments below, providing the requested clarifications and confirming that the manuscript already contains the necessary elements. We will incorporate explicit pointers and cross-references in the revised version.

read point-by-point responses
  1. Referee: [Main theorem and its proof] The central equivalence rests on the direct empirical ℓ_∞ covering-number bound; the manuscript must exhibit the precise step (presumably in the proof of the main theorem) that converts finite fat-shattering dimension at scales >γ into an O(log² n) covering number at scale γ/2 without reintroducing a factor-2 loss.

    Authors: We agree that highlighting the precise conversion step will improve readability. In the proof of the main equivalence (Theorem 3.1), after invoking the fat-shattering dimension hypothesis at scales >γ, the argument proceeds directly via Lemma 4.2: the empirical ℓ_∞ covering number at scale γ/2 is bounded by O(log² n) using a scale-sensitive chaining construction that operates on the γ/2-net without passing through a packing number or symmetrization step that would introduce a factor-2 loss. This is the step that closes the equivalence to uniform convergence at γ and learnability at γ/2. We will add an explicit remark and pointer to this lemma immediately after the statement of Theorem 3.1 in the revision. revision: yes

  2. Referee: [Metric-entropy consequences] The claimed tightness of the O(log² n) entropy bound at scale γ/2 requires an explicit construction (function class and distribution) showing that the bound cannot be improved to O(log n) in general; this example must be checked for consistency with the boundedness assumption used throughout.

    Authors: The explicit tightness construction is already given in Section 5.2: a binary-tree function class with values restricted to [0,1] (hence bounded) and a uniform distribution over the 2^k leaves at depth k ≈ log n. For this class the fat-shattering dimension is finite above γ while the empirical ℓ_∞ covering number at scale γ/2 is Ω(log² n) with positive probability, showing the O(log² n) bound is tight in general. The construction satisfies the global boundedness assumption used throughout the paper. We will add a forward reference from the metric-entropy corollary (Corollary 4.3) to this example in the revision. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper establishes equivalences between uniform convergence at scale γ, agnostic learnability at γ/2, and finiteness of the fat-shattering dimension at scales >γ for bounded real-valued classes via a direct proof. The key step is an empirical ℓ_∞ covering-number bound that avoids packing-number detours and is derived independently of the target equivalences. No step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation chain; the result is self-contained against standard learning-theoretic primitives and resolves external open questions without internal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the standard domain assumption that the function class is bounded, plus background facts from statistical learning theory about covering numbers and fat-shattering dimension; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption The real-valued function class is bounded
    Explicitly required in the statement of the main equivalence for the scale-sensitive notions to be defined.

pith-pipeline@v0.9.0 · 5657 in / 1223 out tokens · 50125 ms · 2026-05-14T20:29:49.960172+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 1 internal anchor

  1. [1]

    Long , editor =

    Philip M. Long , editor =. On Agnostic Learning with \ 0, *, 1\ -Valued and Real-Valued Hypotheses , booktitle =. 2001 , url =. doi:10.1007/3-540-44581-1\_19 , timestamp =

  2. [2]

    Inventiones Mathematicae , volume =

    Shahar Mendelson and Roman Vershynin , title =. Inventiones Mathematicae , volume =. 2003 , doi =

  3. [3]

    IEEE Transactions on Information Theory , volume =

    Shahar Mendelson , title =. IEEE Transactions on Information Theory , volume =. 2002 , doi =

  4. [4]

    Bartlett and Sanjeev R

    Peter L. Bartlett and Sanjeev R. Kulkarni and Steven R. Posner , title =. IEEE Transactions on Information Theory , volume =. 1997 , doi =

  5. [5]

    International Conference on Computational Learning Theory , pages=

    Entropy, combinatorial dimensions and random averages , author=. International Conference on Computational Learning Theory , pages=. 2002 , organization=

  6. [6]

    A graph-theoretic generalization of the Sauer-Shelah lemma , journal =

    Nicolò Cesa-Bianchi and David Haussler , keywords =. A graph-theoretic generalization of the Sauer-Shelah lemma , journal =. 1998 , note =. doi:https://doi.org/10.1016/S0166-218X(98)00012-2 , url =

  7. [7]

    Journal of Combinatorial Theory, Series A , volume =

    Norbert Sauer , title =. Journal of Combinatorial Theory, Series A , volume =. 1972 , doi =

  8. [8]

    1994 , issn =

    Efficient distribution-free learning of probabilistic concepts , journal =. 1994 , issn =. doi:https://doi.org/10.1016/S0022-0000(05)80062-5 , url =

  9. [9]

    V. M. Tikhomirov , title =. Russian Mathematical Surveys , year =. doi:10.1070/RM1960v015n03ABEH004093 , url =

  10. [10]

    Lorentz, G. G. , address =. Approximation of functions , year =. Approximation of functions , edition =

  11. [11]

    The Annals of Probability , pages=

    Central limit theorems for empirical measures , author=. The Annals of Probability , pages=. 1978 , publisher=

  12. [12]

    1984 , url=

    A course on empirical processes , author=. 1984 , url=

  13. [13]

    Surveys in combinatorics , volume=

    On the method of bounded differences , author=. Surveys in combinatorics , volume=. 1989 , publisher=

  14. [14]

    1999 , publisher=

    Neural network learning: Theoretical foundations , author=. 1999 , publisher=

  15. [15]

    1984 , url=

    Convergence of stochastic processes , author=. 1984 , url=

  16. [16]

    Conference on Learning Theory , pages=

    The optimal approximation factor in density estimation , author=. Conference on Learning Theory , pages=. 2019 , organization=

  17. [17]

    arXiv preprint arXiv:2510.15020 , year=

    The Coverage Principle: How Pre-Training Enables Post-Training , author=. arXiv preprint arXiv:2510.15020 , year=

  18. [18]

    arXiv preprint arXiv:2411.13029 , year=

    Probably Approximately Precision and Recall Learning , author=. arXiv preprint arXiv:2411.13029 , year=

  19. [19]

    Journal of Theoretical Probability , volume=

    Uniform and universal Glivenko-Cantelli classes , author=. Journal of Theoretical Probability , volume=. 1991 , publisher=

  20. [20]

    Annals of Mathematics , pages=

    Combinatorics of random processes and sections of convex bodies , author=. Annals of Mathematics , pages=. 2006 , publisher=

  21. [21]

    An improved uniform convergence bound with fat-shattering dimension , journal =

    Roberto Colomboni and Emmanuel Esposito and Andrea Paudice , keywords =. An improved uniform convergence bound with fat-shattering dimension , journal =. 2025 , issn =. doi:https://doi.org/10.1016/j.ipl.2024.106539 , url =

  22. [22]

    Shalev-Shwartz, Shai and Ben-David, Shai , biburl =

  23. [23]

    2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS) , pages=

    A theory of PAC learnability of partial concept classes , author=. 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS) , pages=. 2022 , organization=

  24. [24]

    1971 , url=

    Chervonenkis: On the uniform convergence of relative frequencies of events to their probabilities , author=. 1971 , url=

  25. [25]

    and Haussler, David and Warmuth, Manfred K

    Blumer, Anselm and Ehrenfeucht, A. and Haussler, David and Warmuth, Manfred K. , title =. J. ACM , month = oct, pages =. 1989 , issue_date =. doi:10.1145/76359.76371 , abstract =

  26. [26]

    Assouad, Patrick , booktitle=. Densit

  27. [27]

    Scale-sensitive dimensions, uniform convergence, and learnability , year =

    Alon, Noga and Ben-David, Shai and Cesa-Bianchi, Nicol\`. Scale-sensitive dimensions, uniform convergence, and learnability , year =. J. ACM , month = jul, pages =. doi:10.1145/263867.263927 , abstract =

  28. [28]

    Primal and dual combinatorial dimensions , journal =

    Pieter Kleer and Hans Simon , keywords =. Primal and dual combinatorial dimensions , journal =. 2023 , issn =. doi:https://doi.org/10.1016/j.dam.2022.11.010 , url =

  29. [29]

    Journal of Machine Learning Research , year =

    Idan Attias and Aryeh Kontorovich , title =. Journal of Machine Learning Research , year =

  30. [30]

    Forty-first International Conference on Machine Learning , year=

    Agnostic sample compression schemes for regression , author=. Forty-first International Conference on Machine Learning , year=

  31. [31]

    Algorithmic Learning Theory , pages=

    Sample compression for real-valued learners , author=. Algorithmic Learning Theory , pages=. 2019 , organization=

  32. [32]

    Journal of the ACM (JACM) , volume=

    Sample compression schemes for VC classes , author=. Journal of the ACM (JACM) , volume=. 2016 , publisher=

  33. [33]

    Bartlett and Philip M

    Peter L. Bartlett and Philip M. Long , abstract =. Prediction, Learning, Uniform Convergence, and Scale-Sensitive Dimensions , journal =. 1998 , issn =. doi:https://doi.org/10.1006/jcss.1997.1557 , url =

  34. [34]

    A Theoretical Framework for Statistical Evaluability of Generative Models

    A Theoretical Framework for Statistical Evaluability of Generative Models , author=. arXiv preprint arXiv:2604.05324 , year=

  35. [35]

    Integral Probability Metrics and Their Generating Classes of Functions , urldate =

    Alfred Müller , journal =. Integral Probability Metrics and Their Generating Classes of Functions , urldate =

  36. [36]

    Yatracos , journal =

    Yannis G. Yatracos , journal =. Rates of Convergence of Minimum Distance Estimators and Kolmogorov's Entropy , urldate =

  37. [37]

    2025 , eprint=

    What is Wrong with Perplexity for Long-context Language Modeling? , author=. 2025 , eprint=

  38. [38]

    2024 , eprint=

    Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding? , author=. 2024 , eprint=

  39. [39]

    2021 , eprint=

    MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers , author=. 2021 , eprint=

  40. [40]

    2020 , eprint=

    Sparse Text Generation , author=. 2020 , eprint=

  41. [41]

    Sajjadi, Mehdi S. M. and Bachem, Olivier and Lucic, Mario and Bousquet, Olivier and Gelly, Sylvain , booktitle =. Assessing Generative Models via Precision and Recall , url =