pith. machine review for the scientific record.

arxiv: 2604.12952 · v1 · submitted 2026-04-14 · 💻 cs.LG · math.CO · stat.ML

Recognition: unknown

An Optimal Sauer Lemma Over k-ary Alphabets

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:18 UTC · model grok-4.3

classification 💻 cs.LG · math.CO · stat.ML
keywords Sauer-Shelah-Perles lemma · DS dimension · multiclass learning · list prediction · growth function · Natarajan dimension · polynomial method · PAC learning

The pith

A sharp Sauer inequality for multiclass and list prediction, stated in terms of the DS dimension, is tight for every alphabet size and list size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the growth function of any hypothesis class over a k-ary alphabet is bounded above by a tight expression in terms of its Daniely-Shalev-Shwartz dimension and list size. This replaces earlier bounds based on the Natarajan dimension, which suffered from exponential dependence on the list size, with an optimal polynomial dependence while also improving the dependence on the alphabet size k. Because the DS dimension exactly characterizes multiclass PAC learnability, the new bound immediately yields sharper sample-complexity guarantees for list learning and uniform convergence of list predictors.
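For orientation, the baselines being sharpened can be written out explicitly. The binary Sauer-Shelah-Perles bound below is standard; the k-ary Natarajan-based form is the Haussler-Long-style expression as commonly stated in the literature, recalled here for contrast rather than quoted from this paper.

% Classical binary Sauer-Shelah-Perles: VC dimension d, m domain points.
\[
  \bigl|\mathcal{H}|_S\bigr| \;\le\; \sum_{i=0}^{d} \binom{m}{i}.
\]
% Natarajan-based k-ary analogue (Haussler--Long-style form, recalled from
% the literature rather than quoted from the paper):
\[
  \bigl|\mathcal{H}|_S\bigr| \;\le\; \sum_{i=0}^{d_N} \binom{m}{i} \binom{k}{2}^{i},
\]
% whose list-prediction extensions carry the exponential dependence on the
% list size \ell that the paper's DS-dimension bound replaces with a
% polynomial one.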

Core claim

The growth function of any hypothesis class with DS dimension d over a k-ary alphabet and list size ℓ obeys a Sauer-type upper bound that is tight for every choice of k, ℓ, and d. The bound is proved via the polynomial method and applies equally to ordinary multiclass prediction and to list prediction; no purely combinatorial proof is known.
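To convey the flavor of the technique, here is the classical binary instance of the polynomial method (a Frankl-Pach-style linear-algebra argument); the paper's k-ary proof is more involved and is not reproduced here.

% Binary polynomial-method sketch: view each behavior f in H|_S as a point
% of \{0,1\}^m. If the VC dimension is d, the multilinear monomials
%   \prod_{i \in T} x_i, \quad T \subseteq [m],\ |T| \le d,
% span the space of real-valued functions on H|_S, so
\[
  \bigl|\mathcal{H}|_S\bigr| \;=\; \dim \mathbb{R}^{\mathcal{H}|_S}
  \;\le\; \#\{\,T \subseteq [m] : |T| \le d\,\}
  \;=\; \sum_{i=0}^{d} \binom{m}{i}.
\]
% The paper plays the analogous game over a k-ary alphabet with the
% (list-)DS dimension in place of the VC dimension.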

What carries the argument

The Daniely-Shalev-Shwartz (DS) dimension (and its list extension): the combinatorial parameter that characterizes multiclass and list PAC learnability, and that directly supplies the complexity measure needed for a tight polynomial bound on the number of distinct labelings.
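For a concrete handle on the definition, a brute-force checker follows, assuming the standard pseudo-cube formulation of DS-shattering: every realized pattern must have, in each coordinate, a neighbor in the class differing from it in exactly that coordinate. The code and its names are an illustrative sketch, not artifacts of the paper.

from itertools import combinations, product

def pseudo_cube_core(patterns):
    """Greedily delete any pattern that lacks, in some coordinate, a
    neighbor differing from it in exactly that coordinate. The fixed point
    is the unique maximal pseudo-cube subset; a nonempty core means the
    coordinates are DS-shattered (under the pseudo-cube definition
    assumed here)."""
    pats = set(patterns)
    if not pats:
        return pats
    d = len(next(iter(pats)))
    changed = True
    while changed:
        changed = False
        for f in list(pats):
            for i in range(d):
                if not any(g[i] != f[i]
                           and all(g[j] == f[j] for j in range(d) if j != i)
                           for g in pats):
                    pats.discard(f)
                    changed = True
                    break
    return pats

def ds_dimension(H, n):
    """Brute-force DS dimension of a finite class H, given as label tuples
    on an n-point domain. Exponential in n; toy examples only."""
    best = 0
    for d in range(1, n + 1):
        for S in combinations(range(n), d):
            restriction = {tuple(h[x] for x in S) for h in H}
            if pseudo_cube_core(restriction):
                best = max(best, d)
    return best

# Sanity check: the full class {0,1,2}^2 on a 2-point, 3-ary domain is a
# genuine cube, so its DS dimension should be 2.
assert ds_dimension(list(product(range(3), repeat=2)), 2) == 2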

If this is right

  • Improved sample-complexity upper bounds for list PAC learning
  • Sharper uniform-convergence guarantees for list predictors
  • Replacement of exponential dependence on list size ℓ by optimal polynomial dependence
  • Improved dependence on alphabet size k compared with Natarajan-based bounds

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Finding a purely combinatorial proof would remove reliance on the polynomial method and could simplify extensions to other prediction models.
  • The same tight bound may be usable in online learning or regret analyses that currently rely on looser multiclass dimension notions.
  • The result suggests examining whether analogous sharp bounds exist for other combinatorial dimensions that characterize learnability in structured prediction settings.

Load-bearing premise

The polynomial-method argument produces the exact claimed tight upper bound on the growth function for every k, ℓ, and d without hidden gaps or extra restrictions on the hypothesis class.

What would settle it

A concrete hypothesis class over a k-ary alphabet whose growth function on some finite domain strictly exceeds the claimed tight upper bound for its list-DS dimension d, list size ℓ, and alphabet size k would disprove the inequality.
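For small finite classes such a search is mechanical. A minimal harness (our sketch, not tooling from the paper) is below; the paper's exact bound expression is not reproduced here, so it must be supplied as the callable bound.

from itertools import combinations

def growth_function(H, n, m):
    """Largest number of distinct label patterns that the class H (given as
    label tuples on an n-point domain) realizes on any m-point subset."""
    return max(len({tuple(h[x] for x in S) for h in H})
               for S in combinations(range(n), m))

def find_violations(H, n, d, ell, bound):
    """Sample sizes m at which H's empirical growth strictly exceeds the
    claimed tight bound. `bound(m, d, ell)` must encode the paper's
    expression (not reproduced here), with d the list-DS dimension and
    ell the list size; a single hit would refute the inequality."""
    return [m for m in range(1, n + 1)
            if growth_function(H, n, m) > bound(m, d, ell)]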

Figures

Figures reproduced from arXiv: 2604.12952 by Amirreza Shaeiri, Qinglin Meng, Shay Moran, Steve Hanneke.

Figure 1
Figure 1. Illustration of list prediction in a recommendation setting. Given user features … view at source ↗
Figure 2
Figure 2. Integer flow network $I(H)$; accompanying text (Lemma 3.19): for the network $I(H)$ there is an integer flow achieving the maximum flow, $\sum_{i=1}^{m}(|e_i| - \ell)^{+} = |V| \cdot \mathrm{savd}_{\ell}(H)$. The proof relies on the Max-Flow Min-Cut Theorem, first establishing that the maximum flow value is precisely $\sum_{i=1}^{m}(|e_i| - \ell)^{+}$… view at source ↗
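The excerpt above leans on a standard fact: with integer edge capacities, some maximum flow is integral (the integrality theorem behind Ford-Fulkerson and Max-Flow Min-Cut). A generic toy illustration using networkx follows; the graph is arbitrary and is not the paper's network I(H).

import networkx as nx

# Toy directed network with integer capacities. By the integrality theorem,
# a maximum flow with integer values on every edge exists, and networkx's
# default max-flow routine returns one.
G = nx.DiGraph()
G.add_edge("s", "a", capacity=3)
G.add_edge("s", "b", capacity=2)
G.add_edge("a", "b", capacity=1)
G.add_edge("a", "t", capacity=2)
G.add_edge("b", "t", capacity=3)

flow_value, flow = nx.maximum_flow(G, "s", "t")
assert flow_value == 5  # the s-side cut {s} has capacity 3 + 2 = 5
# Every edge flow is an integer, as guaranteed for integer capacities.
assert all(float(v).is_integer() for dct in flow.values() for v in dct.values())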
read the original abstract

The Sauer-Shelah-Perles Lemma is a cornerstone of combinatorics and learning theory, bounding the size of a binary hypothesis class in terms of its Vapnik-Chervonenkis (VC) dimension. For classes of functions over a $k$-ary alphabet, namely the multiclass setting, the Natarajan dimension has long served as an analogue of VC dimension, yet the corresponding Sauer-type bounds are suboptimal for alphabet sizes $k>2$. In this work, we establish a sharp Sauer inequality for multiclass and list prediction. Our bound is expressed in terms of the Daniely–Shalev-Shwartz (DS) dimension, and more generally with its extension, the list-DS dimension – the combinatorial parameters that characterize multiclass and list PAC learnability. Our bound is tight for every alphabet size $k$, list size $\ell$, and dimension value, replacing the exponential dependence on $\ell$ in the Natarajan-based bound by the optimal polynomial dependence, and improving the dependence on $k$ as well. Our proof uses the polynomial method. In contrast to the classical VC case, where several direct combinatorial proofs are known, we are not aware of any purely combinatorial proof in the DS setting. This motivates several directions for future research, which are discussed in the paper. As consequences, we obtain improved sample complexity upper bounds for list PAC learning and for uniform convergence of list predictors, sharpening the recent results of Charikar et al. (STOC 2023), Hanneke et al. (COLT 2024), and Brukhim et al. (NeurIPS 2024).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript establishes a sharp Sauer-type inequality for the growth function of hypothesis classes over k-ary alphabets in the multiclass and list prediction settings. The bound is formulated using the Daniely-Shalev-Shwartz (DS) dimension and its list variant, and is proven tight for all k, ℓ, and d via the polynomial method. This replaces the suboptimal exponential dependence on the list size in Natarajan-dimension based bounds with an optimal polynomial one and improves the dependence on k. The paper also derives improved sample complexity bounds for list PAC learning and uniform convergence.

Significance. If the claimed tightness holds, the result is highly significant as it provides the optimal combinatorial dimension-based bound for multiclass learning, a long-standing open question in the area. The polynomial method proof is a notable strength, and the consequences sharpen recent sample complexity results from STOC 2023, COLT 2024, and NeurIPS 2024. The absence of a combinatorial proof is acknowledged, opening future research directions. The paper does not include machine-checked proofs but the explicit tightness construction is a positive aspect.

minor comments (3)
  1. §2.1: the definition of the list-DS dimension would benefit from an explicit small example (e.g., k=3, ℓ=2) to illustrate how it differs from the standard DS dimension.
  2. §5 (Tightness construction): while the construction is stated to achieve equality for all parameters, a brief verification table for small values of d (d=0,1,2) across two values of k and ℓ would strengthen readability.
  3. References: the citation to Brukhim et al. (NeurIPS 2024) should include the full title and page numbers for consistency with other entries.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive and insightful review, as well as the recommendation for minor revision. We appreciate the acknowledgment of the result's significance in resolving the optimal Sauer-type bound for multiclass and list prediction via the DS dimension, the strength of the polynomial method proof, and the sharpened sample complexity consequences.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The central result is a sharp Sauer-type upper bound on the growth function, derived via the polynomial method applied directly to the definition of the (list-)DS dimension. No equations or steps in the provided abstract or description reduce the claimed bound to a self-referential definition, a fitted parameter renamed as prediction, or a load-bearing self-citation chain. The polynomial-method argument is presented as an independent proof technique, with self-citations limited to downstream sample-complexity consequences rather than the core combinatorial statement. The bound is claimed tight for all parameters without internal fitting or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the known characterization of multiclass PAC learnability by DS dimension (prior work) and on the applicability of the polynomial method to growth-function counting; no new free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: The Daniely-Shalev-Shwartz dimension (and its list extension) characterizes multiclass and list PAC learnability.
    Invoked as the appropriate combinatorial parameter replacing Natarajan dimension; taken from prior literature.

pith-pipeline@v0.9.0 · 5609 in / 1237 out tokens · 59231 ms · 2026-05-10T15:18:03.380859+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

17 extracted references · 7 canonical work pages · 1 internal anchor

  1. A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
     [AB21] Anastasios N. Angelopoulos and Stephen Bates. A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511.

  2. Improper Multiclass Boosting
     [BHM23] Nataly Brukhim, Steve Hanneke, and Shay Moran. Improper multiclass boosting. In Gergely Neu and Lorenzo Rosasco, editors, The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, 12-15 July 2023, Bangalore, India, volume 195 of Proceedings of Machine Learning Research, pages 5433–5452. PMLR.

  3. Dual VC Dimension Obstructs Sample Compression by Embeddings
     [CCH+24] Zachary Chase, Bogdan Chornomaz, Steve Hanneke, Shay Moran, and Amir Yehudayoff. Dual VC dimension obstructs sample compression by embeddings. In Shipra Agrawal and Aaron Roth, editors, The Thirty Seventh Annual Conference on Learning Theory, June 30 - July 3, 2024, Edmonton, Canada, volume 247 of Proceedings of Machine Learning Research, pages 9…

  4. Sample Complexity of Agnostic Multiclass Classification: Natarajan Dimension Strikes Back
     [CEH+26] Alon Cohen, Liad Erez, Steve Hanneke, Tomer Koren, Yishay Mansour, Shay Moran, and Qian Zhang. Sample complexity of agnostic multiclass classification: Natarajan dimension strikes back. In Proceedings of the 58th Annual ACM Symposium on Theory of Computing (STOC 2026). To appear.

  5. Labeled Sample Compression Schemes for Complexes of Oriented Matroids
     [CKP21] Victor Chepoi, Kolja Knauer, and Manon Philibert. Labeled sample compression schemes for complexes of oriented matroids. CoRR, abs/2110.15168.

  6. A Characterization of List Language Identification in the Limit
     [CPT25] Moses Charikar, Chirag Pabbaraju, and Ambuj Tewari. A characterization of list language identification in the limit. CoRR, abs/2511.04103.

  7. Towards a Theory of Holistic Clustering
     [Dre96] Andreas W. M. Dress. Towards a theory of holistic clustering. In Boris G. Mirkin, Fred R. McMorris, Fred S. Roberts, and Andrey Rzhetsky, editors, Mathematical Hierarchies and Biology, Proceedings of a DIMACS Workshop, November 13-15, 1996, volume 37 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 271–290. DIMACS/AMS.

  8. Complexity of Null Dynamical Systems and Sauer–Shelah Lemmas
     [GMRT23] Guorong Gao, Jie Ma, Mingyuan Rong, and Tuan Tran. Complexity of null dynamical systems and Sauer–Shelah lemmas. arXiv preprint arXiv:2310.05353.

  9. Private List Learnability vs. Online List Learnability
     [HMST25] Steve Hanneke, Shay Moran, Hilla Schefler, and Iska Tsubari. Private list learnability vs. online list learnability. arXiv preprint arXiv:2506.12856.

  10. Multivalued Generalizations of the Frankl–Pach Theorem
      [HR10] Gábor Hegedüs and Lajos Rónyai. Multivalued generalizations of the Frankl–Pach theorem. arXiv preprint arXiv:1008.4660.

  11. Unlabeled Compression Schemes for Maximum Classes
      [KW07] Dima Kuzmin and Manfred K. Warmuth. Unlabeled compression schemes for maximum classes. Journal of Machine Learning Research, 8:2047–2081.

  12. Loss Functions for Top-k Error: Analysis and Insights
      [LHS15] Maksim Lapin, Matthias Hein, and Bernt Schiele. Loss functions for top-k error: Analysis and insights. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1468–1477.

  13. Shattering-Extremal Systems
      [Mor12] Shay Moran. Shattering-extremal systems. arXiv preprint arXiv:1211.2980.

  14. Shattered Sets and the Hilbert Function
      [MR16] Shay Moran and Cyrus Rashtchian. Shattered sets and the Hilbert function. In MFCS 2016: 41st International Symposium on Mathematical Foundations of Computer Science, volume 58 of LIPIcs, pages 70:1–70:14. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.

  15. List Online Classification
      [MSTY23] Shay Moran, Ohad Sharon, Iska Tsubari, and Sivan Yosebashvili. List online classification. In The Thirty Sixth Annual Conference on Learning Theory, pages 1885–1913. PMLR.

  16. A Characterization of List Regression
      [PS25] Chirag Pabbaraju and Sahasrajit Sarmasarkar. A characterization of list regression. In Gautam Kamath and Po-Ling Loh, editors, International Conference on Algorithmic Learning Theory, 24-27 February 2025, Politecnico di Milano, Milan, Italy, volume 272 of Proceedings of Machine Learning Research, pages 870–920. PMLR.

  17. Generalizing Labeled and Unlabeled Sample Compression to Multi-Label Concept Classes
      [SYZ14] Rahim Samei, Boting Yang, and Sandra Zilles. Generalizing labeled and unlabeled sample compression to multi-label concept classes. In Peter Auer, Alexander Clark, Thomas Zeugmann, and Sandra Zilles, editors, Algorithmic Learning Theory - 25th International Conference, ALT 2014, Bled, Slovenia, October 8-10, 2014.