pith. sign in

arxiv: 2605.20434 · v1 · pith:V3VV24SHnew · submitted 2026-05-19 · 📊 stat.ML · cs.DM· cs.LG

Contradiction Graphs Determine VC Dimension

Pith reviewed 2026-05-21 06:38 UTC · model grok-4.3

classification 📊 stat.ML cs.DMcs.LG
keywords VC dimensioncontradiction graphsconcept classesbinary classificationshatteringlearning theorygraph-theoretic characterization
5
0 comments X

The pith

The contradiction graph G_m(H) determines whether a binary concept class has VC dimension at least m.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a single graph built from realizable labeled sequences of length m decides if the VC dimension of H meets or exceeds m. Vertices in this graph represent all possible labelings that some hypothesis in H can produce on m domain points, and two vertices are connected by an edge precisely when the labelings disagree on at least one shared point. Because this graph encodes the threshold predicate, the infinite sequence of graphs for successive m recovers the exact VC dimension value and distinguishes finite from infinite cases. This graph-theoretic view answers whether these structures suffice to detect unbounded VC dimension.

Core claim

For a class H subsets of {0,1}^X, the order-m contradiction graph G_m(H) has vertices given by the H-realizable labeled sequences of length m, with an edge between two vertices whenever the sequences assign opposite labels to some common element of X. The main result establishes that this single graph determines the predicate VCdim(H) greater than or equal to m. It follows that the full sequence of graphs (G_m(H)) for m greater than or equal to 1 determines the precise VC dimension and, in particular, whether the dimension is finite or infinite.

What carries the argument

The order-m contradiction graph G_m(H), with vertices as H-realizable labeled sequences of length m and edges when sequences disagree on the label of a shared domain point.

If this is right

  • The exact numerical value of VCdim(H) is recoverable from the sequence of contradiction graphs.
  • Finite versus infinite VC dimension can be decided by inspecting the graphs for increasing m.
  • Any learning-theoretic bound that depends on the VC dimension can be re-expressed as a property of the corresponding contradiction graphs.
  • The presence or absence of an edge in G_m(H) encodes the existence of a shattering witness or its absence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The result suggests that VC dimension computation might reduce to checking connectivity or other graph invariants in these contradiction graphs.
  • Similar graph constructions could be explored for related combinatorial dimensions such as the fat-shattering dimension.
  • If the graphs are efficiently constructible from samples, they might yield practical tests for bounded VC dimension in concrete hypothesis classes.

Load-bearing premise

The adjacency relation defined solely by opposite labels on a shared domain point captures all distinctions needed to decide the VC-dimension threshold, without further restrictions on the domain X or additional structure on H.

What would settle it

A binary class H where G_m(H) is identical for two different values of the VC dimension threshold at some m, or where the graph fails to reflect whether m points are shattered.

Figures

Figures reproduced from arXiv: 2605.20434 by Daniel Ibaibarriaga, Jesse Campbell, Lev Reyzin.

Figure 1
Figure 1. Figure 1: A shattered two-point set produces a cube-trace clique. The shaded vertices are precisely [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗
read the original abstract

We study the contradiction graphs associated with binary concept classes. For a class $H \subseteq \{0,1\}^X$, the order-$m$ contradiction graph $G_m(H)$ has as vertices the $H$-realizable labeled sequences of length $m$, with two vertices adjacent when the two sequences assign opposite labels to some common domain point. Our main result is that the single graph $G_m(H)$ determines the threshold predicate $\mathrm{VCdim}(H)\ge m$. Consequently, the full sequence $(G_m(H))_{m \ge 1}$ determines the exact VC dimension and, in particular, detects finite versus infinite VC dimension, answering a question posed by Alon et al. (2024).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript defines the order-m contradiction graph G_m(H) for a binary concept class H ⊆ {0,1}^X, with vertices given by the H-realizable labeled sequences of length m and edges present precisely when two sequences assign opposite labels to at least one shared domain point. The central result asserts that a single such graph determines the predicate VCdim(H) ≥ m; consequently the sequence (G_m(H))_{m≥1} determines the exact VC-dimension and distinguishes finite from infinite cases, answering a question of Alon et al. (2024).

Significance. If the claimed equivalence holds, the work supplies a purely graph-theoretic criterion for the VC-dimension threshold. This characterization could open new structural and algorithmic approaches to learning-theoretic questions, particularly the detection of infinite VC-dimension, and would constitute a non-trivial contribution to the theory of concept classes.

major comments (1)
  1. [Definition of G_m(H)] Definition of G_m(H): the vertices are defined as H-realizable labeled sequences of length m with adjacency only when opposite labels occur on a shared domain point, yet no explicit requirement is stated that the m domain points must be distinct. Standard VC-dimension is defined via shattering on m distinct points; if repeated points are permitted, realizable vertices and the induced subgraphs (e.g., cliques corresponding to full shattering) can arise from configurations with no counterpart in the distinct-point definition. The main proof must therefore either restrict the vertex set to distinct points or prove that the threshold predicate is invariant under this relaxation; otherwise the claimed determination of VCdim(H) ≥ m fails to match the classical notion.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We address the single major comment below and will revise the manuscript to resolve the definitional ambiguity while preserving the main results.

read point-by-point responses
  1. Referee: [Definition of G_m(H)] Definition of G_m(H): the vertices are defined as H-realizable labeled sequences of length m with adjacency only when opposite labels occur on a shared domain point, yet no explicit requirement is stated that the m domain points must be distinct. Standard VC-dimension is defined via shattering on m distinct points; if repeated points are permitted, realizable vertices and the induced subgraphs (e.g., cliques corresponding to full shattering) can arise from configurations with no counterpart in the distinct-point definition. The main proof must therefore either restrict the vertex set to distinct points or prove that the threshold predicate is invariant under this relaxation; otherwise the claimed determination of VCdim(H) ≥ m fails to match the classical notion.

    Authors: We thank the referee for identifying this important point of precision. Our original intention was to work with sequences over arbitrary (possibly repeated) domain points, but we agree that this risks a mismatch with the standard definition of VC-dimension, which requires distinct points. In the revised manuscript we will explicitly restrict the vertex set of G_m(H) to H-realizable labeled sequences on m distinct domain points. We have verified that the main theorem continues to hold under this restriction: the forward direction (VCdim(H) ≥ m implies the appropriate subgraph in G_m(H)) relies on the existence of a shattered set of distinct points, while the converse direction uses the contradiction-graph structure on those same distinct points. The change is therefore a clarification rather than a substantive alteration of the results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; graph-to-VCdim relation is a non-trivial structural claim

full rationale

The paper defines G_m(H) explicitly via H-realizable sequences of length m with adjacency on label contradictions at shared points, then asserts that this single graph encodes the predicate VCdim(H) >= m. This is presented as a derived result rather than a redefinition or tautology. No equations reduce the claimed determination to the input definitions by construction, no parameters are fitted and relabeled as predictions, and the cited prior question (Alon et al. 2024) is external. The derivation chain therefore remains self-contained against the given definitions and does not rely on load-bearing self-citations or ansatzes smuggled from prior author work. The skeptic concern about repeated versus distinct domain points addresses potential gaps in the proof's scope, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on the standard combinatorial definition of VC dimension and the notion of H-realizable labelings; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (2)
  • standard math Standard definition of VC dimension for binary concept classes over an arbitrary domain X.
    Invoked implicitly when stating that G_m(H) determines the threshold predicate VCdim(H) ≥ m.
  • domain assumption H-realizable labeled sequences of length m are well-defined for any finite m.
    Used to populate the vertex set of G_m(H).

pith-pipeline@v0.9.0 · 5651 in / 1351 out tokens · 42533 ms · 2026-05-21T06:38:01.133084+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

  1. [1]

    Proceedings of the Thirty Seventh Conference on Learning Theory , series =

    Alon, Noga and Moran, Shay and Schefler, Hilla and Yehudayoff, Amir , title =. Proceedings of the Thirty Seventh Conference on Learning Theory , series =. 2024 , editor =

  2. [2]

    Vapnik, V. N. and Chervonenkis, A. Ya. , title =. Theory of Probability and Its Applications , volume =. 1971 , doi =

  3. [3]

    Journal of Combinatorial Theory, Series A , volume =

    Sauer, Norbert , title =. Journal of Combinatorial Theory, Series A , volume =. 1972 , doi =

  4. [4]

    Pacific Journal of Mathematics , volume =

    Shelah, Saharon , title =. Pacific Journal of Mathematics , volume =. 1972 , doi =

  5. [5]

    , title =

    Valiant, Leslie G. , title =. Communications of the ACM , volume =. 1984 , doi =

  6. [6]

    , title =

    Blumer, Anselm and Ehrenfeucht, Andrzej and Haussler, David and Warmuth, Manfred K. , title =. Journal of the ACM , volume =. 1989 , doi =

  7. [7]

    , title =

    Ehrenfeucht, Andrzej and Haussler, David and Kearns, Michael and Valiant, Leslie G. , title =. Information and Computation , volume =. 1989 , doi =

  8. [8]

    Machine Learning , volume =

    Littlestone, Nick , title =. Machine Learning , volume =. 1988 , doi =

  9. [9]

    Shattering, Graph Orientations, and Connectivity , journal =

    Kozma, L. Shattering, Graph Orientations, and Connectivity , journal =. 2013 , doi =

  10. [10]

    Journal of Privacy and Confidentiality , volume =

    Dwork, Cynthia and McSherry, Frank and Nissim, Kobbi and Smith, Adam , title =. Journal of Privacy and Confidentiality , volume =. 2017 , doi =

  11. [11]

    Advances in Cryptology -- EUROCRYPT 2006 , series =

    Dwork, Cynthia and Kenthapadi, Krishnaram and McSherry, Frank and Mironov, Ilya and Naor, Moni , title =. Advances in Cryptology -- EUROCRYPT 2006 , series =. 2006 , editor =

  12. [12]

    Foundations and Trends in Theoretical Computer Science , volume =

    Dwork, Cynthia and Roth, Aaron , title =. Foundations and Trends in Theoretical Computer Science , volume =. 2014 , doi =

  13. [13]

    and Nissim, Kobbi and Raskhodnikova, Sofya and Smith, Adam , title =

    Kasiviswanathan, Shiva Prasad and Lee, Homin K. and Nissim, Kobbi and Raskhodnikova, Sofya and Smith, Adam , title =. SIAM Journal on Computing , volume =. 2011 , doi =

  14. [14]

    Proceedings of the 24th Annual Conference on Learning Theory , series =

    Chaudhuri, Kamalika and Hsu, Daniel , title =. Proceedings of the 24th Annual Conference on Learning Theory , series =. 2011 , editor =

  15. [15]

    Journal of the ACM , volume =

    Blum, Avrim and Ligett, Katrina and Roth, Aaron , title =. Journal of the ACM , volume =. 2013 , doi =

  16. [16]

    Machine Learning , volume =

    Beimel, Amos and Brenner, Hai and Kasiviswanathan, Shiva Prasad and Nissim, Kobbi , title =. Machine Learning , volume =. 2014 , doi =

  17. [17]

    SIAM Journal on Computing , volume =

    Feldman, Vitaly and Xiao, David , title =. SIAM Journal on Computing , volume =. 2015 , doi =

  18. [18]

    Theory of Computing , volume =

    Beimel, Amos and Nissim, Kobbi and Stemmer, Uri , title =. Theory of Computing , volume =. 2016 , doi =

  19. [19]

    Journal of Machine Learning Research , volume =

    Beimel, Amos and Nissim, Kobbi and Stemmer, Uri , title =. Journal of Machine Learning Research , volume =. 2019 , url =

  20. [20]

    Journal of the ACM , volume =

    Alon, Noga and Bun, Mark and Livni, Roi and Malliaris, Maryanthe and Moran, Shay , title =. Journal of the ACM , volume =. 2022 , doi =

  21. [21]

    Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , series =

    Ghazi, Badih and Golowich, Noah and Kumar, Ravi and Manurangsi, Pasin , title =. Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , series =. 2021 , publisher =

  22. [22]

    Proceedings of the 35th International Conference on Algorithmic Learning Theory , series =

    Bun, Mark and Cohen, Aloni and Desai, Rathin , title =. Proceedings of the 35th International Conference on Algorithmic Learning Theory , series =. 2024 , editor =

  23. [23]

    Theory of Cryptography , series =

    Cynthia Dwork and Frank McSherry and Kobbi Nissim and Adam Smith , title =. Theory of Cryptography , series =. 2006 , doi =