arxiv: 2604.02949 · v1 · submitted 2026-04-03 · 💻 cs.DM · math.CO

Recognition: no theorem link

Sample compression schemes for balls in structurally sparse graphs

Romain Bourneuf , J\k{e}drzej Hodor , Piotr Micek , Cl\'ement Rambaud

Authors on Pith no claims yet

Pith reviewed 2026-05-13 18:45 UTC · model grok-4.3

classification 💻 cs.DM math.CO

keywords sample compression schemesballs in graphstreewidthcliquewidthproper compressionhypergraph learningstructurally sparse graphs

0 comments

The pith

Graphs of treewidth t have proper sample compression schemes of size O(t log t) for their ball hypergraphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that when a graph has treewidth at most t, the hypergraph formed by all its balls admits a proper sample compression scheme whose size is linear in t times log t. Proper means the compressed sample not only predicts labels but also outputs an explicit ball consistent with those labels. The result improves the quadratic bound that follows from general theorems on VC-dimension and is shown to be tight up to the logarithmic factor. An analogous linear-in-t-log-t bound is proved when treewidth is replaced by cliquewidth.

Core claim

For every graph G of treewidth at most t, the hypergraph of balls in G has a proper sample compression scheme of size O(t log t); the same holds for graphs of cliquewidth at most t. The construction works directly on the geometry of balls rather than invoking general improper compression theorems.

What carries the argument

A proper sample compression scheme that, given a labeled sample of vertices, selects a small subset together with an explicit ball whose containment pattern recovers every label in the original sample.

If this is right

Learning algorithms that rely on balls in low-treewidth graphs can reduce any sample to size O(t log t) while recovering a consistent hypothesis.
The same size bound applies to graphs of bounded cliquewidth.
The logarithmic factor cannot be removed in the worst case.
Bounded treewidth or cliquewidth of the underlying graph is sufficient to guarantee that the ball hypergraph admits proper compression linear in the width parameter.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The construction may extend to other distance-based hypergraphs on graphs of bounded expansion.
Proper schemes of this size could yield faster exact learning procedures for distance concepts when the host graph has small treewidth.
One could test whether the same O(t log t) bound holds for unit-ball hypergraphs in low-dimensional geometric graphs.

Load-bearing premise

The hyperedges must be defined exactly as the sets of vertices contained in some ball of the input graph.

What would settle it

A single graph of treewidth t whose ball hypergraph requires any proper sample compression scheme to keep more than c t log t points on some labeled samples, for a constant c larger than the hidden constant in O(t log t).

read the original abstract

Sample compression schemes were defined by Littlestone and Warmuth (1986) as an abstraction of the structure underlying many learning algorithms. In a sample compression scheme, we are given a large sample of vertices of a fixed hypergraph with labels indicating the containment in some hyperedge. The task is to compress the sample in such a way that we can retrieve the labels of the original sample. The size of a sample compression scheme is the amount of information that is kept in the compression. Every hypergraph with a sample compression scheme of bounded size must have bounded VC-dimension. Conversely, Moran and Yehudayoff (J. ACM, 2016) showed that every hypergraph of bounded VC-dimension admits a sample compression scheme of bounded size. We study a specific class of hypergraphs emerging from balls in graphs. The schemes that we construct (contrary to the ones constructed by Moran and Yehudayoff) are \textit{proper}, meaning that we retrieve not only the labeling of the original sample but also a hyperedge (ball) consistent with the original labeling. First, we prove that for every graph $G$ of treewidth at most $t$, the hypergraph of balls in $G$ has a proper sample compression scheme of size $\mathcal{O}(t\log t)$; this is tight up to the logarithmic factor and improves the quadratic (improper) bound that follows from the result of Moran and Yehudayoff. Second, we prove an analogous result for graphs of cliquewidth at most $t$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives proper sample compression schemes of size O(t log t) for ball hypergraphs on treewidth-t graphs, improving the general quadratic bound from Moran-Yehudayoff.

read the letter

The main result is a proper sample compression scheme of size O(t log t) for the hypergraph of balls in any graph of treewidth at most t. This improves on the quadratic improper bound that follows from applying the general Moran-Yehudayoff theorem to VC-bounded hypergraphs, and the authors claim the new bound is tight up to the log factor. They prove an analogous statement for cliquewidth at most t. The schemes are proper, so the compression recovers an explicit consistent ball rather than just the labels on the sample. That is a concrete step beyond the general existence result. The bounds depend cleanly on the single parameter t, which fits the setting of learning on graphs with bounded treewidth or cliquewidth. The derivation uses the structural sparsity directly instead of fitting extra constants. The only soft spot worth checking is the tightness argument: the abstract asserts it, but the lower-bound construction needs to be inspected to confirm it is not loose or dependent on special cases. No circularity or hidden parameters appear in the stated claims. This work is aimed at researchers in sample compression and VC-dimension who want explicit bounds for geometrically defined hypergraphs on sparse graphs. A reader focused on learning theory for structured data would get direct value from the size improvement. It deserves a serious referee because the improvement is specific, the setting is natural, and the claims are stated sharply enough to be checked in detail. Send it to peer review.

Referee Report

1 major / 2 minor

Summary. The paper proves that the hypergraph whose hyperedges are the balls of a graph G with treewidth at most t admits a proper sample compression scheme of size O(t log t); the bound is claimed to be tight up to the logarithmic factor and improves the quadratic improper bound implied by the Moran-Yehudayoff theorem for VC-dimension-bounded hypergraphs. An analogous result is shown for graphs of cliquewidth at most t. The schemes are proper, recovering an explicit consistent ball rather than only labels.

Significance. If the results hold, they supply the first explicit proper compression schemes for ball hypergraphs that are parameterized solely by the structural parameters treewidth and cliquewidth. This strengthens the connection between structural graph theory and sample compression beyond the general (improper) bounds of Moran-Yehudayoff, and the properness property is a concrete strengthening.

major comments (1)

[Abstract] Abstract and §1: the tightness claim 'up to the logarithmic factor' is load-bearing for the main contribution; the manuscript must exhibit an explicit Omega(t) lower-bound construction (or cite the precise theorem) showing that no proper scheme of size o(t) exists for some family of treewidth-t graphs.

minor comments (2)

[§2] §2 (definitions): the precise notion of 'proper' recovery (i.e., the compressed sample must output an explicit ball of the original graph) should be restated with the same notation used in the theorems.
[Figure 1] Figure 1 and surrounding text: the running example of a treewidth-2 graph should explicitly label the balls that realize the compressed sample.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive evaluation and the precise comment on the tightness claim. We will revise the manuscript to explicitly support the Omega(t) lower bound as requested.

read point-by-point responses

Referee: [Abstract] Abstract and §1: the tightness claim 'up to the logarithmic factor' is load-bearing for the main contribution; the manuscript must exhibit an explicit Omega(t) lower-bound construction (or cite the precise theorem) showing that no proper scheme of size o(t) exists for some family of treewidth-t graphs.

Authors: We agree that the tightness statement requires explicit justification in the abstract and introduction. The O(t log t) upper bound is the primary technical result, but the claim that it is tight up to logarithmic factors depends on a matching Omega(t) lower bound for proper schemes. In the revised manuscript we will add a short explicit construction (or a precise citation if one exists in the literature) demonstrating that there exist families of treewidth-t graphs whose ball hypergraphs require proper sample compression of size Omega(t). This will be placed in Section 1 immediately after the statement of the upper bound, making the tightness claim fully supported without altering the main theorems. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation proceeds from standard graph decompositions (treewidth and cliquewidth) to explicit proper sample compression schemes of size O(t log t) via combinatorial constructions on bags and separators. These steps rely on external results only for the general Moran-Yehudayoff quadratic bound, which is improved upon rather than presupposed; no parameter is fitted to the target quantity, no self-citation chain carries the central claim, and the hypergraph definition (balls under containment) is used directly without redefinition or renaming of known patterns. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Relies on standard definitions of treewidth, cliquewidth, balls in graphs, and the Littlestone-Warmuth sample compression framework; no free parameters or new invented entities are introduced in the abstract.

axioms (2)

standard math Standard definition of treewidth and cliquewidth as graph parameters
Used to restrict the input graphs for which the compression size is bounded.
domain assumption Balls are defined via shortest-path distance in the graph
Central to forming the hypergraph whose sample compression is studied.

pith-pipeline@v0.9.0 · 5589 in / 1281 out tokens · 20565 ms · 2026-05-13T18:45:12.804033+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1]

P. Assouad. Densité et dimension.Annales de l’Institut Fourier, 33:233–282, 1983

work page 1983
[2]

Beaudou, P

L. Beaudou, P. Dankelmann, F. Foucaud, M. A. Henning, A. Mary, and A. Parreau. Bounding the order of a graph using its diameter and metric dimension: A study through tree decompositions and VC dimension. SIAM Journal on Discrete Mathematics , 32(2):902–918, 2018.arXiv:1610.01475

work page arXiv 2018
[3]

Ben-David and A

S. Ben-David and A. Litman. Combinatorial variability of Vapnik-Chervonenkis classes with applications to sample compression schemes.Discrete Applied Mathematics, 86(1):3–25, 1998

work page 1998
[4]

Bergé, E

P. Bergé, E. Bonnet, and H. Déprés. Deciding twin-width at most 4 is NP-complete. In M. Bojańczyk, E. Merelli, and D. P. Woodruff, editors, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022) , volume 229 of Leibniz International Proceedings in Informatics (LIPIcs) , pages 18:1–18:20, Dagstuhl, Germany, 2022. Schloss Dagstuh...

work page arXiv 2022
[5]

Blumer, A

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM , 36(4):929–965, 1989

work page 1989
[6]

H. L. Bodlaender. NC-algorithms for graphs with small treewidth. In J. van Leeuwen, editor,Graph- Theoretic Concepts in Computer Science, pages 1–10, Berlin, Heidelberg, 1989. Springer Berlin Heidelberg. 26 BOURNEUF, HODOR, MICEK, AND RAMBAUD

work page 1989
[7]

Bonnet, C

É. Bonnet, C. Geniet, E. J. Kim, S. Thomassé, and R. Watrigant. Twin-width II: small classes.Combina- torial Theory, 2(2), 2022.arXiv:2006.09877

work page arXiv 2022
[8]

Bonnet, E

E. Bonnet, E. J. Kim, S. Thomassé, and R. Watrigant. Twin-width I: Tractable FO model checking. Journal of the ACM , 69(1), 2021.arXiv:2004.14789

work page arXiv 2021
[9]

Bousquet and S

N. Bousquet and S. Thomassé. VC-dimension and Erdős–Pósa property. Discrete Mathematics , 338(12):2302–2317, 2015.arXiv:1412.1793

work page arXiv 2015
[10]

Chalopin, V

J. Chalopin, V. Chepoi, F. Mc Inerney, S. Ratel, and Y. Vaxès. Sample compression schemes for balls in graphs. SIAM Journal on Discrete Mathematics , 37(4):2585–2616, 2023.arXiv:2206.13254

work page arXiv 2023
[11]

Chalopin, V

J. Chalopin, V. Chepoi, S. Moran, and M. K. Warmuth. Unlabeled sample compression schemes and corner peelings for ample and maximum classes. Journal of Computer and System Sciences , 127:1–28,

work page
[12]

Chase, B

Z. Chase, B. Chornomaz, S. Hanneke, S. Moran, and A. Yehudayoff. Dual VC dimension obstructs sample compressionby embeddings.In The Thirty Seventh Annual Conference on Learning Theory, pages923–946. PMLR, 2024. arXiv:2405.17120

work page arXiv 2024
[13]

Chepoi, K

V. Chepoi, K. Knauer, and M. Philibert. Ample completions of oriented matroids and complexes of uniform oriented matroids. SIAM Journal on Discrete Mathematics , 36(1):509–535, 2022.arXiv:2007.12527

work page arXiv 2022
[14]

Chepoi, K

V. Chepoi, K. Knauer, and M. Philibert. Labeled sample compression schemes for complexes of oriented matroids. Journal of Computer and System Sciences , 144:103543, 2024.arXiv:2110.15168

work page arXiv 2024
[15]

D. G. Corneil and U. Rotics. On the relationship between clique-width and treewidth.SIAM Journal on Computing, 34(4):825–847, 2005

work page 2005
[16]

Courcelle and S

B. Courcelle and S. Olariu. Upper bounds to the clique width of graphs.Discrete Applied Mathematics , 101(1–3):77–114, 2000

work page 2000
[17]

Dujmović, R

V. Dujmović, R. Hickingbotham, J. Hodor, G. Joret, H. La, P. Micek, P. Morin, C. Rambaud, and D. R. Wood. The grid-minor theorem revisited.Combinatorica, 45(6):62, 2025.arXiv:2307.02816

work page arXiv 2025
[18]

Floyd and M

S. Floyd and M. Warmuth. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 21(3):269–304, 1995

work page 1995
[19]

M. Grohe. Local tree-width, excluded minors, and approximation algorithms. Combinatorica, 23(4):613–632, 2003.arXiv:0001128

work page 2003
[20]

Gurski and E

F. Gurski and E. Wanke.The Tree-Width of Clique-Width Bounded Graphs without Kn,n, page 196–205. Springer Berlin Heidelberg, 2000

work page 2000
[21]

Johansson

Ö. Johansson. Graph decomposition using node labels . PhD thesis, Numerisk analys och datalogi, 2001

work page 2001
[22]

Littlestone and M

N. Littlestone and M. K. Warmuth. Relating data compression and learnability. Unpublished manuscript, available online atmwarmuth.bitbucket.io/pubs/T1.pdf

work page
[23]

T. Marc. Unlabeled sample compression schemes for oriented matroids. Discrete Mathematics , 347(7):114006, 2024. arXiv:2203.11535v4

work page arXiv 2024
[24]

Moran and M

S. Moran and M. K. Warmuth. Labeled compression schemes for extremal classes. InAlgorithmic Learning Theory: 27th International Conference, ALT 2016, Bari, Italy, October 19-21, 2016, Proceedings , page 34–49, Berlin, Heidelberg, 2016. Springer-Verlag.arXiv:1506.00165

work page arXiv 2016
[25]

Moran and A

S. Moran and A. Yehudayoff. Sample compression schemes for VC classes.Journal of the ACM , 63(3):1–10,

work page
[26]

Oum and P

S.-i. Oum and P. Seymour. Approximating clique-width and branch-width.Journal of Combinatorial The- ory, Series B , 96(4):514–528, 2006

work page 2006
[27]

Pálvölgyi and G

D. Pálvölgyi and G. Tardos. Unlabeled compression schemes exceeding the VC-dimension.Discrete Applied Mathematics, 276:102–107, 2020.arXiv:1811.12471

work page arXiv 2020
[28]

L. G. Valiant. A theory of the learnable.Communications of the ACM , 27(11):1134–1142, 1984

work page 1984
[29]

V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications , 16(2):264–280, 1971

work page 1971
[30]

E. Wanke. k-NLC graphs and polynomial algorithms. Discrete Applied Mathematics , 54(2–3):251–266, 1994

work page 1994
[31]

M. K. Warmuth. Compressing to VC dimension many points. In B. Schölkopf and M. K. Warmuth, editors, Learning Theory and Kernel Machines , pages 743–744, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg. Appendix A. Cliquewidth and VC-dimension of the hypergraph of balls In this section, we show that the hypergraph of balls in a graph of NLC-width at m...

work page 2003