What is Learnable in Valiant's Theory of the Learnable?
Pith reviewed 2026-05-14 17:30 UTC · model grok-4.3
The pith
A class is learnable in Valiant's model exactly when every realizable positive sample admits a poly-size adaptive query-compression scheme.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For every finite domain, a class is learnable in Valiant's model if and only if every realizable positive sample can be certified by a poly-size adaptive query-compression scheme. This places Valiant's learnability strictly between PAC learnability and the query-free version of Valiant's model. The same strict sandwich persists for arbitrary domains. Halfspaces in d dimensions are learnable with poly(d)·Õ(1/ε) samples and poly(d)·polylog(1/ε) membership queries, and at least Ω(d) samples or queries are necessary.
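In containment form (our notation, not the paper's): writing QF, Val, and PAC for the families of classes learnable in the query-free variant, Valiant's model, and the PAC model respectively, the characterization yields $\mathrm{QF} \subsetneq \mathrm{Val} \subsetneq \mathrm{PAC}$, with $d$-dimensional halfspaces witnessing the strictness of the first containment.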
What carries the argument
poly-size adaptive query-compression scheme: a short interactive protocol in which the learner uses membership queries to certify that a given positive sample is consistent with some concept in the class
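A toy sketch of the flavor of such a scheme (our illustration, not the paper's construction): for the class of threshold concepts $h_t(x) = 1$ iff $x \ge t$ on the finite domain $\{0, \dots, n-1\}$, a transcript of $O(\log n)$ adaptive membership queries certifies an arbitrarily large positive sample S, and the decompressed set D satisfies S ⊆ D whenever the target lies in the class.

def compress(membership, n):
    # Adaptive binary search for the least positive point; the query
    # transcript is the compressed certificate for any positive sample.
    transcript = []
    lo, hi = 0, n                      # the least positive point lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        ans = membership(mid)          # adaptive membership query
        transcript.append((mid, ans))
        if ans:
            hi = mid
        else:
            lo = mid + 1
    D = set(range(lo, n))              # decompressed positive region
    return transcript, D

t_star = 37                            # hidden target threshold
oracle = lambda x: int(x >= t_star)
S = {40, 55, 90}                       # a realizable positive sample
transcript, D = compress(oracle, n=128)
assert S <= D                          # certification: S is consistent with the class
print(len(transcript), "queries certify", len(S), "positives")

Rejection is symmetric: if S ⊄ D, no concept in the class is consistent with S, so the transcript doubles as a refutation.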
If this is right
- Valiant's model with membership queries is strictly more powerful than the query-free variant and strictly less powerful than the PAC model: the classes learnable in it strictly contain the query-free-learnable classes and are strictly contained in the PAC-learnable classes.
- d-dimensional halfspaces, which are not learnable without queries, become learnable once membership queries are permitted.
- Halfspaces admit an algorithm using poly(d)·Õ(1/ε) samples and poly(d)·polylog(1/ε) membership queries, with an Ω(d) lower bound on the combined number of samples and queries (see the sketch after this list).
- The same strict sandwich between PAC and query-free Valiant learnability holds on arbitrary, possibly infinite domains.
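A minimal sketch of where the polylog(1/ε) query dependence could come from, assuming (consistent with the residual-mass recursion in the paper's analysis) that each stage $s$ outputs a partial hypothesis $h_s$ with no false positives and cuts the remaining positive mass by a constant factor: with $H_{s+1} = H_s \vee h_s$ and $p_s := \Pr_{x \sim D}(H_s(x) = 0)$, each stage gives $p_{s+1} \le p_s / 3$, so $p_T \le 3^{-T} \le \epsilon$ once $T \ge \log_3(1/\epsilon)$; at poly(d) queries per stage, the total is $\mathrm{poly}(d)\,\mathrm{polylog}(1/\epsilon)$.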
Where Pith is reading between the lines
- Membership queries can change which classes are qualitatively learnable, rather than merely improving sample or computational efficiency.
- The compression perspective may be used to design algorithms for other geometric classes by focusing on interactive certification of positive samples.
- The absence of an exact characterization for infinite domains highlights the need for new techniques that handle unbounded domains without losing the sandwich property.
Load-bearing premise
The learner must output a hypothesis with no false positives, and the domain must be finite to obtain the exact if-and-only-if characterization.
What would settle it
A finite-domain concept class in which some realizable positive sample lacks a polynomial-size adaptive query-compression scheme yet the class is still learnable in Valiant's model.
Original abstract
Valiant's 1984 paper is widely credited with introducing the PAC learning model, but it, in fact, introduced a different model: unlike PAC learning, the learner receives only positives, may issue membership queries, and must output a hypothesis with no false positives. Prior work characterized variants, including the case without queries. We revisit Valiant's original model and ask: *Which classes are learnable in it?* For every finite domain, including Valiant's Boolean-hypercube setting, we show that a class is learnable if and only if every realizable positive sample can be certified by a poly-size adaptive query-compression scheme. This is a new variant of sample compression where the learner certifies samples via a short interaction with the membership oracle. Our characterization shows that learnability in Valiant's model is strictly sandwiched between learnability in the PAC model and the variant of Valiant's model without membership queries. This is one of the rare cases where introducing membership queries changes the set of learnable classes, and not just the sample or computational complexity. Next, we study the natural extension of the model to arbitrary domains. While we do not obtain an exact characterization, our techniques readily generalize and show that the same strict sandwiching persists. Finally, we show that $d$-dimensional halfspaces, which are not learnable without queries, are learnable with queries: we give a $\mathrm{poly}(d) \tilde{O}(1/\epsilon)$ sample and $\mathrm{poly}(d) \mathrm{polylog}(1/\epsilon)$ query algorithm, and prove that at least $\Omega(d)$ samples or queries are necessary. To our knowledge, this is the first algorithm for halfspaces in Valiant's model. Together, these results uncover a surprisingly rich theory behind Valiant's original notion of learnability and introduce ideas that may be of independent interest in learning theory.
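The model's success criterion, as we read the abstract (the ε-coverage requirement on the positive mass is our assumption, standard for positive-only learning; identifiers are illustrative):

# Schematic success criterion in Valiant's original model (a sketch,
# not the paper's formalism). The hypothesis may never claim a
# negative point and should miss little positive mass.
domain = range(100)
h_star = lambda x: x >= 30             # hidden target concept
hyp = lambda x: x >= 35                # learner's output
mass = {x: 1 / 70 for x in domain if h_star(x)}  # distribution on positives

def valid(h, target, domain, mass, eps):
    # Hard requirement (from the abstract): no false positives anywhere.
    if any(h(x) and not target(x) for x in domain):
        return False
    # Accuracy requirement (assumed): miss at most an eps-fraction
    # of the positive mass.
    missed = sum(mass[x] for x in domain if target(x) and not h(x))
    return missed <= eps

print(valid(hyp, h_star, domain, mass, eps=0.1))  # True: misses 5/70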
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper revisits Valiant's 1984 model (positives only, membership queries allowed, hypothesis with no false positives) and gives an exact characterization for finite domains: a class is learnable iff every realizable positive sample admits a poly-size adaptive query-compression scheme. This places Valiant's model strictly between PAC learnability and the no-query variant. The paper supplies a poly(d)·Õ(1/ε) sample / poly(d)·polylog(1/ε) query algorithm for d-dimensional halfspaces (with an Ω(d) lower bound) and shows the sandwiching persists for arbitrary domains.
Significance. If the central claims hold, the work supplies a clean, query-sensitive characterization of learnability in Valiant's original model and demonstrates that membership queries can change the set of learnable classes rather than merely their complexity. The adaptive query-compression notion is new and may be reusable; the halfspace algorithm is the first explicit result in this model and comes with nearly tight bounds.
minor comments (3)
- [Abstract] Abstract, sentence on the characterization: the phrase 'poly-size adaptive query-compression scheme' is introduced without an immediate pointer to its formal definition; a parenthetical reference to the relevant section would improve readability.
- [Halfspace section] Halfspace algorithm paragraph: the sample bound is written 'poly(d) Õ(1/ε)'; clarify whether the Õ hides logarithmic factors in d, in 1/ε, or both, and state the precise dependence on the dimension in the theorem statement.
- [Arbitrary domains section] The extension to arbitrary domains states that 'the same strict sandwiching persists' but does not obtain an exact characterization; a short remark on why the finite-domain proof technique fails to extend exactly would help readers assess the scope of the result.
Simulated Author's Rebuttal
We thank the referee for the supportive review and recommendation of minor revision. The referee's summary accurately reflects the paper's contributions on the characterization of learnability in Valiant's model and the halfspace results. No specific major comments were raised in the report.
Circularity Check
No significant circularity
full rationale
The paper derives an if-and-only-if characterization for finite domains by defining an adaptive query-compression scheme directly from the model's constraints (positive samples only, membership queries, no false positives in the output hypothesis) and proving equivalence to learnability without any reduction to fitted parameters, self-referential equations, or load-bearing self-citations. The halfspace algorithm is presented as an independent constructive example that satisfies the new scheme, and the sandwiching between PAC and query-free variants follows from the explicit definitions rather than renaming or smuggling prior ansatzes. The derivation chain remains self-contained against the stated model.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: finite domain for the exact if-and-only-if characterization
invented entities (1)
- poly-size adaptive query-compression scheme (no independent evidence)