Surprises in Proper Positive-Only Learning

Anay Mehrotra; Farnam Mansouri; Manolis Zampetakis; Shai Ben-David

arxiv: 2606.28309 · v1 · pith:VXHA6XSJnew · submitted 2026-06-26 · 📊 stat.ML · cs.LG· math.ST· stat.TH

Surprises in Proper Positive-Only Learning

Shai Ben-David , Farnam Mansouri , Anay Mehrotra , Manolis Zampetakis This is my paper

Pith reviewed 2026-06-29 01:59 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.STstat.TH

keywords positive-only learningproper PAC learningVC dimensioncombinatorial dimensionsuniform exterior separabilitylearning theory separations

0 comments

The pith

A concept class is properly learnable from positive-only samples exactly when it has finite VC dimension and satisfies uniform exterior separability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper settles the open question of characterizing proper learning in the positive-only sampling model, where the learner sees only positive examples yet must succeed on the full distribution. A sympathetic reader cares because this model captures realistic scenarios with one-sided data, and the result shows that the usual finite VC dimension condition is insufficient here unlike in standard PAC learning. The characterization introduces uniform exterior separability as the additional requirement and demonstrates multiple separations between proper versus improper, randomized versus deterministic, and ERM-based versus non-ERM learners that do not arise in the ordinary model. Along the way it defines new combinatorial dimensions that may help analyze other restricted sampling settings.

Core claim

A concept class is properly learnable from positive-only samples if and only if it has finite VC dimension and satisfies uniform exterior separability. This condition, together with the separations it implies, produces a landscape in which proper and improper positive-only learning diverge, randomized and deterministic proper learners diverge, some classes admit no ERM learner, and finite VC dimension fails to guarantee even non-uniform learnability.

What carries the argument

uniform exterior separability, the combinatorial condition that, together with finite VC dimension, fully characterizes proper positive-only learnability.

If this is right

Proper and improper positive-only learning are separated.
Randomized and deterministic proper positive-only learning are separated.
There exist classes for which no empirical risk minimizer serves as a learner.
Finite VC dimension does not suffice for non-uniform learnability in the positive-only model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same separability lens could be applied to other one-sided or restricted sampling models to derive analogous characterizations.
Concrete classes such as halfspaces or intervals could be checked for uniform exterior separability to test the condition on familiar examples.
The new combinatorial dimensions may help separate learnability regimes in settings with label noise or missing labels.

Load-bearing premise

The positive-only sampling model requires the learner to succeed on the original distribution that places mass on both positive and negative regions, even though all training samples come from the positive region alone.

What would settle it

Exhibit a concept class with finite VC dimension that is properly learnable from positive-only samples yet fails uniform exterior separability, or a class that meets both conditions yet is not properly learnable.

read the original abstract

Binary classification from positive-only samples is a variant of PAC learning in which the learner receives i.i.d. samples from the positive region of an unknown target concept, but is evaluated under the original distribution (which places mass on both positive and negative regions). This model dates back to Natarajan [1987, STOC], and the characterization of improper learning is well-known -- it even appears in textbooks. The characterization of proper positive-only learning, however, has long remained open. In this work, we revisit and settle this question: a concept class is properly learnable from positive-only samples if and only if it has finite VC dimension and satisfies a new combinatorial condition, which we call uniform exterior separability. Together with several separation results, this characterization reveals a surprisingly rich landscape that differs sharply from standard PAC learning: proper and improper learning are separated, randomized and deterministic proper learning are separated, there are classes for which no ERM is a learner, and finite VC dimension does not suffice even for non-uniform learning. Along the way, we introduce new combinatorial dimensions that we believe can be of broader interest in learning theory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper settles the long-open characterization of proper positive-only learning with finite VC plus uniform exterior separability, plus several clean separations from standard PAC.

read the letter

The core result is a clean iff: a class is properly learnable from positive-only samples exactly when it has finite VC dimension and satisfies uniform exterior separability. That closes the question left open since Natarajan 1987. They also give separations showing proper and improper learning diverge here, randomized and deterministic proper learners diverge, some classes have no ERM that works, and finite VC is not enough even for non-uniform learning. Those differences from ordinary PAC are the main surprise.

The work is technically solid on its own terms. The new condition is introduced to make the characterization go through, and the separations are stated explicitly rather than left implicit. The combinatorial dimensions they define along the way look like they could be reusable. The model definition follows the 1987 setup without hidden changes.

The main limitation is that uniform exterior separability is a new property whose naturalness and ease of verification are not yet clear from the abstract alone; it may turn out to be as hard to check as the learning problem itself in some cases. The separation constructions are not described here, so it is hard to judge how artificial they are. No circularity or self-citation issues appear.

This is for people who care about exact characterizations in learning theory and about how sampling models affect the proper/improper and ERM distinctions. It is worth a serious referee because it resolves an explicit open question with a non-trivial new condition and documents the landscape differences. I would bring it to a reading group and would cite the characterization if the proofs check out.

Referee Report

0 major / 1 minor

Summary. The paper claims that a concept class is properly learnable from positive-only samples in the Natarajan (1987) model if and only if it has finite VC dimension and satisfies a new combinatorial condition called uniform exterior separability. It also establishes separation results showing that proper and improper learning are distinct, randomized and deterministic proper learning are distinct, there exist classes where no ERM is a learner, and finite VC dimension does not suffice even for non-uniform learning. New combinatorial dimensions are introduced.

Significance. If the iff characterization and separations hold, the result resolves a long-open question in learning theory and demonstrates that positive-only learning has a richer structure than standard PAC learning, with proper learning strictly harder. The new combinatorial condition and dimensions are presented as potentially of broader interest.

minor comments (1)

[Abstract] The abstract refers to 'several separation results' without naming the specific classes or constructions used; a brief pointer to the relevant theorem or section would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our work on the characterization of proper positive-only learning and for recommending minor revision. The report correctly identifies the main contributions, including the iff characterization via finite VC dimension and uniform exterior separability, as well as the separation results between proper/improper, randomized/deterministic, and the insufficiency of finite VC dimension alone. No specific major comments or requested changes were provided in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper defines a new combinatorial property (uniform exterior separability) and establishes an iff characterization with finite VC dimension for proper positive-only learnability. This is a standard non-circular characterization result: the property is introduced independently, necessity and sufficiency are proved separately, and no step reduces a claimed prediction or theorem to a fitted parameter, self-citation chain, or definitional tautology. The abstract and reader's summary confirm the result rests on this new condition rather than any enumerated circular pattern, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; the paper relies on the standard definition of VC-dimension and the Natarajan positive-only sampling model, both treated as background.

axioms (2)

standard math Standard definition of VC-dimension for concept classes
Invoked as the first half of the characterization.
domain assumption Positive-only sampling model from Natarajan 1987
The learner receives samples only from the positive region but is evaluated on the full distribution.

pith-pipeline@v0.9.1-grok · 5741 in / 1228 out tokens · 21404 ms · 2026-06-29T01:59:32.217765+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

14 extracted references · 3 canonical work pages

[1]

Theory of Classification: A Survey of Some Recent Advances

URL: https://www.numdam.org/articles/10.5802/aif.938/ (cit. on pp. 8, 18, 25). [BBL05] Stéphane Boucheron, Olivier Bousquet, and Gábor Lugosi. “Theory of Classification: A Survey of Some Recent Advances”. In: ESAIM: Probability and Statistics 9 (2005), pp. 323–375 (cit. on p. 21). [BD18] Jessa Bekker and Jesse Davis. “Estimating the class prior in positiv...

work page doi:10.5802/aif.938/ 2005
[2]

Efficient Truncated Statistics with Unknown Truncation

URL: https://arxiv.org/abs/2411.09642 (cit. on p. 4). [KTZ19] Vasilis Kontonis, Christos Tzamos, and Manolis Zampetakis. “Efficient Truncated Statistics with Unknown Truncation”. In: 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS) (2019), pp. 1578–1595 (cit. on pp. 2, 4). [KV94] Michael J. Kearns and Umesh V . Vazirani.An Introdu...

work page doi:10.7551/mitpress/3897.001.0001 2019
[3]

Deductive Learning

URL: https://doi.org/10.1145/1968.1972 (cit. on p. 1). [Val84b] Leslie G. Valiant. “Deductive Learning”. In: Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences 312.1522 (1984), pp. 441–446 (cit. on pp. 1, 12). 17 A Additional Preliminaries In this section, we introduce additional notation and definition...

work page doi:10.1145/1968.1972 1968
[4]

H is properly learnable from positive-only samples
[5]

H satisfies uniform exterior-separability (Definition 3) and VCdim (H) < ∞. Proof. Observe that in Theorem 3 we have proved that under the assumption VCdim(H) < ∞, uniform exterior separability is equivalent to distributional exterior separability (Definition 4). We divide the proof into two parts, one for each direction of the claim. Part 1 (2 =⇒ 1): Def...
[6]

H is properly P AC learnable from positive-only examples by a randomized learner
[7]

In particular, by Assouad’s bound [Ass83], d⋆ < 2d+1, so r (ε, δ) = eO(d + log (1/εδ))

H is properly P AC learnable from positive-only examples by a learner using at most r (ε, δ) = eO(log (d⋆/εδ)) random bits and m (ε, δ) = eO((1/ε) (d + log 1/δ)) positive samples . In particular, by Assouad’s bound [Ass83], d⋆ < 2d+1, so r (ε, δ) = eO(d + log (1/εδ)) . Proof of Theorem 5. Observe that the second item implies the first immediately since a ...
[8]

First, we prove that exact exterior separation (Definition 2) is sufficient for deterministic proper learning of VC classes, and
[9]

We then prove the remaining results from Section 4

Then, we prove necessary and sufficient conditions for consistency over countable domains. We then prove the remaining results from Section 4. D.1 Exact Exterior Separation is Sufficient for Deterministic Learning We begin with proving that exact exterior separability is sufficient for deterministic proper learning, which will be used throughout this appe...
[10]

No concept class H which does not satisfy finite exterior separability (Definition 6) is properly positive-only consistent
[11]

Every concept class H over countable domain that satisfies finite exterior separability (Definition 6) is properly positive-only consistent. Proof. We divide the proof into two parts corresponding to the two claims. Part 1 (consistency implies FES). Consider any S, F that violate finite exterior separability in Definition 6. For any x ∈ F define Dx as Dx(...
[12]

Thus, A(S) = {1, 2,a⋆} for some a⋆ ∈ N \ {1, 2}

Since A is an ERM learner, we have Domain(S) ⊆ A (S). Thus, A(S) = {1, 2,a⋆} for some a⋆ ∈ N \ {1, 2}. Furthermore, under the target ℓa⋆ = {1, 2,a⋆ + 1}, the point a⋆ is negative and has Da⋆-mass 1/2. Hence, errDa⋆ (A(S), ℓa⋆ ) = 1 2 . In addition, under D+,a⋆, the mass of 1 is 1/n and the mass of 2 is (n − 1)/n. Therefore, Pr S+∼Dn +,a⋆ [S+ = S] ≥ n · 1 ...
[13]

H1 is properly positive-only consistent, but not non-uniformly properly positive-only learnable
[14]

H2 is non-uniformly properly positive-only learnable, but not properly positive-only learnable. Proof. We divide the proof into two parts corresponding to the two claims. Part 1 (Separation of consistency and non-uniform learnability). Consider the concept class Hbin := {ha | a ≥ 3} , ha := {1, 2,a, a + 2, a + 4, . . .} . Observe that VCdim(Hbin) = 1. We ...

[1] [1]

Theory of Classification: A Survey of Some Recent Advances

URL: https://www.numdam.org/articles/10.5802/aif.938/ (cit. on pp. 8, 18, 25). [BBL05] Stéphane Boucheron, Olivier Bousquet, and Gábor Lugosi. “Theory of Classification: A Survey of Some Recent Advances”. In: ESAIM: Probability and Statistics 9 (2005), pp. 323–375 (cit. on p. 21). [BD18] Jessa Bekker and Jesse Davis. “Estimating the class prior in positiv...

work page doi:10.5802/aif.938/ 2005

[2] [2]

Efficient Truncated Statistics with Unknown Truncation

URL: https://arxiv.org/abs/2411.09642 (cit. on p. 4). [KTZ19] Vasilis Kontonis, Christos Tzamos, and Manolis Zampetakis. “Efficient Truncated Statistics with Unknown Truncation”. In: 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS) (2019), pp. 1578–1595 (cit. on pp. 2, 4). [KV94] Michael J. Kearns and Umesh V . Vazirani.An Introdu...

work page doi:10.7551/mitpress/3897.001.0001 2019

[3] [3]

Deductive Learning

URL: https://doi.org/10.1145/1968.1972 (cit. on p. 1). [Val84b] Leslie G. Valiant. “Deductive Learning”. In: Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences 312.1522 (1984), pp. 441–446 (cit. on pp. 1, 12). 17 A Additional Preliminaries In this section, we introduce additional notation and definition...

work page doi:10.1145/1968.1972 1968

[4] [4]

H is properly learnable from positive-only samples

[5] [5]

H satisfies uniform exterior-separability (Definition 3) and VCdim (H) < ∞. Proof. Observe that in Theorem 3 we have proved that under the assumption VCdim(H) < ∞, uniform exterior separability is equivalent to distributional exterior separability (Definition 4). We divide the proof into two parts, one for each direction of the claim. Part 1 (2 =⇒ 1): Def...

[6] [6]

H is properly P AC learnable from positive-only examples by a randomized learner

[7] [7]

In particular, by Assouad’s bound [Ass83], d⋆ < 2d+1, so r (ε, δ) = eO(d + log (1/εδ))

H is properly P AC learnable from positive-only examples by a learner using at most r (ε, δ) = eO(log (d⋆/εδ)) random bits and m (ε, δ) = eO((1/ε) (d + log 1/δ)) positive samples . In particular, by Assouad’s bound [Ass83], d⋆ < 2d+1, so r (ε, δ) = eO(d + log (1/εδ)) . Proof of Theorem 5. Observe that the second item implies the first immediately since a ...

[8] [8]

First, we prove that exact exterior separation (Definition 2) is sufficient for deterministic proper learning of VC classes, and

[9] [9]

We then prove the remaining results from Section 4

Then, we prove necessary and sufficient conditions for consistency over countable domains. We then prove the remaining results from Section 4. D.1 Exact Exterior Separation is Sufficient for Deterministic Learning We begin with proving that exact exterior separability is sufficient for deterministic proper learning, which will be used throughout this appe...

[10] [10]

No concept class H which does not satisfy finite exterior separability (Definition 6) is properly positive-only consistent

[11] [11]

Every concept class H over countable domain that satisfies finite exterior separability (Definition 6) is properly positive-only consistent. Proof. We divide the proof into two parts corresponding to the two claims. Part 1 (consistency implies FES). Consider any S, F that violate finite exterior separability in Definition 6. For any x ∈ F define Dx as Dx(...

[12] [12]

Thus, A(S) = {1, 2,a⋆} for some a⋆ ∈ N \ {1, 2}

Since A is an ERM learner, we have Domain(S) ⊆ A (S). Thus, A(S) = {1, 2,a⋆} for some a⋆ ∈ N \ {1, 2}. Furthermore, under the target ℓa⋆ = {1, 2,a⋆ + 1}, the point a⋆ is negative and has Da⋆-mass 1/2. Hence, errDa⋆ (A(S), ℓa⋆ ) = 1 2 . In addition, under D+,a⋆, the mass of 1 is 1/n and the mass of 2 is (n − 1)/n. Therefore, Pr S+∼Dn +,a⋆ [S+ = S] ≥ n · 1 ...

[13] [13]

H1 is properly positive-only consistent, but not non-uniformly properly positive-only learnable

[14] [14]

H2 is non-uniformly properly positive-only learnable, but not properly positive-only learnable. Proof. We divide the proof into two parts corresponding to the two claims. Part 1 (Separation of consistency and non-uniform learnability). Consider the concept class Hbin := {ha | a ≥ 3} , ha := {1, 2,a, a + 2, a + 4, . . .} . Observe that VCdim(Hbin) = 1. We ...