pith. machine review for the scientific record.

arxiv: 2604.26706 · v1 · submitted 2026-04-29 · 🧮 math.ST · stat.TH

Recognition: unknown

A Leakage Bound for Confidence Sets after Black-Box Selection

Sayantan Banerjee

Pith reviewed 2026-05-07 12:36 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords: selective inference · black-box selection · confidence sets · total variation distance · leakage bound · noncoverage probability · sample splitting · mutual information

The pith

Black-box selection inflates noncoverage of any fixed-target confidence procedure by at most the average total variation distance between the marginal and conditional laws of the inferential data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that when an analysis selects its target of inference through an arbitrary black-box rule, the failure probability of a subsequent confidence set is bounded by the usual fixed-target noncoverage rate plus an explicit leakage term. The leakage is measured by the expected total variation distance between the unconditional distribution of the inferential data and the same distribution conditioned on the selected object. This bound holds for any confidence procedure and any selection rule, requires no explicit description of the selection event, and immediately implies a mutual-information version of the same statement. The result recovers the zero extra cost of sample splitting and supplies concrete guarantees when screening is performed with additive Gaussian noise.

Core claim

For any fixed-target confidence procedure, the selected-target noncoverage probability is bounded above by the nominal fixed-target noncoverage probability plus the average total variation distance between the marginal law of the inferential data and its conditional law given the selected object. The bound quantifies the inferential cost of black-box selection by the information the selected object carries about the inferential sample.
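In notation assumed here (the abstract fixes no symbols): write Y for the inferential data, S for the selected object, θ_s for the target indexed by a fixed candidate s, and C_α for a fixed-target procedure with noncoverage at most α for every fixed s. The claimed bound then reads

\[
  \Pr\bigl(\theta_S \notin C_\alpha(Y)\bigr)
  \;\le\; \alpha \;+\; \mathbb{E}_S\!\left[\, d_{\mathrm{TV}}\bigl(\mathcal{L}(Y),\ \mathcal{L}(Y \mid S)\bigr) \right].
\]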

What carries the argument

The average total variation distance between the marginal law of the inferential data and its conditional law given the selected object; this single term supplies the worst-case excess noncoverage after selection.

If this is right

  • Sample splitting is recovered exactly as the zero-leakage case.
  • A mutual-information upper bound on the excess noncoverage follows at once from the total-variation statement (one standard route is sketched after this list).
  • Explicit numerical guarantees are obtained for noisy screening by substituting a Gaussian information bound for the total-variation term.
  • The same bound applies unchanged to any black-box selection rule and any fixed-target procedure, without requiring an explicit selection event.
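A hedged sketch of the mutual-information version, via Pinsker's inequality and Jensen's inequality applied to the leakage term; the paper may use a different route or constant:

\[
  \mathbb{E}_S\, d_{\mathrm{TV}}\bigl(\mathcal{L}(Y), \mathcal{L}(Y \mid S)\bigr)
  \;\le\; \mathbb{E}_S \sqrt{\tfrac{1}{2}\,\mathrm{KL}\bigl(\mathcal{L}(Y \mid S)\,\big\|\,\mathcal{L}(Y)\bigr)}
  \;\le\; \sqrt{\tfrac{1}{2}\, I(S;Y)},
\]

so the excess noncoverage is at most \sqrt{I(S;Y)/2}. Sample splitting makes S independent of the inferential data, hence I(S;Y) = 0 and the leakage term vanishes, which is the zero-cost case above.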

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The total-variation leakage perspective could be used to compare the cost of different adaptive workflows even when the selection rule itself remains opaque.
  • In practice the bound suggests allocating data between the selection and inference stages so that the expected total-variation distance stays below a target tolerance (a toy noise-calibration sketch follows this list).
  • Because the bound depends only on the dependence between selection output and inference data, it offers a way to certify coverage without modeling the selection mechanism explicitly.
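A minimal Python sketch of that calibration idea, assuming the Pinsker-route corollary above and a screening stage that sees only an additive-Gaussian-noise version of a statistic f(Y); the function names, the channel-capacity bound on I(S;Y), and the numbers are illustrative assumptions, not taken from the paper.

import math

def mi_budget_nats(tolerance: float) -> float:
    # Mutual-information budget (nats) that keeps the Pinsker-route leakage
    # term sqrt(I(S;Y)/2) below the target excess-noncoverage tolerance.
    return 2.0 * tolerance ** 2

def gaussian_screening_mi_bound(signal_var: float, noise_var: float) -> float:
    # Assumed bound on I(S;Y) when the selector sees only f(Y) + N(0, noise_var)
    # with Var(f(Y)) <= signal_var: the additive-Gaussian-noise channel bound
    # (1/2) log(1 + SNR). This is a standard information-theoretic bound, not
    # necessarily the paper's exact Gaussian guarantee.
    return 0.5 * math.log(1.0 + signal_var / noise_var)

# Example: cap the excess noncoverage at 0.01 when screening a unit-variance
# statistic, and solve the channel bound for the smallest admissible noise.
tol = 0.01
budget = mi_budget_nats(tol)                   # 2e-4 nats
sigma2 = 1.0 / (math.exp(2.0 * budget) - 1.0)  # invert (1/2) log(1 + 1/sigma2)
print(f"MI budget {budget:.1e} nats -> screening noise variance >= {sigma2:.0f}")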

Load-bearing premise

The selection rule and the inferential data are jointly defined on a probability space so that the conditional law given the selected object exists and the total variation distance between the marginal and conditional laws is finite.

What would settle it

A concrete numerical example in which a fixed-target 95 percent procedure after black-box selection exhibits noncoverage strictly larger than five percent plus the computed average total variation distance between the marginal and conditional laws of the inferential data.
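A hedged version of such a check, as a Monte Carlo sketch in the classic winner's-curse setting: select the largest of k independent N(0,1) coordinates with a rule that reuses the same data, then report the naive 95 percent interval for the selected mean. Here the leakage term can be computed exactly as (k-1)/k, so the simulation can only refute the bound, never establish it; the setting and code are illustrative, not drawn from the paper.

import numpy as np

rng = np.random.default_rng(0)
k, n_sim, z = 5, 200_000, 1.959964   # coordinates, replications, 97.5% normal quantile

# Y has k independent N(0,1) coordinates; every target theta_i equals 0.
Y = rng.standard_normal((n_sim, k))
S = Y.argmax(axis=1)                       # black-box selection: keep the largest coordinate
Y_sel = Y[np.arange(n_sim), S]
noncoverage = np.mean(np.abs(Y_sel) > z)   # naive interval Y_S +/- z misses the true value 0

# Exact leakage for this rule: conditioning on {argmax = s} multiplies the joint
# density by k on that event and zeroes it elsewhere, so d_TV = (k-1)/k for each s.
leakage = (k - 1) / k
bound = 0.05 + leakage

print(f"selected-target noncoverage ~ {noncoverage:.3f}")   # about 0.12 for k = 5
print(f"claimed upper bound          {bound:.3f}")          # 0.85: loose here, but not violated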

read the original abstract

In many analyses the object reported at the end is not fixed in advance, but is chosen after a preliminary search over variables, subgroups, transformations, models or contrasts. Classical selective-inference methods are most effective when this search can be written as an explicit selection event. This note treats the less structured case in which the selection rule is a black box and inference is required for the target indexed by the selected object. We show that, for any fixed-target confidence procedure, selected-target noncoverage is bounded by the nominal fixed-target noncoverage plus the average total variation distance between the marginal law of the inferential data and its conditional law given the selected object. A mutual-information bound follows immediately. The result recovers sample splitting as the zero-leakage case and gives explicit guarantees for noisy screening through a Gaussian information bound. Thus the inferential cost of black-box selection is quantified by the information that the selected object carries about the inferential sample.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript derives a general bound showing that, for any fixed-target confidence procedure with nominal non-coverage level α, the non-coverage probability for the target indexed by a black-box selected object S is at most α plus the expected total variation distance between the marginal law of the inferential data Y and the conditional law of Y given S. The bound follows from the law of total probability applied to the coverage event and the definition of total variation; a mutual-information consequence is immediate. The result recovers sample splitting as the zero-leakage case and supplies an explicit Gaussian information bound for noisy screening.

Significance. If the central bound holds, the note supplies a simple, distribution-free quantification of the inferential cost of black-box selection, measured directly by information leakage. This is significant for contemporary selective inference, where selection rules are often too complex for explicit conditioning arguments. The derivation is elementary, parameter-free, and recovers known procedures as special cases, which strengthens its utility. The minimal assumption, a well-defined probability space on which the relevant marginal and conditional laws exist, is standard and is automatically satisfied on Polish spaces, where regular conditional distributions exist.

minor comments (3)
  1. The abstract states that a mutual-information bound follows immediately, but the main text should include an explicit corollary statement of this bound (including the precise form of the mutual information term) for clarity.
  2. Notation for the selected object and the conditional law (e.g., Q_S versus P(·|S)) should be introduced once in the introduction and used consistently thereafter to avoid any ambiguity when the bound is applied to screening examples.
  3. The Gaussian screening illustration would benefit from a brief remark on how the total-variation term is computed or bounded in that setting, even if the details are standard.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and insightful report, which accurately captures the main contribution of the note. We appreciate the recommendation to accept.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation proceeds directly from the law of total probability and the definition of total variation distance: selected noncoverage equals the expectation of the conditional probability of noncoverage given the selection, which is bounded by the marginal noncoverage (at most alpha) plus the TV distance for each fixed selection value. No parameters are fitted and then relabeled as predictions, no self-citations are load-bearing for the central inequality, and the probability-space assumption is standard rather than self-referential. The bound is therefore an immediate consequence of the stated definitions and does not reduce to its inputs by construction.
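In the notation of the core-claim display above, the two-step argument this rationale describes would read

\[
  \Pr\bigl(\theta_S \notin C_\alpha(Y)\bigr)
  = \mathbb{E}_S\Bigl[\Pr\bigl(\theta_s \notin C_\alpha(Y) \mid S = s\bigr)\Bigr]
  \le \mathbb{E}_S\Bigl[\Pr\bigl(\theta_s \notin C_\alpha(Y)\bigr)
      + d_{\mathrm{TV}}\bigl(\mathcal{L}(Y), \mathcal{L}(Y \mid S = s)\bigr)\Bigr]
  \le \alpha + \mathbb{E}_S\, d_{\mathrm{TV}}\bigl(\mathcal{L}(Y), \mathcal{L}(Y \mid S)\bigr),
\]

where the middle inequality uses the fact that the probabilities of a fixed event under two laws differ by at most their total variation distance.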

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The derivation rests on standard axioms of probability theory and properties of total variation distance; no free parameters, new entities, or ad-hoc assumptions are introduced in the abstract.

axioms (2)
  • standard math · Existence of a probability space supporting both the selection rule and the inferential data
    Required for defining the marginal and conditional laws and the total variation distance.
  • standard math · Total variation distance is a valid metric on probability measures
    Used to quantify the difference between the marginal and conditional distributions (the standard definition is recalled below).
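For reference, the total variation distance used throughout, in its standard form (not quoted from the paper):

\[
  d_{\mathrm{TV}}(P, Q) \;=\; \sup_{A} \bigl| P(A) - Q(A) \bigr|,
\]

the supremum running over measurable events A; in particular, the probability of any fixed event changes by at most d_TV when the law is swapped, which is exactly the property the derivation exploits.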

pith-pipeline@v0.9.0 · 5450 in / 1347 out tokens · 49035 ms · 2026-05-07T12:36:17.757577+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

13 extracted references · 1 canonical work page

  1. Bassily, R., Nissim, K., Smith, A., Steinke, T., Stemmer, U., and Ullman, J. (2016). Algorithmic stability for adaptive data analysis. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing, pages 1046–1059.
  2. Berk, R., Brown, L., Buja, A., Zhang, K., and Zhao, L. (2013). Valid post-selection inference. The Annals of Statistics, 41(2):802–837.
  3. Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. Wiley, 2nd edition.
  4. Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., and Roth, A. (2015). The reusable holdout: Preserving validity in adaptive data analysis. Science, 349(6248):636–638.
  5. Fithian, W., Sun, D., and Taylor, J. (2014). Optimal inference after model selection. arXiv:1410.2597.
  6. Kallenberg, O. (2002). Foundations of Modern Probability. Springer, 2nd edition.
  7. Lee, J. D., Sun, D. L., Sun, Y., and Taylor, J. E. (2016). Exact post-selection inference, with application to the lasso. The Annals of Statistics, 44(3):907–927.
  8. Leeb, H. and Pötscher, B. M. (2005). Model selection and inference: Facts and fiction. Econometric Theory, 21(1):21–59.
  9. Russo, D. and Zou, J. (2016). Controlling bias in adaptive data analysis using information theory. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, volume 51 of Proceedings of Machine Learning Research, pages 1232–1240.
  10. Taylor, J. and Tibshirani, R. J. (2015). Statistical learning and selective inference. Proceedings of the National Academy of Sciences, 112(25):7629–7634.
  11. Tian, X. and Taylor, J. (2018). Selective inference with a randomized response. The Annals of Statistics, 46(2):679–710.
  12. Tibshirani, R. J., Taylor, J., Lockhart, R., and Tibshirani, R. (2016). Exact post-selection inference for sequential regression procedures. Journal of the American Statistical Association, 111(514):600–620.
  13. Xu, A. and Raginsky, M. (2017). Information-theoretic analysis of generalization capability of learning algorithms. In Advances in Neural Information Processing Systems, volume 30, pages 2524–2533.