A Leakage Bound for Confidence Sets after Black-Box Selection
Pith reviewed 2026-05-07 12:36 UTC · model grok-4.3
The pith
Black-box selection inflates noncoverage of any fixed-target confidence procedure by at most the average total variation distance between the marginal and conditional laws of the inferential data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For any fixed-target confidence procedure, the selected-target noncoverage probability is bounded above by the nominal fixed-target noncoverage probability plus the average total variation distance between the marginal law of the inferential data and its conditional law given the selected object. The bound quantifies the inferential cost of black-box selection by the information the selected object carries about the inferential sample.
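In symbols, with notation assumed here rather than taken from the paper (Y the inferential data, S the selected object, θ_s the target indexed by s, and C_s a fixed-target procedure with nominal noncoverage level α), the claim reads:

```latex
\Pr\bigl(\theta_S \notin C_S(Y)\bigr)
  \;\le\; \alpha \;+\; \mathbb{E}_S\!\left[\, d_{\mathrm{TV}}\bigl(\mathcal{L}(Y),\, \mathcal{L}(Y \mid S)\bigr) \right].
```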
What carries the argument
The average total variation distance between the marginal law of the inferential data and its conditional law given the selected object; this single term supplies the worst-case excess noncoverage after selection.
If this is right
- Sample splitting is recovered exactly as the zero-leakage case.
- A mutual-information upper bound on the excess noncoverage follows at once from the total-variation statement (a Pinsker-style chain is sketched after this list).
- Explicit numerical guarantees are obtained for noisy screening by substituting a Gaussian information bound for the total-variation term.
- The same bound applies unchanged to any black-box selection rule and any fixed-target procedure, without requiring an explicit selection event.
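The mutual-information consequence noted in the second bullet plausibly follows the standard Pinsker route; a sketch, in the notation assumed above:

```latex
\mathbb{E}_S\bigl[ d_{\mathrm{TV}}\bigl(\mathcal{L}(Y),\, \mathcal{L}(Y \mid S)\bigr) \bigr]
  \;\le\; \mathbb{E}_S \sqrt{ \tfrac{1}{2}\, \mathrm{KL}\bigl( \mathcal{L}(Y \mid S) \,\big\|\, \mathcal{L}(Y) \bigr) }
  \;\le\; \sqrt{ \tfrac{1}{2}\, I(S; Y) },
```

using Pinsker's inequality termwise and Jensen's inequality for the square root, since the expected Kullback-Leibler divergence equals the mutual information I(S; Y). The excess noncoverage is then at most sqrt(I(S; Y)/2).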
Where Pith is reading between the lines
- The total-variation leakage perspective could be used to compare the cost of different adaptive workflows even when the selection rule itself remains opaque.
- In practice the bound suggests allocating data between the selection and inference stages so that the expected total-variation distance stays below a target tolerance (see the sketch after this list).
- Because the bound depends only on the dependence between selection output and inference data, it offers a way to certify coverage without modeling the selection mechanism explicitly.
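The allocation idea in the second bullet can be made concrete once the Gaussian information bound is pinned down. A minimal sketch, assuming a form not taken from the paper: the selector sees only a noisy copy Y + ε with ε ~ N(0, τ²I_d), so that I(S; Y) ≤ (d/2)·ln(1 + σ²/τ²) by data processing and Gaussian channel capacity, combined with the Pinsker-style corollary above. `leakage_budget` is a hypothetical helper, not an API from the paper:

```python
import numpy as np

def leakage_budget(alpha: float, d: int, sigma2: float, tau2: float) -> float:
    """Hypothetical helper: upper bound on selected-target noncoverage when
    the black-box selector sees only a noisy copy Y + eps, eps ~ N(0, tau2*I_d),
    of d-dimensional inferential data with per-coordinate variance sigma2.

    Assumes (not from the paper): I(S; Y) <= I(Y + eps; Y)
      = (d / 2) * ln(1 + sigma2 / tau2) nats,
    by data processing and Gaussian channel capacity, plus the Pinsker-style
    corollary: excess noncoverage <= sqrt(I(S; Y) / 2).
    """
    mutual_info = 0.5 * d * np.log1p(sigma2 / tau2)  # nats
    return alpha + np.sqrt(mutual_info / 2.0)

# How much screening noise keeps the total noncoverage budget near nominal?
for tau2 in (1.0, 5.0, 20.0, 100.0):
    print(f"tau2 = {tau2:6.1f} -> bound = {leakage_budget(0.05, 1, 1.0, tau2):.3f}")
```

Raising τ² (spending less signal on selection) drives the budget toward the nominal α, mirroring the sample-splitting limit of zero leakage.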
Load-bearing premise
The selection rule and the inferential data are jointly defined on a probability space so that the conditional law given the selected object exists and the total variation distance between the marginal and conditional laws is finite.
What would settle it
A concrete numerical example in which a fixed-target 95 percent procedure after black-box selection exhibits noncoverage strictly larger than five percent plus the computed average total variation distance between the marginal and conditional laws of the inferential data.
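A minimal Monte Carlo sketch of such a check, under assumptions chosen here rather than taken from the paper: K = 10 Gaussian arms with all true means zero, winner-takes-all selection, and the naive fixed-target interval Y_S ± 1.96 reusing the selection data:

```python
import numpy as np

rng = np.random.default_rng(0)
K, z, alpha, n_rep = 10, 1.96, 0.05, 200_000

# K independent N(0, 1) arms per replication; every true mean mu_s is 0.
Y = rng.standard_normal((n_rep, K))
S = Y.argmax(axis=1)                       # black-box winner selection
Y_sel = Y[np.arange(n_rep), S]             # statistic reused for inference
noncov = np.mean(np.abs(Y_sel) > z)        # naive CI Y_S +/- z misses mu_S = 0

# For argmax selection over K exchangeable coordinates, the average TV term
# has the closed form (K - 1) / K (see the editorial analysis below).
tv_term = (K - 1) / K
print(f"empirical noncoverage: {noncov:.4f}")           # about 0.22 for K = 10
print(f"alpha + E[TV] budget : {alpha + tv_term:.4f}")  # 0.95, holds with slack
```

Here the bound holds with large slack (empirical noncoverage near 0.22 against a budget of 0.95); a genuine counterexample would need noncoverage strictly above the budget.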
Original abstract
In many analyses the object reported at the end is not fixed in advance, but is chosen after a preliminary search over variables, subgroups, transformations, models or contrasts. Classical selective-inference methods are most effective when this search can be written as an explicit selection event. This note treats the less structured case in which the selection rule is a black box and inference is required for the target indexed by the selected object. We show that, for any fixed-target confidence procedure, selected-target noncoverage is bounded by the nominal fixed-target noncoverage plus the average total variation distance between the marginal law of the inferential data and its conditional law given the selected object. A mutual-information bound follows immediately. The result recovers sample splitting as the zero-leakage case and gives explicit guarantees for noisy screening through a Gaussian information bound. Thus the inferential cost of black-box selection is quantified by the information that the selected object carries about the inferential sample.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives a general bound showing that, for any fixed-target confidence procedure with nominal noncoverage level α, the noncoverage probability for the target indexed by a black-box selected object S is at most α plus the expected total variation distance between the marginal law of the inferential data Y and the conditional law of Y given S. The bound follows from the law of total probability applied to the coverage event and the definition of total variation; a mutual-information consequence is immediate. The result recovers sample splitting as the zero-leakage case and supplies an explicit Gaussian information bound for noisy screening.
Significance. If the central bound holds, the note supplies a simple, distribution-free quantification of the inferential cost of black-box selection, measured directly by information leakage. This is significant for contemporary selective inference, where selection rules are often too complex for explicit conditioning arguments. The derivation is elementary, parameter-free, and recovers known procedures as special cases, which strengthens its utility. The minimal assumption of a well-defined probability space on which the relevant measures exist is standard and does not limit applicability on Polish spaces, where regular conditional distributions always exist.
Minor comments (3)
- The abstract states that a mutual-information bound follows immediately, but the main text should include an explicit corollary statement of this bound (including the precise form of the mutual information term) for clarity.
- Notation for the selected object and the conditional law (e.g., Q_S versus P(·|S)) should be introduced once in the introduction and used consistently thereafter to avoid any ambiguity when the bound is applied to screening examples.
- The Gaussian screening illustration would benefit from a brief remark on how the total-variation term is computed or bounded in that setting, even if the details are standard.
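On the third comment: in one stylized screening model (an assumption here, not the paper's construction), where S is the argmax of K exchangeable coordinates of Y with joint density f, the conditional density of Y given S = s is K f(y) 1{y_s = max_j y_j}, and the total-variation term has a closed form:

```latex
d_{\mathrm{TV}}\bigl(\mathcal{L}(Y),\, \mathcal{L}(Y \mid S = s)\bigr)
  = \frac{1}{2} \int \bigl| K\,\mathbf{1}\{ y_s = \max_j y_j \} - 1 \bigr|\, f(y)\, dy
  = \frac{K - 1}{K}.
```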
Simulated Author's Rebuttal
We thank the referee for the positive and insightful report, which accurately captures the main contribution of the note. We appreciate the recommendation to accept.
Circularity Check
No significant circularity identified
Full rationale
The derivation proceeds directly from the law of total probability and the definition of total variation distance: selected noncoverage equals the expectation of the conditional probability of noncoverage given the selection, which is bounded by the marginal noncoverage (at most α) plus the TV distance for each fixed selection value. No parameters are fitted and then relabeled as predictions, no self-citations are load-bearing for the central inequality, and the probability-space assumption is standard rather than self-referential. The bound is therefore an immediate consequence of the stated definitions and does not reduce to its inputs by construction.
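A plausible rendering of that two-step derivation, in the notation assumed earlier:

```latex
\Pr\bigl(\theta_S \notin C_S(Y)\bigr)
  = \mathbb{E}_S\bigl[ \Pr(\theta_s \notin C_s(Y) \mid S = s) \bigr]
  \le \mathbb{E}_S\bigl[ \Pr(\theta_s \notin C_s(Y)) + d_{\mathrm{TV}}\bigl(\mathcal{L}(Y),\, \mathcal{L}(Y \mid S = s)\bigr) \bigr]
  \le \alpha + \mathbb{E}_S\bigl[ d_{\mathrm{TV}} \bigr],
```

where the middle inequality applies the defining property of total variation to the fixed-s noncoverage event.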
Axiom & Free-Parameter Ledger
Axioms (2)
- (standard math) Existence of a probability space supporting both the selection rule and the inferential data
- (standard math) Total variation distance is a valid metric on probability measures
Reference graph
Works this paper leans on
- [1] Bassily, R., Nissim, K., Smith, A., Steinke, T., Stemmer, U., and Ullman, J. (2016). Algorithmic stability for adaptive data analysis. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing, pages 1046–1059.
- [2] Berk, R., Brown, L., Buja, A., Zhang, K., and Zhao, L. (2013). Valid post-selection inference. The Annals of Statistics, 41(2):802–837.
- [3] Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. Wiley, 2nd edition.
- [4] Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., and Roth, A. (2015). The reusable holdout: Preserving validity in adaptive data analysis. Science, 349(6248):636–638.
- [5]
- [6] Kallenberg, O. (2002). Foundations of Modern Probability. Springer, 2nd edition.
- [7] Lee, J. D., Sun, D. L., Sun, Y., and Taylor, J. E. (2016). Exact post-selection inference, with application to the lasso. The Annals of Statistics, 44(3):907–927.
- [8] Leeb, H. and Pötscher, B. M. (2005). Model selection and inference: Facts and fiction. Econometric Theory, 21(1):21–59.
- [9] Russo, D. and Zou, J. (2016). Controlling bias in adaptive data analysis using information theory. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, volume 51 of Proceedings of Machine Learning Research, pages 1232–1240.
- [10] Taylor, J. and Tibshirani, R. J. (2015). Statistical learning and selective inference. Proceedings of the National Academy of Sciences, 112(25):7629–7634.
- [11] Tian, X. and Taylor, J. (2018). Selective inference with a randomized response. The Annals of Statistics, 46(2):679–710.
- [12] Tibshirani, R. J., Taylor, J., Lockhart, R., and Tibshirani, R. (2016). Exact post-selection inference for sequential regression procedures. Journal of the American Statistical Association, 111(514):600–620.
- [13] Xu, A. and Raginsky, M. (2017). Information-theoretic analysis of generalization capability of learning algorithms. In Advances in Neural Information Processing Systems, volume 30, pages 2524–2533.