Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

Emmanuel J. Candès; Ying Jin; Ziang Song

arXiv: 2605.20726 · v2 · pith:FIMI4DSXnew · submitted 2026-05-20 · 📊 stat.ME · cs.LG · stat.ML

Ziang Song, Ying Jin, Emmanuel J. Candès

Pith reviewed 2026-06-30 17:35 UTC · model grok-4.3

classification 📊 stat.ME · cs.LG · stat.ML

keywords conformal inference · false discovery proportion · multiple testing · simultaneous inference · distribution-free bounds · post hoc selection · outlier detection · conformal selection


The pith

Finite-sample bounds on false discovery proportions hold simultaneously for all rejection thresholds in conformal inference

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a method to bound the false discovery proportion from above with high probability, and these bounds remain valid no matter which threshold is ultimately chosen for rejection. The approach works in finite samples and makes no distributional assumptions beyond those underlying the conformal p-value construction. It achieves simultaneous validity by building an envelope that contains the empirical distribution function of the null p-values with high probability; the envelope is calibrated by sampling from the joint distribution of those p-values. Practitioners can adjust the shape of this envelope to make the bounds tighter where most rejections are expected to occur, and the method applies directly to outlier detection and conformal selection tasks.

Core claim

By sampling from the joint distribution of null conformal p-values, the authors construct a high-probability envelope for their empirical distribution function. This envelope yields finite-sample, distribution-free upper bounds on the false discovery proportion that are valid simultaneously over all possible rejection thresholds, thereby permitting arbitrary post hoc selection of the threshold while preserving statistical guarantees.
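In symbols, the step from an envelope to simultaneous FDP bounds takes one line. The notation below is ours, and the sketch takes the number of nulls $m_0$ as given; the paper's own statements, including how an unknown $m_0$ is handled, may differ.

```latex
% Notation (ours): \mathcal{H}_0 = indices of the m_0 null test points,
% \hat F_0 = ECDF of their conformal p-values, U = an upper envelope.
\mathrm{FDP}(t) = \frac{V(t)}{\max\{R(t), 1\}}, \qquad
V(t) = \#\{ j \in \mathcal{H}_0 : p_j \le t \} = m_0 \hat F_0(t), \qquad
R(t) = \#\{ j : p_j \le t \}.

\mathbb{P}\big( \hat F_0(t) \le U(t) \ \text{for all } t \in [0,1] \big) \ge 1 - \delta
\;\Longrightarrow\;
\mathbb{P}\Big( \mathrm{FDP}(t) \le \min\Big\{ 1, \frac{m_0\, U(t)}{\max\{R(t), 1\}} \Big\}
  \ \text{for all } t \Big) \ge 1 - \delta .
```

Because the event on the right is uniform in $t$, the same bound holds at any threshold $\hat t$ chosen after looking at the data.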

What carries the argument

High-probability envelope for the empirical distribution function of null conformal p-values, constructed by sampling from their joint distribution

Load-bearing premise

The joint distribution of the null conformal p-values can be sampled while maintaining the distribution-free guarantee.

What would settle it

Repeated simulations where the constructed envelope fails to cover the empirical distribution function of null p-values with the stated high probability would disprove the simultaneous validity.
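The falsification test described above is easy to run in miniature. This is a sketch under our own assumptions, not the paper's experiment code: a one-sided Kolmogorov–Smirnov envelope $U(t) = t + q$, the conformal p-value convention stated in the comments, and the $\lceil (1-\delta)(B+1) \rceil$-th order statistic of $B$ simulated draws as the Monte Carlo quantile. The envelope is calibrated with uniform scores and checked on Gaussian scores, so coverage near $1-\delta$ also illustrates that only the ranks of the scores matter.

```python
import numpy as np


def ks_stat(cal, test):
    """sup_t (F_hat(t) - t) for the conformal p-values built from one score sample.

    Assumed convention: larger score = more extreme,
    p_j = (1 + #{i : cal_i >= test_j}) / (n + 1).
    F_hat only jumps on the grid k/(n+1), so the sup is attained on that grid.
    """
    n, m = len(cal), len(test)
    r = 1 + n - np.searchsorted(np.sort(cal), test, side="left")  # r_j = (n+1) * p_j
    ecdf = np.cumsum(np.bincount(r, minlength=n + 2))[1:] / m     # F_hat(k/(n+1)), k = 1..n+1
    return np.max(ecdf - np.arange(1, n + 2) / (n + 1))


def coverage_check(n=100, m=100, B=100, delta=0.1, reps=200, seed=0):
    """Fraction of fresh null realizations whose ECDF stays below U(t) = t + q."""
    rng = np.random.default_rng(seed)
    k = int(np.ceil((1 - delta) * (B + 1)))  # Monte Carlo rank rule (needs k <= B)
    hits = 0
    for _ in range(reps):
        # Calibrate the envelope from B draws with Uniform(0, 1) scores.
        sims = [ks_stat(rng.random(n), rng.random(m)) for _ in range(B)]
        q = np.sort(sims)[k - 1]
        # Fresh "observed" nulls with Gaussian scores: a different continuous law.
        hits += ks_stat(rng.standard_normal(n), rng.standard_normal(m)) <= q
    return hits / reps


print(coverage_check())  # compare with the target 1 - delta = 0.9
```

A coverage well below $1-\delta$ over many repetitions would be the kind of evidence the paragraph above describes.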

Figures

Figures reproduced from arXiv: 2605.20726 by Emmanuel J. Candès, Ying Jin, Ziang Song.

**Figure 1.** Simultaneous FDP bounds in a drug-target interaction task. (a) One realization of the true FDP (blue) and simultaneous upper bounds constructed by our method with the following statistics: truncated Higher Criticism (MC-THC), Higher Criticism (MC-HC), and Kolmogorov–Smirnov (MC-KS). The dashed line is the upper bound adapted from [GBR24]. (b) Residuals (upper bound minus true FDP) across 100 independent ex…

**Figure 2.** Upper bounds on $\hat{F}_{n,m}(t)$ ($n = m = 100$, $\delta = 0.1$) constructed via Algorithm 1 with $B = 100$. The gray curves represent 100 independent realizations of $\hat{F}_{n,m}$. The colored curves represent different envelope constructions: MC-KS (Kolmogorov–Smirnov statistic), MC-BJ (Berk–Jones statistic), MC-HC (Higher Criticism statistic), and MC-THC (truncated Higher Criticism statistic). The Baseline curve corresponds to th…

**Figure 3.** Empirical coverage of the FDP bound. The plot displays the difference between our FDP upper bound (using MC-THC) and the true FDP across 100 replications of the outlier detection task ($n = m = 1000$, signal strength $a = 0.2$, target $1 - \delta = 0.9$). The curves remain above zero in 96% of the trials, demonstrating validity. The variance of the bound decreases as the rejection threshold $t$ increases. Validity of F…

**Figure 4.** Impact of refinement strategies across signal strengths. FDP upper bounds in outlier detection with fixed purity (90%) and varying signal strength ($a \in \{0.1, 0.2, 0.5\}$). Columns correspond to different FDP envelopes. The curves compare three refinement levels: bounding the null count $\hat{m}_0$, the self-refinement step from Proposition 4.5, and the combined strategy. The combined approach consistently yields th…

**Figure 5.** Impact of refinement strategies across purity levels. FDP upper bounds with fixed signal strength ($a = 0.2$) and varying inlier purity (70%, 80%, 90%). As the proportion of outliers increases (lower purity), the benefit of the $\hat{m}_0$ tightening becomes more pronounced compared to the self-refinement step alone.

5 Controlling FDP/precision in conformal selection

In this section, we demonstrate how our method…

**Figure 6.** Failure of post hoc BH levels as FDP certificates. Starting from …

**Figure 7.** The risk of post hoc parameter selection. A realization of …

**Figure 8.** Upper bounds on $\hat{F}_{n,m}(t)$ ($n = m = 100$, $\delta = 0.1$) constructed via Algorithm 1 with $B = 100$ and different values of the shape parameter $\beta$. The gray curves represent 100 independent realizations of $\hat{F}_{n,m}$.

C Applications to i.i.d. p-values

C.1 Conformal vs. i.i.d. p-values: what changes and why

Our envelope-based technique from Section 3 also applies to i.i.d. p-values $U_1, \dots, U_m \sim \mathrm{Unif}[0, 1]$. This setting can be viewed…

**Figure 9.** ECDF envelopes: conformal vs. i.i.d. p-values. Left column: empirical CDFs of conformal p-values with fixed calibration size $n = 100$. Note that as $m$ increases, the variance does not vanish due to the persistent randomness of the finite calibration set. Right column: empirical CDFs of i.i.d. uniform p-values. The distribution concentrates tightly around the diagonal $y = x$ as $m \to \infty$. This contrast highlights …

**Figure 10.** Plots of $\rho_n(t)$ for several values of $n$. Since $c_{n,m}(t) = m^{-1} + (1 - m^{-1})\rho_n(t)$, this also illustrates the $t$-dependence of $c_{n,m}(t)$. The dependence is substantial only for very small $n$ but becomes much less pronounced as $n$ grows.

C.3 Constructing calibration-conditional valid p-values

We revisit the calibration-conditional p-values of [BCL+23] and show how our envelope construction provides a simple and tigh…

**Figure 11.** Different CCV p-value adjustments. The blue curves display 100 independent realizations of sorted i.i.d. uniform p-values (order statistics) with sample size $n = 1000$, plotted against their normalized rank $i/n$. This setup replicates the validation framework of …

**Figure 12.** Detailed view of CCV p-value adjustments in the lower tail. A zoomed-in perspective of …
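Figure 9's observation, that the variance of the conformal ECDF does not vanish as $m$ grows when $n$ is fixed, can be reproduced with a small simulation. This is our sketch, not the paper's code. It assumes continuous scores drawn i.i.d. Uniform(0, 1), which gives conformal p-values the same joint law as any continuous exchangeable scores, and the p-value convention in the comment. A short calculation (also ours) explains the floor: with $n = 100$ and $t = 0.1$, conditional on the calibration set the test p-values are i.i.d. with $\mathbb{P}(p_j \le t \mid \text{calibration}) \sim \mathrm{Beta}(10, 91)$, whose standard deviation is about 0.03. The i.i.d. column instead shrinks like $\sqrt{t(1-t)/m}$.

```python
import numpy as np


def conformal_pvalues(cal, test):
    """p_j = (1 + #{i : cal_i >= test_j}) / (n + 1); larger score = more extreme (assumed)."""
    n = len(cal)
    n_below = np.searchsorted(np.sort(cal), test, side="left")  # #{i : cal_i < test_j}
    return (1 + n - n_below) / (n + 1)


def ecdf_sd(m, t=0.1, n=100, reps=500, conformal=True, seed=0):
    """Monte Carlo standard deviation of the null ECDF F_hat(t) across replications."""
    rng = np.random.default_rng(seed)
    vals = np.empty(reps)
    for b in range(reps):
        if conformal:
            p = conformal_pvalues(rng.random(n), rng.random(m))  # fresh calibration set each time
        else:
            p = rng.random(m)  # i.i.d. Uniform(0, 1) p-values
        vals[b] = np.mean(p <= t)
    return vals.std()


for m in (100, 1000, 10000):
    print(m, round(ecdf_sd(m, conformal=True), 4), round(ecdf_sd(m, conformal=False), 4))
```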

Original abstract

Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection. We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.
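For context, the expectation-controlling baseline the abstract refers to can be sketched in a few lines: conformal p-values computed from calibration and test scores, followed by the Benjamini–Hochberg step-up rule. This is a generic sketch, not the paper's code; the score convention (larger = more outlying) and the function names are our assumptions. BH targets the expected FDP at a level $\alpha$ fixed in advance, which is exactly what is lost if $\alpha$ is re-chosen after seeing the rejections.

```python
import numpy as np


def conformal_pvalues(cal_scores, test_scores):
    """Conformal p-values, assuming larger scores indicate outliers:
    p_j = (1 + #{i : V_i >= V_{n+j}}) / (n + 1).
    """
    n = len(cal_scores)
    n_below = np.searchsorted(np.sort(cal_scores), test_scores, side="left")
    return (1 + n - n_below) / (n + 1)


def benjamini_hochberg(pvals, alpha):
    """Indices rejected by BH at a pre-specified level alpha (targets E[FDP], not the realized FDP)."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(below)[0]) + 1  # largest k with p_(k) <= alpha * k / m
    return np.sort(order[:k])


cal = np.arange(1, 10)             # n = 9 toy calibration scores
test = np.array([9.5, 0.5, 8.5])   # m = 3 toy test scores
p = conformal_pvalues(cal, test)   # p = [0.1, 1.0, 0.2]
print(benjamini_hochberg(p, alpha=0.5))  # [0 2]
```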

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives simultaneous finite-sample FDP bounds over all thresholds by envelope sampling on null conformal p-values, but the distribution-free status of that sampling step is unclear and likely the load-bearing issue.


The core new thing is a set of simultaneous high-probability upper bounds on the realized FDP that survive an arbitrary post hoc choice of threshold. Existing conformal FDR work mostly controls the FDP in expectation and breaks under data-dependent thresholds; this paper tries to fix both with an envelope on the empirical CDF of the null p-values.

The construction itself is straightforward in outline: sample from the joint law of the null conformal p-values, build a high-probability band around their EDF, and read off the worst-case FDP at any threshold. The flexibility to tilt the envelope toward regions of interest is a practical plus, and the synthetic and real-data checks are at least mentioned.
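Reading off the bound at every threshold is then a few lines. A minimal sketch, assuming an envelope function $U$ that is valid for the ECDF of the $m_0$ null p-values and, for simplicity, a known $m_0$; the paper's refinements (such as bounding $\hat{m}_0$) are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def fdp_upper_bound(pvals, thresholds, envelope, m0):
    """Simultaneous FDP upper bound min(1, m0 * U(t) / max(R(t), 1)) at each threshold t.

    pvals: observed test p-values (length m).
    envelope: callable U(t), assumed to satisfy F0_hat(t) <= U(t) for all t with
        probability >= 1 - delta, where F0_hat is the ECDF of the m0 null p-values.
    m0: number of null test points (assumed known in this sketch).
    """
    pvals = np.sort(np.asarray(pvals, dtype=float))
    t = np.asarray(thresholds, dtype=float)
    rejections = np.searchsorted(pvals, t, side="right")  # R(t) = #{j : p_j <= t}
    bound = m0 * envelope(t) / np.maximum(rejections, 1)
    return np.minimum(bound, 1.0)  # the FDP never exceeds 1


# Toy usage with a hypothetical KS-type envelope U(t) = min(1, t + 0.05).
pvals = [0.01, 0.02, 0.03, 0.5, 0.9]
print(fdp_upper_bound(pvals, [0.03, 0.5], lambda t: np.minimum(1.0, t + 0.05), m0=2))
```

Since the envelope holds for all $t$ at once, the whole returned curve is valid simultaneously, so a user may pick the threshold after inspecting it.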

The soft spot is exactly the sampling step. Conformal p-values are exchangeable under the null, but their joint distribution depends on the unknown data measure through the shared calibration scores. The abstract says the envelope comes from sampling that joint distribution; nothing in it indicates a distribution-free Monte Carlo procedure that avoids extra modeling. If the sampling requires knowledge of the data law or auxiliary assumptions, the finite-sample distribution-free claim does not go through. That is the central technical risk.

This is aimed at people already using conformal methods for outlier detection or selection who want guarantees stronger than control in expectation. A reader who cares about post hoc validity would find the target useful but would need to verify the sampling construction before relying on it.

Send it to referees; the idea is worth checking even if the sampling detail needs work.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to establish finite-sample, distribution-free upper bounds on the false discovery proportion (FDP) that hold simultaneously over all rejection thresholds in conformal inference for multiple testing. Simultaneous validity is obtained by constructing a high-probability envelope around the empirical distribution function of the null conformal p-values via sampling from their joint distribution; the envelope shape can be modulated for tighter control in regions of interest. The framework is applied to derive bounds for outlier detection and conformal selection, with synthetic and real-data experiments demonstrating validity and reduced conservatism relative to existing methods.

Significance. If the sampling step can be carried out while preserving distribution-freeness, the result would meaningfully advance beyond expectation-based procedures such as Benjamini-Hochberg by supplying high-probability FDP control and post-hoc threshold validity. The ability to shape the envelope is a practical feature. The experiments provide supporting evidence, but the overall significance hinges on a clear, assumption-free construction of the joint sampling procedure.

major comments (2)

[Abstract (simultaneous validity paragraph)] Abstract, paragraph on simultaneous validity: the envelope is formed 'by sampling from their joint distribution' of null conformal p-values. Under exchangeability the joint law is a function of the unknown data-generating measure; the manuscript must exhibit an auxiliary, distribution-free Monte-Carlo procedure (or equivalent construction) that does not require knowledge of this measure. Without an explicit, verifiable algorithm the finite-sample guarantee is conditional rather than unconditional.
[Envelope construction (method section)] The central claim of simultaneous, everywhere-valid FDP bounds rests on the envelope construction. If the sampling step implicitly conditions on fitted quantities or additional modeling assumptions not stated in the abstract, the distribution-free property fails. The paper should supply the precise sampling algorithm together with a proof that the resulting envelope probability statement remains valid under the conformal exchangeability assumption alone.

minor comments (2)

Clarify in the main text the precise mechanism by which the envelope shape is modulated and how the modulation parameter is chosen without data-dependent tuning that would invalidate the simultaneous guarantee.
In the experimental sections, report the number of Monte-Carlo draws used to approximate the envelope and any convergence diagnostics; this information is needed for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the need for an explicit, verifiable description of the sampling procedure. The joint distribution of the null conformal p-values is in fact distribution-free under exchangeability alone (via the uniform permutation model on scores), so the Monte Carlo envelope construction preserves the unconditional finite-sample guarantee. We will revise the manuscript to make the algorithm and its validity proof fully explicit.

Point-by-point responses

Referee: [Abstract (simultaneous validity paragraph)] Abstract, paragraph on simultaneous validity: the envelope is formed 'by sampling from their joint distribution' of null conformal p-values. Under exchangeability the joint law is a function of the unknown data-generating measure; the manuscript must exhibit an auxiliary, distribution-free Monte-Carlo procedure (or equivalent construction) that does not require knowledge of this measure. Without an explicit, verifiable algorithm the finite-sample guarantee is conditional rather than unconditional.

Authors: We respectfully note that the premise is incorrect: under the exchangeability assumption the scores are exchangeable, so (with no ties, or with ties broken uniformly at random) their relative ordering is uniformly distributed over all permutations, whatever the data-generating measure. The conformal p-values are deterministic functions of these ranks; hence their joint law is known and distribution-free. The auxiliary sampling procedure draws Monte Carlo replicates by generating random permutations, computing the induced p-value vectors, and forming the envelope from these replicates. This requires no knowledge of the underlying measure. We will revise the abstract for clarity and add an explicit algorithm box plus a short proof subsection in the methods. revision: yes
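The rank argument in this response can be checked directly. A minimal sketch, assuming continuous scores (no ties) and the convention that larger scores are more extreme: conformal p-values computed from two different score laws, obtained by strictly increasing transforms of the same uniforms, coincide exactly, and a sampler that only draws a random permutation reproduces their joint law. The function names are hypothetical.

```python
import numpy as np


def conformal_pvalues(cal, test):
    """p_j = (1 + #{i : cal_i >= test_j}) / (n + 1); larger score = more extreme (assumed)."""
    n = len(cal)
    n_below = np.searchsorted(np.sort(cal), test, side="left")
    return (1 + n - n_below) / (n + 1)


def pvalues_from_permutation(n, m, rng):
    """One draw of m null conformal p-values using only a random permutation.

    Under exchangeability with no ties, the ranks of the n + m scores form a
    uniformly random permutation, so the ranks themselves can serve as scores.
    """
    ranks = rng.permutation(n + m)
    return conformal_pvalues(ranks[:n], ranks[n:])


rng = np.random.default_rng(0)
u = rng.random(150)                  # one underlying uniform sample (n = 100, m = 50)
logistic = np.log(u / (1 - u))       # logistic scores: a strictly increasing transform of u
exponential = -np.log1p(-u)          # exponential scores: another strictly increasing transform
p1 = conformal_pvalues(logistic[:100], logistic[100:])
p2 = conformal_pvalues(exponential[:100], exponential[100:])
print(np.array_equal(p1, p2))        # p-values depend on the scores only through their ranks
```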
Referee: [Envelope construction (method section)] The central claim of simultaneous, everywhere-valid FDP bounds rests on the envelope construction. If the sampling step implicitly conditions on fitted quantities or additional modeling assumptions not stated in the abstract, the distribution-free property fails. The paper should supply the precise sampling algorithm together with a proof that the resulting envelope probability statement remains valid under the conformal exchangeability assumption alone.

Authors: The sampling algorithm is the permutation Monte Carlo described above; it conditions on nothing beyond the exchangeability of the calibration and test scores and does not use any fitted model quantities. Because the permutation distribution is exactly the law of the ranks under exchangeability, the high-probability envelope statement holds unconditionally. We will insert the precise algorithm (including pseudocode) and the accompanying validity argument into the methods section of the revised manuscript. revision: yes
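A sketch of a permutation Monte Carlo envelope in the spirit of this response follows, under loudly stated assumptions: the weight family $(t(1-t))^{\text{power}}$, the hypothetical parameter name `power` (not necessarily the paper's shape parameter $\beta$), the grid discretization, and the Monte Carlo rank rule are our choices for illustration and are not taken from the paper's Algorithm 1, which may use other statistics (for example Berk–Jones or truncated Higher Criticism, as in Figure 2). The sketch treats all $m$ test points as null. Taking the $\lceil (1-\delta)(B+1) \rceil$-th order statistic gives coverage of at least $1-\delta$ for a fresh draw exchangeable with the $B$ simulated ones, so in this sketch the guarantee does not rest on Monte Carlo convergence.

```python
import numpy as np


def simulate_rank_ecdfs(n, m, B, rng):
    """B draws of the null ECDF F_hat on the grid t_k = k/(n+1), k = 1..n.

    Uses random permutations only (distribution-free under exchangeability, no ties).
    Returns an array of shape (B, n).
    """
    out = np.empty((B, n))
    for b in range(B):
        ranks = rng.permutation(n + m)
        cal, test = np.sort(ranks[:n]), ranks[n:]
        # r_j = 1 + #{i : cal_i >= test_j} = (n + 1) * p_j, an integer in 1..n+1
        r = 1 + n - np.searchsorted(cal, test, side="left")
        counts = np.bincount(r, minlength=n + 2)
        out[b] = np.cumsum(counts)[1 : n + 1] / m
    return out


def mc_envelope(n, m, B=1000, delta=0.1, power=0.0, seed=0):
    """Grid envelope U_k with P(F_hat(t_k) <= U_k for all k) >= 1 - delta.

    Hypothetical shape: U_k = min(1, t_k + q * (t_k * (1 - t_k)) ** power);
    power = 0 is KS-like and power = 0.5 is Higher-Criticism-like. Fix `power`
    before seeing the data. F_hat only jumps on the grid, so U extended as a
    step function bounds F_hat for every t < 1.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1) / (n + 1)
    w = (t * (1 - t)) ** power
    stats = ((simulate_rank_ecdfs(n, m, B, rng) - t) / w).max(axis=1)
    k = int(np.ceil((1 - delta) * (B + 1)))  # Monte Carlo rank rule; needs k <= B
    if k > B:
        raise ValueError("B is too small for this delta")
    q = np.sort(stats)[k - 1]
    return t, np.minimum(1.0, t + q * w)


t, U = mc_envelope(100, 100, B=1000, delta=0.1, power=0.5)
```

Larger `power` loosens the envelope in the middle of $[0, 1]$ and tightens it near the tails, which is one way to favor the small thresholds where rejections usually happen.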

Circularity Check

0 steps flagged

No significant circularity; derivation is constructive and self-contained

full rationale

The paper constructs simultaneous FDP bounds via a high-probability envelope on the empirical CDF of null conformal p-values, obtained by sampling their joint distribution. This is presented as a direct statistical procedure relying on exchangeability properties standard in conformal inference, without any reduction of the claimed bounds to fitted parameters, self-definitions, or load-bearing self-citations within the provided text. No equations or steps equate the output bounds to the inputs by construction. The sampling step is an input assumption of the framework rather than a derived claim that collapses into itself. This is the common case of an independent methodological contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the ability to sample from the joint distribution of null conformal p-values; no free parameters or new entities are introduced in the abstract.

axioms (1)

domain assumption: Null conformal p-values admit sampling from their joint distribution
The envelope is constructed by sampling from this joint distribution (abstract).

pith-pipeline@v0.9.1-grok · 5752 in / 1221 out tokens · 28905 ms · 2026-06-30T17:35:41.642131+00:00 · methodology


