A Joint Finite-Sample Certificate for Adaptive Selective Conformal Risk Control
Pith reviewed 2026-06-27 18:42 UTC · model grok-4.3
The pith
A certificate for selective predictors under adaptive selection bounds the selected risk directly as a ratio, tightening finite-sample guarantees.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We give a joint finite-sample certificate for selective predictors with bounded, possibly non-monotone losses, treating the selected risk directly as a ratio rather than through a Hoeffding-style range bound. The construction couples three confidence bounds: a variance-adaptive empirical-Bernstein bound on the ratio risk, a Clopper-Pearson bound on acceptance, and a two-sided closeness bound on utility. Together they lower-bound the certified policy's utility both absolutely and to within 2γ_u of the best policy over the certified set; both bounds are non-vacuous whenever the risk and acceptance constraints are feasible.
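The relative guarantee follows from a short, standard argument. As a sketch, assume (notation introduced here, not taken from the paper) that the closeness bound holds uniformly over the certified set 𝒞, i.e. |Û(λ) - U(λ)| ≤ γ_u for every λ ∈ 𝒞, and that the certified policy λ̂ maximizes the empirical utility Û over 𝒞:

```latex
\[
U(\hat\lambda) \;\ge\; \hat U(\hat\lambda) - \gamma_u
\;\ge\; \hat U(\lambda^\star) - \gamma_u
\;\ge\; U(\lambda^\star) - 2\gamma_u,
\qquad \lambda^\star \in \arg\max_{\lambda \in \mathcal{C}} U(\lambda).
\]
```

The first inequality alone is the absolute lower bound; the middle step uses the optimality of λ̂ for Û, and the last applies the closeness bound again at λ*.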
What carries the argument
A coupled triple of confidence bounds: a variance-adaptive empirical-Bernstein bound on the selected risk ratio, a Clopper-Pearson bound on the acceptance probability, and a two-sided closeness bound on utility.
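A minimal Python sketch of how such a triple might be coupled into a certifier over a finite grid. Everything here is an illustrative assumption rather than the paper's method: the hypothetical function names and input layout (per-pair lists of acceptance indicators, losses in [0, 1], utilities in [0, 1]), the even δ/(3m) union-bound split, the ratio formed as an empirical-Bernstein (Maurer-Pontil) upper bound on E[L·A] divided by a Clopper-Pearson lower bound on E[A], and a two-sided Hoeffding bound standing in for the utility closeness bound.

```python
import math
import statistics
from typing import List, Optional, Tuple


def clopper_pearson_lower(k: int, n: int, delta: float, iters: int = 50) -> float:
    """One-sided Clopper-Pearson lower bound: the p solving P(Bin(n, p) >= k) = delta."""
    if k == 0:
        return 0.0
    log_n_fact = math.lgamma(n + 1)

    def upper_tail(p: float) -> float:
        # P(Bin(n, p) >= k), summed term by term in log space to avoid overflow.
        lp, lq = math.log(p), math.log1p(-p)
        return sum(
            math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                     + i * lp + (n - i) * lq)
            for i in range(k, n + 1)
        )

    lo, hi = 0.0, 1.0
    for _ in range(iters):  # the tail is increasing in p, so bisect for the root
        mid = 0.5 * (lo + hi)
        if upper_tail(mid) < delta:
            lo = mid
        else:
            hi = mid
    return lo  # the lower end of the bracket keeps the bound conservative


def empirical_bernstein_upper(xs: List[float], delta: float) -> float:
    """Maurer-Pontil empirical-Bernstein upper bound on the mean of [0, 1] samples."""
    n = len(xs)
    log_term = math.log(2.0 / delta)
    return (statistics.fmean(xs)
            + math.sqrt(2.0 * statistics.variance(xs) * log_term / n)
            + 7.0 * log_term / (3.0 * (n - 1)))


def certify(accept: List[List[int]], loss: List[List[float]], utility: List[List[float]],
            alpha: float, p_min: float, delta: float) -> Optional[Tuple[int, float, float, float]]:
    """Return (pair index, utility lower bound, risk upper bound, acceptance lower bound)
    for the best certified grid pair, or None if no pair is certified (hypothetical API)."""
    m, n = len(accept), len(accept[0])
    delta_each = delta / (3 * m)  # union bound: 3 bounds at each of m pairs
    gamma_u = math.sqrt(math.log(2.0 / delta_each) / (2 * n))  # two-sided Hoeffding width
    best = None
    for j in range(m):  # O(n) work per pair (times a fixed bisection count) -> O(n m)
        p_lo = clopper_pearson_lower(sum(accept[j]), n, delta_each)
        if p_lo <= 0.0 or p_lo < p_min:
            continue  # acceptance floor not certified
        # Ratio treatment: E[L A] / E[A] <= (EB upper bound on L*A) / (CP lower bound on A).
        joint = [a * l for a, l in zip(accept[j], loss[j])]
        risk_hi = min(1.0, empirical_bernstein_upper(joint, delta_each) / p_lo)
        if risk_hi > alpha:
            continue  # selected risk not certified
        u_lo = statistics.fmean(utility[j]) - gamma_u
        if best is None or u_lo > best[1]:
            best = (j, u_lo, risk_hi, p_lo)
    return best
```

Because every bound is run at δ/(3m), all 3m bounds hold simultaneously with probability at least 1 - δ, so the pair picked by the data-dependent `certify` loop inherits them. With SciPy available, `scipy.stats.beta.ppf(delta, k, n - k + 1)` gives the same Clopper-Pearson lower bound directly; the bisection only keeps the sketch dependency-free.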
If this is right
- The certificate remains valid when threshold pairs are selected adaptively from any finite grid of m pairs, fixed in advance, using a calibration set of n_cert samples.
- It simultaneously provides non-vacuous absolute and relative (within 2γ_u of the best certified policy) lower bounds on utility whenever the risk and acceptance conditions are feasible.
- The risk bound improves the acceptance-floor dependence from 1/p_min to 1/√p_min relative to range-only Hoeffding-ratio constructions.
- A closed-form per-pair regime exists in which the new risk bound is tighter than the corresponding Hoeffding conformal risk control (Hoeffding-CRC) selective bound.
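The change in the p_min dependence can be seen from a leading-order comparison of the two numerator widths after dividing by the acceptance probability. This is a heuristic sketch, not the paper's proof: it assumes losses L ∈ [0, 1], an acceptance indicator A with p_acc = E[A] ≥ p_min, a numerator bound on E[LA] from n_cert samples, replaces the sample variance by its population bound, and ignores the error in the denominator. A range-only Hoeffding width ignores the variance, whereas the empirical-Bernstein width scales with √Var(LA), and Var(LA) ≤ E[L²A] ≤ p_acc because A² = A and L ≤ 1:

```latex
\[
\text{Hoeffding (range only):}\quad
\frac{1}{p_{\mathrm{acc}}}\sqrt{\frac{\log(1/\delta)}{2\,n_{\mathrm{cert}}}}
\;\le\; \frac{1}{p_{\min}}\sqrt{\frac{\log(1/\delta)}{2\,n_{\mathrm{cert}}}},
\]
\[
\text{empirical Bernstein:}\quad
\frac{1}{p_{\mathrm{acc}}}\left(\sqrt{\frac{2\,p_{\mathrm{acc}}\log(2/\delta)}{n_{\mathrm{cert}}}}
+ \frac{7\log(2/\delta)}{3(n_{\mathrm{cert}}-1)}\right)
\;\le\; \frac{1}{\sqrt{p_{\min}}}\sqrt{\frac{2\log(2/\delta)}{n_{\mathrm{cert}}}}
+ \frac{7\log(2/\delta)}{3(n_{\mathrm{cert}}-1)\,p_{\min}}.
\]
```

The leading term therefore scales as 1/√p_min rather than 1/p_min; the lower-order term still carries 1/p_min but decays as 1/n_cert.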
Where Pith is reading between the lines
- The coupling technique could be reused to produce joint finite-sample certificates for other combinations of risk, coverage, and utility metrics in conformal settings.
- Empirical gains observed on ImageNet and COCO but absent on ADE20K indicate that effectiveness depends on the underlying risk-margin regime relative to α.
- The finite-grid restriction suggests that extensions to continuous parameter spaces would need new concentration arguments to retain exact finite-sample validity.
Load-bearing premise
The three individual confidence bounds can be coupled while preserving a joint finite-sample validity guarantee under adaptive selection from a finite grid.
What would settle it
A Monte Carlo experiment in which the empirical joint coverage rate of the three coupled bounds falls materially below the nominal level across repeated draws with adaptive grid selection on held-out data would falsify the claimed validity.
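A small synthetic version of this check is sketched below. Its modelling choices are assumptions, not the paper's setup: scores S ~ Uniform(0, 1), acceptance A = 1{S ≥ t} on a hypothetical grid of thresholds t, loss L ~ Bernoulli(0.5(1 - S)), utility A(1 - L), and the same stand-in bounds as a δ/(3m) union bound. Because the true values are known in closed form (E[A] = 1 - t, E[LA] = (1 - t)²/4), the code counts trials in which any of the 3m bounds misses; that event covers whatever pair an adaptive rule selects, so a rate materially above δ would be evidence against validity.

```python
import math
import random
import statistics


def cp_lower(k, n, delta, iters=40):
    """One-sided Clopper-Pearson lower bound, found by bisection on P(Bin(n, p) >= k)."""
    if k == 0:
        return 0.0
    log_n_fact = math.lgamma(n + 1)

    def upper_tail(p):
        lp, lq = math.log(p), math.log1p(-p)
        return sum(math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                            + i * lp + (n - i) * lq) for i in range(k, n + 1))

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if upper_tail(mid) < delta:
            lo = mid
        else:
            hi = mid
    return lo


def eb_upper(xs, delta):
    """Maurer-Pontil empirical-Bernstein upper bound for samples in [0, 1]."""
    n, log_term = len(xs), math.log(2.0 / delta)
    return (statistics.fmean(xs) + math.sqrt(2.0 * statistics.variance(xs) * log_term / n)
            + 7.0 * log_term / (3.0 * (n - 1)))


def joint_failure_rate(n=200, thresholds=(0.2, 0.5, 0.8), delta=0.1, trials=200, seed=0):
    """Fraction of trials in which at least one of the 3m bounds misses its true value."""
    rng = random.Random(seed)
    delta_each = delta / (3 * len(thresholds))  # union-bound split
    gamma_u = math.sqrt(math.log(2.0 / delta_each) / (2 * n))
    failures = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(n)]
        losses = [1.0 if rng.random() < 0.5 * (1.0 - s) else 0.0 for s in scores]
        all_hold = True
        for t in thresholds:
            accept = [1.0 if s >= t else 0.0 for s in scores]
            # Closed-form truths for this synthetic model.
            p_true = 1.0 - t
            num_true = 0.25 * (1.0 - t) ** 2  # E[L * A]
            util_true = p_true - num_true  # E[A * (1 - L)]
            p_lo = cp_lower(int(sum(accept)), n, delta_each)
            num_hi = eb_upper([a * l for a, l in zip(accept, losses)], delta_each)
            u_hat = statistics.fmean([a * (1.0 - l) for a, l in zip(accept, losses)])
            # If the acceptance and numerator bounds both hold, the ratio bound
            # num_hi / p_lo on the selected risk holds as well.
            if p_lo > p_true or num_hi < num_true or abs(u_hat - util_true) > gamma_u:
                all_hold = False
        if not all_hold:
            failures += 1
    return failures / trials


print(joint_failure_rate())
```

We would expect the printed rate to sit well below δ = 0.1 here, since the union bound and the empirical-Bernstein constants are conservative; a held-out-data version would replace the closed-form truths with large-sample estimates.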
Original abstract
Selective predictors answer on confident inputs and abstain elsewhere; deploying one safely needs a single finite-sample certificate that simultaneously upper-bounds the selected risk, lower-bounds the acceptance probability $p_{\mathrm{acc}}$ above a floor $p_{\min}$, and lower-bounds the deployment utility. This certificate must be valid under adaptive threshold selection from a finite grid of $m$ pairs on $n_{\mathrm{cert}}$ samples. We give such a certificate for bounded, possibly non-monotone losses by treating the selected risk directly as a ratio rather than through a Hoeffding-style range bound. The construction couples three confidence bounds: a variance-adaptive empirical-Bernstein bound on the ratio risk, a Clopper--Pearson bound on acceptance, and a two-sided closeness bound on utility. Together they lower-bound the certified policy's utility absolutely and to within $2\gamma_u$ of the best over the \emph{certified set}, both non-vacuous whenever feasible; a regime-scoped third leg matches an external oracle, informative only where the risk margin $\gamma_r < \alpha$ and vacuous at the headline operating points. Relative to the range-only Hoeffding-ratio construction this sharpens the acceptance-floor dependence from $1/p_{\min}$ to $1/\sqrt{p_{\min}}$, and a closed-form corollary identifies a per-pair regime in which our risk bound dominates a Hoeffding conformal risk control (Hoeffding--CRC) selective bound. Empirically, on ImageNet (three ResNets) and COCO val 2017 panoptic, the certificate opens a $+22$ pp certified-acceptance frontier over Hoeffding--CRC and is ${\approx}10{\times}$ tighter than a non-vacuous matched-valid baseline; these gains are regime-scoped, not universal, and absent on ADE20K. The certifier runs in $O(n_{\mathrm{cert}} m)$ time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims a joint finite-sample certificate for adaptive selective conformal risk control on bounded losses. It constructs the certificate by treating the selected risk as a ratio and coupling a variance-adaptive empirical-Bernstein bound on that ratio, a Clopper-Pearson bound certifying that the acceptance probability exceeds p_min, and a two-sided closeness bound on utility. The certificate is asserted to remain valid under data-dependent selection of one pair from a finite grid of m candidates on n_cert samples, to deliver a 1/√p_min dependence instead of 1/p_min, to lower-bound utility in both absolute and relative terms, and to yield a +22 pp certified-acceptance gain over Hoeffding-CRC on ImageNet and COCO while running in O(n_cert m) time.
Significance. If the joint validity argument holds, the construction supplies a materially tighter, non-vacuous finite-sample guarantee for safe deployment of selective predictors than existing Hoeffding-ratio or Hoeffding-CRC baselines. The explicit regime-scoped comparison to Hoeffding-CRC and the closed-form per-pair dominance corollary are concrete strengths; the empirical demonstration on three ResNets and panoptic segmentation further indicates practical utility when the risk margin condition is met.
major comments (2)
- [Abstract] Abstract (construction paragraph): the claim that the three marginal bounds can be coupled to preserve a joint finite-sample validity guarantee after adaptive selection over the m-grid is load-bearing for the central result, yet the abstract supplies no derivation, union-bound accounting, or martingale/peeling argument showing that the dependence induced by selection does not inflate the failure probability beyond the nominal level. Marginal validity of each bound does not automatically transfer to the selected quantities.
- [Abstract] Abstract (relative to Hoeffding-ratio): the asserted sharpening from 1/p_min to 1/√p_min dependence is presented as a direct consequence of the ratio treatment, but without an explicit accounting of any m-dependent loosening required for joint coverage, it is unclear whether the net improvement survives the coupling step.
minor comments (2)
- [Abstract] The abstract states that the third leg is 'informative only where the risk margin γ_r < α and vacuous at the headline operating points'; a brief parenthetical clarifying the operating regime would aid readability.
- [Abstract] Dataset names (ImageNet, COCO val 2017, ADE20K) and model counts are given, but no table or figure reference is supplied in the abstract for the reported +22 pp and ≈10× tightness numbers.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying these points on the abstract's presentation of the joint certificate. The full derivations appear in the body of the manuscript; we address each comment below.
- Referee: [Abstract] Abstract (construction paragraph): the claim that the three marginal bounds can be coupled to preserve a joint finite-sample validity guarantee after adaptive selection over the m-grid is load-bearing for the central result, yet the abstract supplies no derivation, union-bound accounting, or martingale/peeling argument showing that the dependence induced by selection does not inflate the failure probability beyond the nominal level. Marginal validity of each bound does not automatically transfer to the selected quantities.
Authors: The abstract is intentionally concise and therefore omits the full derivation. Joint validity after selection over the finite m-grid is established in Theorem 3.1 via a direct union bound over the marginal bounds at all m grid pairs (at each pair, one bound each for the ratio risk, the acceptance probability, and the utility). Because the grid is finite and fixed in advance, splitting the failure budget across these bounds controls the joint failure probability at the nominal level with no further inflation for selection; no martingale or peeling argument is required. We are happy to insert a one-sentence pointer to this union-bound step in the abstract if the editor requests it. revision: partial
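The union-bound step described in this response can be written out as follows, assuming the failure budget δ is split evenly over three marginal bounds at each of the m pairs (the exact split used in Theorem 3.1 may differ):

```latex
\[
\Pr\Big[\exists\, j \le m,\ \exists\, b \in \{\mathrm{risk}, \mathrm{acc}, \mathrm{util}\}:\ \text{bound } b \text{ fails at pair } j\Big]
\;\le\; \sum_{j=1}^{m} \sum_{b} \frac{\delta}{3m} \;=\; \delta .
\]
```

On the complementary event every bound holds at every pair simultaneously, so it holds in particular at whichever pair a data-dependent rule selects.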
- Referee: [Abstract] Abstract (relative to Hoeffding-ratio): the asserted sharpening from 1/p_min to 1/√p_min dependence is presented as a direct consequence of the ratio treatment, but without an explicit accounting of any m-dependent loosening required for joint coverage, it is unclear whether the net improvement survives the coupling step.
Authors: The 1/√p_min scaling is produced by the variance-adaptive empirical-Bernstein bound on the ratio (Lemma 3.2); the subsequent union bound over m contributes only an additive log(m) term inside the logarithmic confidence factor, and this term is independent of p_min. Consequently the improved p_min dependence is retained after coupling. This accounting is given explicitly in the proof of Theorem 3.1 and in the regime-scoped comparison of Section 4.2, including the closed-form dominance corollary (Corollary 4.1). revision: no
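Where m enters can be checked by substituting a per-bound level into the leading empirical-Bernstein term. As a sketch, assume the failure budget δ is split evenly over three bounds at each of the m pairs, so each bound runs at δ/(3m) and its logarithmic factor becomes log(2/(δ/(3m))) = log(6m/δ):

```latex
\[
\frac{1}{\sqrt{p_{\min}}}\sqrt{\frac{2\log(6m/\delta)}{n_{\mathrm{cert}}}}
\;=\; \frac{1}{\sqrt{p_{\min}}}\sqrt{\frac{2\,[\log m + \log(6/\delta)]}{n_{\mathrm{cert}}}} .
\]
```

The grid size appears only through the additive log m inside the square root, while the 1/√p_min prefactor is unchanged.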
Circularity Check
No circularity; derivation couples independent standard bounds
The claimed certificate is obtained by coupling three pre-existing finite-sample concentration inequalities (variance-adaptive empirical-Bernstein on the risk ratio, Clopper-Pearson on acceptance probability, and a two-sided closeness bound on utility) whose validity is external to the paper and does not rely on any fitted parameter, self-citation loop, or redefinition of the target quantity. The abstract states the construction explicitly and notes the improvement over a Hoeffding-range baseline without reducing the result to its inputs by construction. No load-bearing step matches any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Losses are bounded.
- standard math: The empirical-Bernstein and Clopper-Pearson inequalities hold with the stated coverage.