
arxiv: 2606.08517 · v1 · pith:V2PLCTODnew · submitted 2026-06-07 · 💻 cs.LG · cs.CL

A Joint Finite-Sample Certificate for Adaptive Selective Conformal Risk Control

Pith reviewed 2026-06-27 18:42 UTC · model grok-4.3

classification 💻 cs.LG cs.CL
keywords: selective conformal prediction · finite-sample certificate · adaptive selection · risk control · empirical Bernstein bound · Clopper-Pearson bound · selective predictors

The pith

A certificate for selective predictors under adaptive selection bounds risk directly as a ratio to tighten finite-sample guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a single finite-sample certificate that simultaneously upper-bounds selected risk, lower-bounds acceptance probability above a given floor, and lower-bounds utility for selectively deployed predictors, even when thresholds are chosen adaptively from a finite grid. It achieves the joint guarantee for bounded, possibly non-monotone losses by treating risk as a ratio and coupling a variance-adaptive empirical-Bernstein bound on that ratio, a Clopper-Pearson bound on acceptance, and a two-sided closeness bound on utility. This matters because safe real-world deployment of abstaining models requires non-vacuous, finite-sample assurances on all three quantities at once rather than separate or asymptotic controls. Relative to prior range-based methods, the approach sharpens the dependence on the acceptance floor from 1/pmin to 1/sqrt(pmin).

Core claim

We give such a certificate for bounded, possibly non-monotone losses by treating the selected risk directly as a ratio rather than through a Hoeffding-style range bound. The construction couples three confidence bounds: a variance-adaptive empirical-Bernstein bound on the ratio risk, a Clopper--Pearson bound on acceptance, and a two-sided closeness bound on utility. Together they lower-bound the certified policy's utility absolutely and to within 2γu of the best over the certified set, both non-vacuous whenever feasible.

What carries the argument

Coupled triple of confidence bounds consisting of a variance-adaptive empirical-Bernstein bound on the selected risk ratio, a Clopper-Pearson bound on acceptance probability, and a two-sided closeness bound on utility.
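To make the coupling concrete, here is a minimal Python sketch of two of the three legs for a single fixed threshold. It is not the paper's construction: the risk leg uses the standard Maurer-Pontil empirical-Bernstein bound on the mean loss of accepted samples as a stand-in for the paper's ratio bound, the utility leg is omitted because the page does not specify it, and the function names, the even delta/2 split, and the `tau` threshold parameter are all illustrative assumptions.

```python
import math

def _log_binom_pmf(k, n, p):
    # log P[Bin(n, p) = k] for 0 < p < 1, in log space to avoid underflow
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def clopper_pearson_lower(k, n, delta):
    """One-sided (1 - delta) Clopper-Pearson lower bound on a proportion."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on p; the upper tail is increasing in p
        mid = 0.5 * (lo + hi)
        tail = sum(math.exp(_log_binom_pmf(j, n, mid)) for j in range(k, n + 1))
        if tail < delta:
            lo = mid  # tail still below delta, so the bound lies above mid
        else:
            hi = mid
    return lo

def empirical_bernstein_ucb(losses, delta, B=1.0):
    """Maurer-Pontil style upper bound on the mean of losses in [0, B]."""
    n = len(losses)
    if n < 2:
        return B  # too few samples: fall back to the trivial bound
    mean = sum(losses) / n
    var = sum((x - mean) ** 2 for x in losses) / (n - 1)  # sample variance
    log_term = math.log(2.0 / delta)
    radius = math.sqrt(2.0 * var * log_term / n) + 7.0 * B * log_term / (3.0 * (n - 1))
    return min(B, mean + radius)

def certify_pair(scores, losses, tau, alpha, pi_min, delta, B=1.0):
    """Check risk <= alpha and acceptance >= pi_min for one fixed threshold.

    Assumption: delta is split evenly between the two legs (union bound).
    """
    accepted = [l for s, l in zip(scores, losses) if s >= tau]
    acc_lcb = clopper_pearson_lower(len(accepted), len(scores), delta / 2)
    risk_ucb = empirical_bernstein_ucb(accepted, delta / 2, B)
    return {"risk_ucb": risk_ucb, "acc_lcb": acc_lcb,
            "certified": risk_ucb <= alpha and acc_lcb >= pi_min}
```

The sketch covers only a threshold fixed before seeing the data; selecting among m grid pairs would additionally require shrinking the per-pair failure level, for example by a union bound.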

If this is right

  • The certificate remains valid for adaptive selection of threshold pairs from any finite grid of size m on a calibration set of size ncert.
  • It simultaneously provides non-vacuous absolute and relative (within 2γu) lower bounds on utility whenever the risk and acceptance conditions are feasible.
  • The risk bound improves the acceptance-floor dependence from 1/pmin to 1/sqrt(pmin) compared with range-only Hoeffding-ratio constructions.
  • A closed-form per-pair regime exists in which the new risk bound is tighter than the corresponding Hoeffding conformal risk control bound.
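A heuristic calculation suggests where the 1/pmin versus 1/sqrt(pmin) gap can come from. It is a sketch under stated assumptions, not the paper's proof: it takes losses in [0, B], writes the acceptance floor pmin as \pi_{\min}, uses textbook Hoeffding and Bernstein inequalities with the true variance, and ignores the extra uncertainty from estimating the acceptance probability in the denominator.

```latex
% Heuristic only. Assumptions: losses L \in [0, B], acceptance indicator A,
% p = P(A = 1) \ge \pi_{\min}, n i.i.d. calibration points, N = L \cdot A.
R_{\mathrm{sel}} = \frac{\mathbb{E}[N]}{p}, \qquad
\operatorname{Var}(N) \le \mathbb{E}[N^2] \le B^2 p .

% Range-only (Hoeffding) control of E[N], then divide by p:
\mathrm{width}_{\mathrm{H}} \approx \frac{B}{p}\sqrt{\frac{\log(1/\delta)}{2n}}
  \le \frac{B}{\pi_{\min}}\sqrt{\frac{\log(1/\delta)}{2n}} .

% Variance-aware (Bernstein-type) control of E[N], then divide by p:
\mathrm{width}_{\mathrm{B}} \approx
  \frac{1}{p}\left(\sqrt{\frac{2 B^2 p \log(1/\delta)}{n}} + \frac{B \log(1/\delta)}{3n}\right)
  = B\sqrt{\frac{2\log(1/\delta)}{n p}} + \frac{B \log(1/\delta)}{3 n p} .
```

The leading term of the variance-aware width scales as 1/sqrt(p) because the variance of N shrinks with p; the remaining 1/p term decays at the faster 1/n rate.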

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The coupling technique could be reused to produce joint finite-sample certificates for other combinations of risk, coverage, and utility metrics in conformal settings.
  • Empirical gains observed on ImageNet and COCO but absent on ADE20K indicate that effectiveness depends on the underlying risk-margin regime relative to alpha.
  • The finite-grid restriction suggests that extensions to continuous parameter spaces would need new concentration arguments to retain exact finite-sample validity.

Load-bearing premise

The three individual confidence bounds can be coupled while preserving a joint finite-sample validity guarantee under adaptive selection from a finite grid.

What would settle it

A Monte Carlo experiment in which the empirical joint coverage rate of the three coupled bounds falls materially below the nominal level across repeated draws with adaptive grid selection on held-out data would falsify the claimed validity.
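A scaled-down version of such a check can be sketched in Python. The sketch makes several simplifying assumptions that are not from the paper: it tests only a risk leg, uses a plain Hoeffding bound with a delta/m union-bound split as a stand-in for the paper's certificate, draws independent Bernoulli losses for each grid pair, and selects the pair with the lowest upper bound. The function names and parameters are hypothetical.

```python
import math
import random

def hoeffding_ucb(mean, n, delta):
    """One-sided Hoeffding upper bound for the mean of [0, 1]-valued losses."""
    return mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def selected_miscoverage(true_risks, n, delta, trials, seed=0):
    """Fraction of trials in which the adaptively selected bound fails.

    Each trial draws n Bernoulli losses per grid pair, computes a
    union-bound-corrected upper bound per pair, selects the pair with the
    lowest bound, and checks whether that bound covers the true risk.
    """
    rng = random.Random(seed)
    m = len(true_risks)
    failures = 0
    for _ in range(trials):
        best_ucb, best_j = math.inf, -1
        for j, r in enumerate(true_risks):
            mean = sum(rng.random() < r for _ in range(n)) / n
            ucb = hoeffding_ucb(mean, n, delta / m)  # per-pair level delta / m
            if ucb < best_ucb:
                best_ucb, best_j = ucb, j
        if true_risks[best_j] > best_ucb:  # selected bound misses the truth
            failures += 1
    return failures / trials

if __name__ == "__main__":
    rate = selected_miscoverage([0.3] * 10, n=200, delta=0.1, trials=200, seed=1)
    print(f"empirical miscoverage after selection: {rate:.3f} (nominal 0.1)")
```

A miscoverage rate well above the nominal delta in a faithful version of this experiment, run with the paper's actual bounds and shared calibration data, would be the falsifying outcome described above.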

Figures

Figures reproduced from arXiv: 2606.08517 by Jiamiao Liu, Xiaoli Yu.

Figure 1: The joint certificate at a glance: problem, mechanism, and result. (a) Problem. A deployable selective predictor must jointly certify selected risk (Rsel ≤ α), acceptance (pacc ≥ πmin), and utility (Udep). Under the range-only Hoeffding-ratio construction the risk margin is wide enough that the certifiable risk floor can rise above the acceptance ceiling, so no operating point is jointly certifiable and th…
Figure 2: Closed-form width scaling of the joint certificate (proof expressions only; no calibration data). Panel (a) is at …
Figure 3: Realised test-risk margins across six surfaces, an empirical risk-side sanity check complementing Table 4.
Figure 4: Certified risk–acceptance operating frontier on three ImageNet backbones (α = 0.05, πmin = 0.01, δ = 0.05, ncert = 33,000; markers are medians over 20 calibration splits). For each τ-induced acceptance tier we plot the best (lowest-UCB) λ among the same 35 grid pairs; the envelope is built identically for every method and the connecting lines are visual guides only (the tiers are discrete). The y-value is…
Figure 5: Variance-adaptive payoff diagnostic. (a) At the deployment floor πmin = 0.01 (ncert = 33,000, δ = 0.05, grid m = 35, loss range B = 1), the per-pair Rsel certified-width ratio Ours/A(πmin) (each point one of 20 seeds × 35 grid pairs per backbone, width the one-sided upper radius on Rsel; since A(πmin) is constant, median-of-ratios, ratio-of-medians and 1/median coincide) lies entirely below 1 across three …
Figure 6: Certified acceptance on COCO val 2017 panoptic segmentation (Mask2Former–Swin-B; image-level acceptance score g; pixel-accuracy loss L = 1 − per-pixel accuracy; α = πmin = δ = 0.10, grid size m = 15; 30 random calibration/test splits of COCO val 2017, ncert = 4000, calibration and test disjoint within each split). (a) Certified acceptance under the softmax score g. Our joint certifier is feasible on 29/30 …
Original abstract

Selective predictors answer on confident inputs and abstain elsewhere; deploying one safely needs a single finite-sample certificate that simultaneously upper-bounds the selected risk, lower-bounds the acceptance probability $p_{\mathrm{acc}}$ above a floor $\pi_{\min}$, and lower-bounds the deployment utility. This certificate must be valid under adaptive threshold selection from a finite grid of $m$ pairs on $n_{\mathrm{cert}}$ samples. We give such a certificate for bounded, possibly non-monotone losses by treating the selected risk directly as a ratio rather than through a Hoeffding-style range bound. The construction couples three confidence bounds: a variance-adaptive empirical-Bernstein bound on the ratio risk, a Clopper--Pearson bound on acceptance, and a two-sided closeness bound on utility. Together they lower-bound the certified policy's utility absolutely and to within $2\gamma_u$ of the best over the \emph{certified set}, both non-vacuous whenever feasible; a regime-scoped third leg matches an external oracle, informative only where the risk margin $\gamma_r < \alpha$ and vacuous at the headline operating points. Relative to the range-only Hoeffding-ratio construction this sharpens the acceptance-floor dependence from $1/\pi_{\min}$ to $1/\sqrt{\pi_{\min}}$, and a closed-form corollary identifies a per-pair regime in which our risk bound dominates a Hoeffding conformal risk control (Hoeffding--CRC) selective bound. Empirically, on ImageNet (three ResNets) and COCO val 2017 panoptic, the certificate opens a $+22$ pp certified-acceptance frontier over Hoeffding--CRC and is ${\approx}10{\times}$ tighter than a non-vacuous matched-valid baseline; these gains are regime-scoped, not universal, and absent on ADE20K. The certifier runs in $O(n_{\mathrm{cert}}\, m)$ time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims a joint finite-sample certificate for adaptive selective conformal risk control on bounded losses. It constructs the certificate by treating selected risk as a ratio and coupling a variance-adaptive empirical-Bernstein bound on the ratio, a Clopper-Pearson bound on acceptance probability above pmin, and a two-sided closeness bound on utility. The certificate is asserted to remain valid under data-dependent selection of one pair from a finite grid of m candidates on ncert samples, to deliver a 1/sqrt(pmin) dependence instead of 1/pmin, to lower-bound absolute and relative utility, and to yield empirical gains of +22 pp certified acceptance on ImageNet and COCO while running in O(ncert m) time.

Significance. If the joint validity argument holds, the construction supplies a materially tighter, non-vacuous finite-sample guarantee for safe deployment of selective predictors than existing Hoeffding-ratio or Hoeffding-CRC baselines. The explicit regime-scoped comparison to Hoeffding-CRC and the closed-form per-pair dominance corollary are concrete strengths; the empirical demonstration on three ResNets and panoptic segmentation further indicates practical utility when the risk margin condition is met.

major comments (2)
  1. [Abstract] Abstract (construction paragraph): the claim that the three marginal bounds can be coupled to preserve a joint finite-sample validity guarantee after adaptive selection over the m-grid is load-bearing for the central result, yet the abstract supplies no derivation, union-bound accounting, or martingale/peeling argument showing that the dependence induced by selection does not inflate the failure probability beyond the nominal level. Marginal validity of each bound does not automatically transfer to the selected quantities.
  2. [Abstract] Abstract (relative to Hoeffding-ratio): the asserted sharpening from 1/pmin to 1/sqrt(pmin) dependence is presented as a direct consequence of the ratio treatment, but without an explicit accounting of any m-dependent loosening required for joint coverage, it is unclear whether the net improvement survives the coupling step.
minor comments (2)
  1. [Abstract] The abstract states that the third leg is 'informative only where the risk margin γr < α and vacuous at the headline operating points'; a brief parenthetical clarifying the operating regime would aid readability.
  2. [Abstract] Dataset names (ImageNet, COCO val 2017, ADE20K) and model counts are given, but no table or figure reference is supplied in the abstract for the reported +22 pp and ≈10× tightness numbers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for identifying these points on the abstract's presentation of the joint certificate. The full derivations appear in the body of the manuscript; we address each comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract (construction paragraph): the claim that the three marginal bounds can be coupled to preserve a joint finite-sample validity guarantee after adaptive selection over the m-grid is load-bearing for the central result, yet the abstract supplies no derivation, union-bound accounting, or martingale/peeling argument showing that the dependence induced by selection does not inflate the failure probability beyond the nominal level. Marginal validity of each bound does not automatically transfer to the selected quantities.

    Authors: The abstract is intentionally concise and therefore omits the full derivation. Joint validity after selection over the finite m-grid is established in Theorem 3.1 via a direct union bound over the marginal bounds for the m grid pairs (one each for the ratio risk, acceptance probability, and utility). Because the grid is finite and fixed in advance, the union bound controls the joint failure probability at the nominal level without further inflation; no martingale or peeling argument is required. We are happy to insert a one-sentence pointer to this union-bound step in the abstract if the editor requests it. revision: partial

  2. Referee: [Abstract] Abstract (relative to Hoeffding-ratio): the asserted sharpening from 1/pmin to 1/sqrt(pmin) dependence is presented as a direct consequence of the ratio treatment, but without an explicit accounting of any m-dependent loosening required for joint coverage, it is unclear whether the net improvement survives the coupling step.

    Authors: The 1/sqrt(pmin) scaling is produced by the variance-adaptive empirical-Bernstein bound on the ratio (Lemma 3.2); the subsequent union bound over m contributes only an additive log(m) term that is independent of pmin. Consequently the improved pmin dependence is retained after coupling. This accounting is given explicitly in the proof of Theorem 3.1 and in the regime-scoped comparison of Section 4.2, including the closed-form dominance corollary (Corollary 4.1). revision: no
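The log(m) accounting described in the second response can be written out in a few lines. This is an illustrative sketch, not the proof of Theorem 3.1: the even split of δ across three bounds on each of the m grid pairs is an assumption, pmin is written \pi_{\min}, n is the calibration size, and constants are dropped.

```latex
% Assumption: even split of \delta over 3 bounds on each of m grid pairs.
\delta' = \frac{\delta}{3m}
\quad\Longrightarrow\quad
\log\frac{1}{\delta'} = \log\frac{1}{\delta} + \log(3m).

% Substituting into a leading term of order \sqrt{\log(1/\delta') / (n\,\pi_{\min})}:
\sqrt{\frac{\log(1/\delta) + \log(3m)}{n\,\pi_{\min}}}
% m enters only through the logarithm; the exponent of \pi_{\min} stays -1/2.
```

Under these assumptions the grid size inflates the width only logarithmically and leaves the exponent on the acceptance floor unchanged.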

Circularity Check

0 steps flagged

No circularity; derivation couples independent standard bounds

Full rationale

The claimed certificate is obtained by coupling three pre-existing finite-sample concentration inequalities (variance-adaptive empirical-Bernstein on the risk ratio, Clopper-Pearson on acceptance probability, and a two-sided closeness bound on utility) whose validity is external to the paper and does not rely on any fitted parameter, self-citation loop, or redefinition of the target quantity. The abstract states the construction explicitly and notes the improvement over a Hoeffding-range baseline without reducing the result to its inputs by construction. No load-bearing step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; the paper invokes standard statistical concentration tools whose assumptions are not re-derived here.

axioms (2)
  • domain assumption Losses are bounded
    Required for the empirical-Bernstein and related bounds to apply to the ratio risk.
  • standard math Empirical-Bernstein and Clopper-Pearson inequalities hold with the stated coverage
    Used directly to construct the three component bounds.

pith-pipeline@v0.9.1-grok · 5880 in / 1390 out tokens · 22857 ms · 2026-06-27T18:42:30.673216+00:00 · methodology

