Pith · machine review for the scientific record

arXiv:2605.14260 · v1 · submitted 2026-05-14 · 📊 stat.ML · cs.LG

Recognition: 2 theorem links · Lean Theorem

On the Burden of Achieving Fairness in Conformal Prediction

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 02:35 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG
keywords conformal prediction · fairness · coverage distortion · pooled calibration · equalized coverage · quantile heterogeneity · prediction sets · split conformal

The pith

Pooled calibration in conformal prediction creates irreducible coverage distortion across groups, at a scale set by quantile heterogeneity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that calibrating conformal predictors with one shared threshold hides differences in how scores are distributed across groups. It derives a conservation law proving that this shared threshold must produce coverage rates that deviate from the target for some groups, with the size of the deviation fixed by how much the groups' score quantiles differ. It further establishes that the two common fairness goals of equal coverage across groups and equal prediction-set sizes are incompatible under a single policy. These results matter because conformal methods are meant to deliver reliable uncertainty estimates, yet pooled calibration forces practitioners to accept distortion in either reliability or set size. Experiments confirm the same trade-off persists in finite samples on both synthetic and real data.

Core claim

Pooled calibration incurs irreducible group-wise coverage distortion at a scale set by cross-group quantile heterogeneity. The two leading fairness definitions, equalized coverage and equalized set size, are in fundamental tension. The choice between treating groups separately or pooling them determines whether the resulting distortion appears in the coverage or the size dimension.
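The pooled-threshold mechanism is easy to reproduce in a toy setting. The sketch below is illustrative, not the paper's code: it assumes two Gaussian score distributions (an assumption of this example) and calibrates one shared empirical quantile, then reads off group-wise coverage.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1
n = 20_000  # large per-group samples to approximate the population view

# Hypothetical two-group setting: nonconformity scores drawn from
# Gaussians with different means, so the groups' (1 - alpha)-quantiles differ.
scores_a = rng.normal(0.0, 1.0, n)
scores_b = rng.normal(0.8, 1.0, n)

# Pooled calibration: one shared threshold from the mixture of scores.
q_pooled = np.quantile(np.concatenate([scores_a, scores_b]), 1 - alpha)

# Group-wise coverage under the shared threshold.
cov_a = float(np.mean(scores_a <= q_pooled))
cov_b = float(np.mean(scores_b <= q_pooled))

print(f"target {1 - alpha:.2f}  group A {cov_a:.3f}  group B {cov_b:.3f}")
# Group A over-covers and group B under-covers, while their average
# sits at the pooled target, as the conservation law requires.
```

With equal group weights, the two coverage errors cancel exactly in the mixture, which is the signed-conservation behavior the paper formalizes.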

What carries the argument

A conservation law relating pooled and group-wise coverage probabilities derived from the population score distributions in split conformal prediction.
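In symbols, a law of this shape can be sketched as follows; the notation here ($w_g$, $F_{S|g}$, $\varepsilon_g$) is reconstructed for illustration and may differ from the paper's. With group weights $w_g$, group score CDFs $F_{S|g}$, and the pooled threshold $q$ chosen so the mixture CDF satisfies $F_S(q) = 1 - \alpha$:

```latex
F_S(q) \;=\; \sum_{g} w_g\, F_{S|g}(q) \;=\; 1-\alpha
\quad\Longrightarrow\quad
\sum_{g} w_g\, \varepsilon_g(q) \;=\; 0,
\qquad \varepsilon_g(q) \;:=\; F_{S|g}(q) - (1-\alpha).
```

The weighted group-wise coverage errors must cancel exactly, so every $\varepsilon_g(q)$ can vanish simultaneously only when all group $(1-\alpha)$-quantiles coincide with $q$; otherwise heterogeneity forces over-coverage in some groups and under-coverage in others.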

Load-bearing premise

The derivations rely on population-level score distributions for each group being well-defined and independent of the training process.

What would settle it

Measure group-wise coverage and average set sizes on a dataset with known score distributions under both pooled and separate calibration; if the observed coverage gaps exactly match the quantile-heterogeneity lower bound (within finite-sample error), the bound holds, while systematic deviation would falsify it.
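A minimal version of this check can be run on synthetic scores with a known quantile gap. The sketch below is a stand-in, not the paper's protocol: it uses a Gaussian mean shift `delta` as a crude proxy for quantile heterogeneity and tests only the qualitative prediction that the pooled-calibration coverage gap grows with that shift.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n = 0.1, 50_000

def pooled_coverage_gap(delta):
    """Absolute group-wise coverage gap under one shared threshold
    when group B's score distribution is shifted by delta."""
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(delta, 1.0, n)
    q = np.quantile(np.concatenate([a, b]), 1 - alpha)
    return abs(float(np.mean(a <= q)) - float(np.mean(b <= q)))

# No heterogeneity -> no gap; growing heterogeneity -> growing gap.
gaps = [pooled_coverage_gap(d) for d in (0.0, 0.5, 1.0, 2.0)]
print([round(g, 3) for g in gaps])
```

Matching the observed gap against the paper's exact lower-bound scale would additionally require computing the bound from the known quantile functions, which this toy omits.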

Figures

Figures reproduced from arXiv: 2605.14260 by Archer Yi Yang, Jesse C. Cresswell, Masoud Asgharian, Mouloud Belbahri, Pengqi Liu, Ziang Gao.

Figure 1. Bidirectional policy conversion in the synthetic study. Panels A–B illustrate the coverage …

Figure 2. Two-group Gaussian pooled-threshold …

Figure 3. Bias in Bios mechanism view at α = 0.1 for the simple score. Panel A illustrates the pooled-threshold mechanism in Theorem 1; Panels B–C illustrate Theorems 3–4 and Corollaries 1–2; Panel D summarizes the three distortions (Theorem 2, Corollaries 1–2) for male and female groups.

Figure 4. MultiNLI at α = 0.1 with simple (left) and RAPS (right) scores. For each score, Panel A shows the signed coverage distortion F̂_S|g(q̂) − (1 − α) under the pooled threshold (Theorem 1): positive bars indicate over-coverage and negative bars indicate under-coverage. Panel B shows the signed change in expected set size ℓ̂_g(q̂_g) − ℓ̂_g(q̂) after switching to group-wise thresholds that equalize coverage (Corollary 1). …

Figure 5. FACET at α = 0.1 with the RAPS score; Panel A illustrates Theorem 1, and Panels B–C illustrate Corollaries 1–2. The same mechanisms are shown on FACET [13] using the RAPS score on the age group split (Younger, Middle, Older, Unknown) with a zero-shot CLIP ViT-L/14 classifier [19].

Figure 6. Four multi-group families. Across all four score families, the empirical RMS miscoverage …

Figure 7. Imbalanced four-group pooled-threshold diagnostics. Across all four families, the weighted …

Figure 8. Bias in Bios mechanism view at α = 0.10 for the SAPS score. Panel A illustrates the pooled-threshold mechanism in Theorem 1; Panels B–C illustrate Theorems 3–4 and Corollaries 1–2; Panel D summarizes the three distortions for male and female groups. (Panel A: calibration ECDFs of true-label nonconformity scores for the male and female groups against the pooled threshold.)

Figure 9. Bias in Bios mechanism view at α = 0.10 for the RAPS score. Panel A illustrates the pooled-threshold mechanism in Theorem 1; Panels B–C illustrate Theorems 3–4 and Corollaries 1–2; Panel D summarizes the three distortions for male and female groups.

Figure 10. Finite-calibration detectability diagnostics of the pooled-threshold floor from Theorem …

Figure 11. Across α ∈ {0.05, 0.07, 0.085, 0.10}, the empirical pooled-threshold distortion stays above the estimated lower-bound scale, while the induced size and coverage distortions remain nonzero throughout.

Figure 12. Controlled MultiNLI temperature sweep at α = 0.10 for the simple score, perturbing the facetoface genre only. Panel A corresponds to Theorem 2. Panels B–C show Corollaries 1–2. The same trade-off mechanism remains visible across the temperature sweep.

Figure 13. MultiNLI at α = 0.10 using the SAPS score. For this score, Panel A shows the pooled-quantile consequence of Theorem 1; Panels B and C illustrate the set-size distortion in Corollary 1 and the coverage distortion in Corollary 2, respectively.

Figure 14. MultiNLI robustness across α for SAPS (top) and RAPS (bottom) scores. In each row, Panel A is best read as a finite-sample diagnostic for Theorem 2 based on an estimated lower-bound proxy, rather than a pointwise lower-bound verification. Panels B–C show Corollaries 1–2. The induced set-size and coverage distortions remain visible across the tested α-grid.

Figure 15. Controlled MultiNLI temperature sweep at α = 0.10 for SAPS (top) and RAPS (bottom) scores, perturbing the facetoface genre only. Panel A is best read as a finite-sample diagnostic for Theorem 2. Panels B–C illustrate Corollaries 1–2. The same trade-off mechanism remains visible across the temperature sweep.

Figure 16. Finite-calibration detectability of the pooled-threshold floor from Theorem …

Figure 17. For target levels α ∈ {0.05, 0.07, 0.085, 0.10}, the empirical pooled-threshold distortion stays above the empirical lower bound. The induced set-size distortion remains nonzero throughout, and the equalized-expected-set-size policy continues to produce a nonzero RMS coverage distortion.

Figure 18. Controlled FACET temperature sweep at α = 0.10 for the RAPS score, perturbing the Younger group only. Panel A evaluates the lower-bound behavior in Theorem 2. Panels B–C show Corollaries 1–2. The same trade-off mechanism remains visible across the temperature sweep.
Original abstract

Conformal prediction is often calibrated with a single pooled threshold, but this can hide cross-group heterogeneity in score distributions and distort group-wise coverage. We study this phenomenon through the population score distributions underlying split conformal calibration. First, we derive a conservation law and lower bound showing that pooled calibration incurs irreducible group-wise coverage distortion at a scale set by cross-group quantile heterogeneity. Second, we demonstrate that the two leading fairness definitions for conformal prediction, Equalized Coverage and Equalized Set Size, are fundamentally in tension. Third, we quantify the cost of moving between policies which treat groups separately or pool them. Experiments on synthetic and real data confirm the same bidirectional trade-off after finite-sample calibration. Our results show that, for the policy families studied here, calibration choice does not remove cross-group heterogeneity; it determines whether the resulting distortion appears in the coverage or size dimension, providing a principled lens for analyzing fairness-oriented calibration choices in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that pooled calibration in split conformal prediction incurs an irreducible group-wise coverage distortion whose scale is set by cross-group quantile heterogeneity, as formalized by a derived population-level conservation law and lower bound. It further shows that Equalized Coverage and Equalized Set Size are in fundamental tension, quantifies the cost of moving between separate-group and pooled policies, and validates the bidirectional coverage/size trade-off on synthetic and real data after finite-sample calibration.

Significance. If the central derivations hold, the work supplies a principled population-level explanation for why fairness-oriented calibration choices in conformal prediction merely relocate rather than eliminate cross-group heterogeneity. This is a useful lens for practitioners and offers falsifiable predictions about the location of distortion under different policies.

major comments (2)
  1. [§3] §3 (conservation law and lower bound): The derivations start from population score distributions and produce an irreducible term controlled by quantile heterogeneity. However, the finite-sample experiments in §4 replace population quantiles with empirical ones computed on finite calibration sets per group; the manuscript does not isolate or bound the additive estimation error component separately from the claimed population term, leaving open whether observed distortions are dominated by the irreducible heterogeneity or by finite-sample artifacts.
  2. [§4] §4 (experiments): The synthetic and real-data results demonstrate the bidirectional trade-off, but without controls that vary calibration-set size while holding population heterogeneity fixed, or that report separate estimates of the population component, it is difficult to confirm that the population conservation law dominates the observed finite-sample distortions as asserted in the abstract.
minor comments (2)
  1. [Introduction] The transition from the population derivations to the finite-sample setting could be stated more explicitly in the introduction or §2 to clarify how the lower bound is expected to manifest after empirical quantile estimation.
  2. [§3] Notation for group-wise quantiles and coverage deviations is introduced in §3 but could be summarized in a single table for quick reference when reading the experimental results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the distinction between the population-level derivations and the finite-sample experiments. We respond to each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [§3] §3 (conservation law and lower bound): The derivations start from population score distributions and produce an irreducible term controlled by quantile heterogeneity. However, the finite-sample experiments in §4 replace population quantiles with empirical ones computed on finite calibration sets per group; the manuscript does not isolate or bound the additive estimation error component separately from the claimed population term, leaving open whether observed distortions are dominated by the irreducible heterogeneity or by finite-sample artifacts.

    Authors: The derivations in §3 are explicitly population-level and yield an exact conservation law together with a lower bound on coverage distortion that depends only on cross-group quantile heterogeneity. The §4 experiments are intended to show that the same qualitative bidirectional trade-off appears once the population quantiles are replaced by their finite-sample conformal estimates. We acknowledge that the manuscript does not supply a separate analytic bound on the quantile estimation error. In the revision we will add a short paragraph in §4 (and a corresponding remark in the appendix) that (i) recalls the known consistency of conformal quantiles, (ii) notes that the observed distortions remain aligned in sign and approximate magnitude with the population lower bound even for moderate calibration sizes, and (iii) states that a full finite-sample decomposition is left for future work. This clarifies the relationship without altering the central claims. revision: partial

  2. Referee: [§4] §4 (experiments): The synthetic and real-data results demonstrate the bidirectional trade-off, but without controls that vary calibration-set size while holding population heterogeneity fixed, or that report separate estimates of the population component, it is difficult to confirm that the population conservation law dominates the observed finite-sample distortions as asserted in the abstract.

    Authors: The current experiments fix calibration-set sizes that are typical in practice and already vary effective sample sizes across groups in the synthetic design. While we agree that an explicit sweep over calibration size (holding the underlying score distributions fixed) would strengthen the isolation of the population term, the existing results are consistent with the population predictions across both synthetic and real data. In the revision we will add a supplementary figure that varies calibration-set size on the synthetic data and overlays the empirical distortion against the population lower bound; this will make the convergence behavior explicit and address the concern directly. revision: yes
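The sweep the authors promise here is straightforward to prototype. The toy sketch below assumes Gaussian score distributions (not the authors' data or code) and checks the behavior the rebuttal appeals to: as the calibration set grows, the empirical coverage gap concentrates around a stable population-level value rather than washing out, so the heterogeneity floor becomes distinguishable from finite-sample noise.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, delta = 0.1, 0.8  # target level and assumed cross-group mean shift

def coverage_gap(n_cal):
    """Signed coverage gap between two groups after calibrating one
    pooled threshold on n_cal scores per group."""
    a = rng.normal(0.0, 1.0, n_cal)
    b = rng.normal(delta, 1.0, n_cal)
    q = np.quantile(np.concatenate([a, b]), 1 - alpha)
    # Evaluate on large held-out samples so only the threshold carries
    # finite-calibration noise.
    return float(np.mean(rng.normal(0.0, 1.0, 100_000) <= q)
                 - np.mean(rng.normal(delta, 1.0, 100_000) <= q))

results = {}
for n_cal in (50, 200, 1000):
    gaps = [coverage_gap(n_cal) for _ in range(50)]
    results[n_cal] = (float(np.mean(gaps)), float(np.std(gaps)))
    print(n_cal, results[n_cal])
# The mean gap stays near its population value while the spread across
# repetitions shrinks as the calibration set grows.
```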

Circularity Check

0 steps flagged

Derivation from population score distributions is self-contained

Full rationale

The paper's central derivation begins from assumed population-level score distributions per group and derives a conservation law plus lower bound on coverage distortion driven by cross-group quantile heterogeneity. This step is a direct mathematical consequence of the definitions of pooled vs. group-wise quantiles and does not reduce to fitted parameters, self-referential quantities, or prior self-citations. Subsequent claims about tension between Equalized Coverage and Equalized Set Size follow from the same population quantities. Finite-sample experiments are presented as separate empirical confirmation rather than part of the derivation. No load-bearing step matches any of the enumerated circularity patterns; the population analysis stands independently of the finite-sample calibration details.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard conformal prediction assumptions about score distributions and calibration; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Population score distributions exist and are distinct across groups
    Invoked to derive the conservation law and lower bound on coverage distortion.

pith-pipeline@v0.9.0 · 5473 in / 1190 out tokens · 27103 ms · 2026-05-15T02:35:56.558120+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 1 internal anchor

  1. [1] Anastasios Nikolas Angelopoulos, Stephen Bates, Michael Jordan, and Jitendra Malik. Uncertainty sets for image classifiers using conformal prediction. In International Conference on Learning Representations, 2021.
  2. [2] Francis Bach. A Convex Loss Function for Set Prediction with Optimal Trade-offs Between Size and Conditional Coverage. arXiv:2512.19142, 2025.
  3. [3] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017. doi: 10.1089/big.2016.0047.
  4. [4] Jesse C. Cresswell, Yi Sui, Bhargava Kumar, and Noël Vouitsis. Conformal prediction sets improve human decision making. In Proceedings of the 41st International Conference on Machine Learning, 2024.
  5. [5] Jesse C. Cresswell, Bhargava Kumar, Yi Sui, and Mouloud Belbahri. Conformal prediction sets can cause disparate impact. In The Thirteenth International Conference on Learning Representations, 2025.
  6. [6] Maria De-Arteaga, Alexey Romanov, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, and Adam Tauman Kalai. Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 120–128, 2019.
  7. [7] Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, pages 642–669, 1956.
  8. [8] Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani. The limits of distribution-free conditional predictive inference. Information and Inference: A Journal of the IMA, 10(2):455–482, 2021.
  9. [9] Rina Foygel Barber, Emmanuel J. Candès, Aaditya Ramdas, and Ryan J. Tibshirani. De Finetti's theorem and related results for infinite weighted exchangeable sequences. Bernoulli, 30(4):3004–3028, 2024.
  10. [10] Chao Gao, Liren Shan, Vaidehi Srinivas, and Aravindan Vijayaraghavan. Volume optimality in conformal prediction with structured prediction sets. In Proceedings of the 42nd International Conference on Machine Learning, volume 267, pages 18495–18527, 2025.
  11. [11] Isaac Gibbs, John J. Cherian, and Emmanuel J. Candès. Conformal prediction with conditional guarantees. Journal of the Royal Statistical Society Series B: Statistical Methodology, 87(4):1100–1126, 2025. doi: 10.1093/jrsssb/qkaf008.
  12. [12] Ozgur Guldogan, Neeraj Sarna, Yuanyuan Li, and Michael Berger. Counterfactually fair conformal prediction. In Proceedings of The 29th International Conference on Artificial Intelligence and Statistics, 2026.
  13. [13] Laura Gustafson, Chloe Rolland, Nikhila Ravi, Quentin Duval, Aaron Adcock, Cheng-Yang Fu, Melissa Hall, and Candace Ross. FACET: Fairness in computer vision evaluation benchmark. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20370–20382, 2023.
  14. [14] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems 29, pages 3315–3323, 2016.
  15. [15] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. In 8th Innovations in Theoretical Computer Science Conference, volume 67, pages 43:1–43:23, 2017. doi: 10.4230/LIPIcs.ITCS.2017.43.
  16. [16] Claire Lazar Reich and Suhas Vijaykumar. A Possibility in Algorithmic Fairness: Can Calibration and Equal Error Rates Be Reconciled? In 2nd Symposium on Foundations of Responsible Computing, volume 192, pages 4:1–4:21, 2021. doi: 10.4230/LIPIcs.FORC.2021.4.
  17. [17] Meichen Liu, Lei Ding, Dengdeng Yu, Wulong Liu, Linglong Kong, and Bei Jiang. Conformalized fairness via quantile regression. Advances in Neural Information Processing Systems, 35:11561–11572, 2022.
  18. [18] Pengqi Liu, Zijun Yu, Mouloud Belbahri, Arthur Charpentier, Masoud Asgharian, and Jesse C. Cresswell. Beyond procedure: Substantive fairness in conformal prediction. In Proceedings of the 43rd International Conference on Machine Learning, 2026. To appear.
  19. [19] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, volume 139, pages 8748–8763, 2021.
  20. [20] Yaniv Romano, Rina Foygel Barber, Chiara Sabatti, and Emmanuel Candès. With malice toward none: Assessing uncertainty via equalized coverage. Harvard Data Science Review, 2(2):4, 2020.
  21. [21] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108, 2019.
  22. [22] Glenn Shafer and Vladimir Vovk. A Tutorial on Conformal Prediction. Journal of Machine Learning Research, 9(12):371–421, 2008.
  23. [23] Bernard W. Silverman. Density Estimation for Statistics and Data Analysis. Routledge, 2018.
  24. [24] Davut Emre Tasar. The coverage-deferral trade-off: Fairness implications of conformal prediction in human-in-the-loop decision systems. Preprints, 2025. doi: 10.20944/preprints202512.2631.v1.
  25. [25] Aditya T. Vadlamani, Anutam Srinivasan, Pranav Maneriker, Ali Payani, and Srinivasan Parthasarathy. A generic framework for conformal fairness. In The Thirteenth International Conference on Learning Representations, 2025.
  26. [26] Vladimir Vovk, David Lindsay, Ilia Nouretdinov, and Alex Gammerman. Mondrian confidence machine. Technical Report, 2003.
  27. [27] Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer, 2005.
  28. [28] Fangxin Wang, Lu Cheng, Ruocheng Guo, Kay Liu, and Philip S. Yu. Equal opportunity of coverage in fair regression. Advances in Neural Information Processing Systems, 36:7743–7755, 2023.
  29. [29] Adina Williams, Nikita Nangia, and Samuel Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122, 2018. doi: 10.18653/v1/N18-1101.
  30. [30] Yanfei Zhou and Matteo Sesia. Conformal classification with equalized coverage for adaptively selected groups. Advances in Neural Information Processing Systems, 37:108760–108823, 2024.
