When Interpretability Is Unequally Distributed: Fairness in Hybrid Interpretable Models

Julien Ferry; Thibaut Vidal; Ulrich Aïvodji; Ziba Jabbar Zare

arxiv: 2605.28626 · v1 · pith:KS6AQ32Mnew · submitted 2026-05-27 · 💻 cs.LG

Ziba Jabbar Zare, Ulrich Aïvodji, Julien Ferry, Thibaut Vidal

Pith reviewed 2026-06-29 14:09 UTC · model grok-4.3

classification 💻 cs.LG

keywords: hybrid interpretable models · interpretability coverage disparity · procedural fairness · demographic parity · routing decisions · black-box deferral · fairness constraints

The pith

Hybrid interpretable models can allocate interpretability unequally across demographic groups through their routing decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Hybrid interpretable models combine a transparent component with a black-box model by routing some examples to each. This routing step can send members of certain demographic groups to the black-box component more often than others, creating a procedural fairness issue distinct from standard predictive fairness. The paper defines Interpretability Coverage Disparity as a demographic-parity-style metric on the routing decision and measures it across four hybrid learning methods and standard benchmark datasets. Experiments show the disparity is substantial in regimes where both components are used, and simple constraints on coverage disparity can reduce it while affecting accuracy and sparsity only marginally.
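
The routing step described above can be sketched as follows. This is a minimal illustration, not any of the four methods studied in the paper: the rules in `rule_component`, the stand-in `black_box`, and the toy data are all hypothetical. The interpretable component either returns a label or defers, and the returned `routed` mask records which examples received a transparent decision.

```python
from typing import Optional

import numpy as np


def rule_component(x: np.ndarray) -> Optional[int]:
    """Hypothetical transparent component: a label if a rule fires, else None."""
    if x[0] > 0.5:   # illustrative rule 1
        return 1
    if x[1] < 0.2:   # illustrative rule 2
        return 0
    return None      # no rule fires: defer to the black box


def black_box(x: np.ndarray) -> int:
    """Stand-in for an arbitrary opaque model."""
    return int(x.sum() > 1.0)


def hybrid_predict(X: np.ndarray):
    """Return predictions and a boolean mask of examples handled transparently."""
    preds = np.empty(len(X), dtype=int)
    routed = np.zeros(len(X), dtype=bool)
    for i, x in enumerate(X):
        out = rule_component(x)
        if out is None:
            preds[i] = black_box(x)
        else:
            preds[i] = out
            routed[i] = True
    return preds, routed


X = np.array([[0.9, 0.5], [0.1, 0.1], [0.3, 0.8]])
preds, routed = hybrid_predict(X)
transparency = routed.mean()  # fraction of examples given an interpretable decision
```

The `routed` mask, rather than `preds`, is the object a routing-fairness audit would look at.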

Core claim

Hybrid interpretable models exhibit substantial Interpretability Coverage Disparity in intermediate transparency regimes where both the interpretable and black-box components are actively used. Simple coverage-disparity constraints can significantly reduce this disparity in exact hybrid learning methods, with only marginal impact on accuracy and sparsity, and in several settings the mitigation also improves standard algorithmic fairness metrics.

What carries the argument

Interpretability Coverage Disparity (ICD), a demographic-parity-style measure applied to the routing decision that assigns examples to the interpretable or black-box component.
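
For concreteness, a demographic-parity-style gap on the routing decision could be written as below. This is an illustrative form under stated assumptions, not the paper's exact definition: the routing indicator $r$ (with $r(x) = 1$ meaning the example is handled by the interpretable component), the sensitive attribute $A$, and the two group values $a$ and $b$ are notation introduced here.

```latex
\[
\mathrm{ICD}(r) \;=\; \Bigl|\, \Pr\bigl[r(x) = 1 \mid A = a\bigr] \;-\; \Pr\bigl[r(x) = 1 \mid A = b\bigr] \,\Bigr|
\]
```

Under this form, the gap is zero when both groups receive interpretable decisions at the same rate, and it reaches one when one group is always routed to the transparent component and the other never is.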

If this is right

Substantial ICD appears when both interpretable and black-box components are actively used.
Coverage-disparity constraints substantially reduce ICD in exact hybrid learning methods.
The constraints produce only marginal changes to accuracy and sparsity.
ICD mitigation can improve standard algorithmic fairness metrics in several settings.
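
To make the idea of a coverage-disparity constraint concrete, here is a simplified sketch. It is not the paper's algorithm, which applies the constraint inside exact hybrid learners; instead it filters a list of precomputed candidate models by a coverage-gap threshold and keeps the most accurate survivor. The function names, the threshold `epsilon`, and every number in the example are hypothetical.

```python
import numpy as np


def coverage_gap(routed, group):
    """Absolute gap in interpretable coverage between groups 0 and 1 (assumed ICD form)."""
    routed = np.asarray(routed, dtype=bool)
    group = np.asarray(group)
    return float(abs(routed[group == 1].mean() - routed[group == 0].mean()))


def select_under_constraint(candidates, group, epsilon):
    """Pick the most accurate candidate whose coverage gap is at most epsilon.

    candidates: list of (name, accuracy, routed_mask) tuples.
    Returns None when no candidate satisfies the constraint.
    """
    feasible = [c for c in candidates if coverage_gap(c[2], group) <= epsilon]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c[1])


# Made-up candidates: same overall transparency for A and B, different allocation.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
candidates = [
    ("A", 0.86, [1, 1, 1, 0, 1, 0, 0, 0]),  # coverage 0.75 vs 0.25, gap 0.5
    ("B", 0.85, [1, 1, 0, 0, 1, 1, 0, 0]),  # coverage 0.50 vs 0.50, gap 0.0
    ("C", 0.80, [1, 0, 0, 0, 1, 0, 0, 0]),  # coverage 0.25 vs 0.25, gap 0.0
]
best = select_under_constraint(candidates, group, epsilon=0.1)
```

In this toy setup the constraint rules out candidate A and selects B, trading one point of (made-up) accuracy for equal coverage at the same overall transparency.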

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hybrid models require separate auditing for fairness in interpretability allocation in addition to predictive accuracy.
Routing disparities of this form could arise in any system that selectively applies interpretable versus automated components.
Coverage constraints on routing might be added to training pipelines for hybrid models without major redesign.

Load-bearing premise

The routing decision is the primary procedural fairness concern and standard benchmark datasets with their sensitive attributes adequately capture real-world demographic routing patterns.

What would settle it

An observation that ICD remains near zero across multiple hybrid methods and datasets in intermediate transparency regimes, or that adding coverage-disparity constraints produces more than marginal losses in accuracy or sparsity.

Figures

Figures reproduced from arXiv: 2605.28626 by Julien Ferry, Thibaut Vidal, Ulrich Aïvodji, Ziba Jabbar Zare.

**Figure 2.** Distribution of test set ICD across Rashomon sets for all transparency bins

**Figure 3.** Distribution of test set Interpretability Coverage

**Figure 4.** Distribution of test set ICA across Rashomon sets for all transparency bins

**Figure 5.** Distribution of our proposed metrics across Rashomon sets over transparency bins

**Figure 6.** Distribution of several desiderata across Rashomon sets over transparency bins

**Figure 7.** Test set ICD, accuracy and EO, and model sparsity for HybridCORELSPre and HybridCORELSPost with ICD

**Figure 8.** Approximate Rashomon set growth across all transparency bins for HybridCORELSPost: number of unique models

**Figure 9.** Approximate Rashomon set growth across all transparency bins for HybridCORELSPre: number of unique models

**Figure 10.** Approximate Rashomon set growth across all transparency bins for HyRS: number of unique models with accuracy

**Figure 11.** Approximate Rashomon set growth across all transparency bins for CRL: number of unique models with accuracy

**Figure 12.** Distribution of test set Interpretability Coverage (IC) across Rashomon sets of HybridCORELSPost for all trans

**Figure 13.** Distribution of test set Interpretability Coverage (IC) across Rashomon sets of HybridCORELSPre for all trans

**Figure 14.** Distribution of test set Interpretability Coverage (IC) across Rashomon sets of HyRS for all transparency bins

**Figure 15.** Distribution of test set Interpretability Coverage (IC) across Rashomon sets of CRL for all transparency bins

**Figure 16.** Distribution of test set ICD across Rashomon sets over transparency bins

**Figure 17.** Distribution of test set ICA across Rashomon sets over transparency bins

**Figure 18.** Distribution of test set Equal Opportunity (EO) across Rashomon sets over transparency bins

**Figure 19.** Distribution of test set Statistical Parity (SP) across Rashomon sets over transparency bins

**Figure 20.** Distribution of model sparsity (number of rules) across Rashomon sets over transparency bins

**Figure 21.** Distribution of test set accuracy across Rashomon sets over transparency bins

**Figure 22.** Test set accuracy for HybridCORELSPre and HybridCORELSPost with ICD mitigation across Rashomon sets over

**Figure 23.** Test set ICD for HybridCORELSPre and HybridCORELSPost with ICD mitigation across Rashomon sets over

**Figure 24.** Test set Equal Opportunity (EO) for HybridCORELSPre and HybridCORELSPost with ICD mitigation across

**Figure 25.** Test set Statistical Parity (SP) for HybridCORELSPre and HybridCORELSPost with ICD mitigation across

**Figure 26.** Model sparsity for HybridCORELSPre and HybridCORELSPost with ICD mitigation across Rashomon sets over

Original abstract

Hybrid interpretable models combine a transparent component with a black-box model by assigning some examples to the former and deferring the rest to the latter. While this design enables flexible tradeoffs between accuracy and interpretability, it also raises a distinct procedural fairness concern: some demographic groups may systematically receive interpretable decisions, while others are disproportionately routed to a black box. We formalize this issue as Interpretability Coverage Disparity (ICD), a demographic-parity-style measure applied to the routing decision of hybrid interpretable models. Using tools from predictive multiplicity, we study ICD across four hybrid interpretable learning methods, three standard fairness benchmark datasets, and multiple sensitive attributes. Our experiments reveal substantial ICD in intermediate transparency regimes, where both the interpretable and black-box components are actively used. We further show that simple coverage-disparity constraints can significantly reduce ICD in exact hybrid learning methods, with marginal impact on accuracy and sparsity. In several settings, ICD mitigation also improves standard algorithmic fairness metrics. These results show that hybrid interpretable models should be audited not only for predictive fairness, but also for how they allocate interpretability across individuals and groups.
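
The abstract's emphasis on intermediate transparency regimes follows from a simple observation: when every example, or no example, is routed to the interpretable component, coverage is identical across groups and the gap is necessarily zero. The sketch below illustrates an audit in that spirit. It assumes a collection of routing masks (for example, from a set of near-optimal models), bins models by overall transparency, and summarizes the coverage gap per bin. The helper names, bin edges, and toy masks are assumptions made for illustration, not the paper's pipeline.

```python
import numpy as np


def coverage_gap(routed, group):
    """Absolute gap in interpretable coverage between groups 0 and 1 (assumed ICD form)."""
    return float(abs(routed[group == 1].mean() - routed[group == 0].mean()))


def icd_by_transparency_bin(masks, group, bin_edges):
    """Summarize (min, median, max) coverage gap per transparency bin.

    masks: one boolean routing mask per model, shape (n_models, n_examples).
    bin_edges: increasing edges; the last bin also includes its right edge.
    """
    masks = np.asarray(masks, dtype=bool)
    group = np.asarray(group)
    transparency = masks.mean(axis=1)  # fraction routed to the transparent part
    gaps = np.array([coverage_gap(m, group) for m in masks])
    # digitize gives 1-based bin indices; clip so transparency == 1.0 lands in the last bin
    idx = np.clip(np.digitize(transparency, bin_edges) - 1, 0, len(bin_edges) - 2)
    summary = {}
    for b in range(len(bin_edges) - 1):
        in_bin = gaps[idx == b]
        if in_bin.size > 0:
            summary[(bin_edges[b], bin_edges[b + 1])] = (
                float(in_bin.min()), float(np.median(in_bin)), float(in_bin.max())
            )
    return summary


# Toy routing masks for four hypothetical models over eight examples.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
masks = [
    [1, 1, 1, 1, 1, 1, 1, 1],  # fully transparent: gap must be 0
    [1, 1, 1, 0, 1, 0, 0, 0],  # transparency 0.5, coverage 0.75 vs 0.25
    [1, 1, 0, 0, 1, 1, 0, 0],  # transparency 0.5, coverage 0.50 vs 0.50
    [0, 0, 0, 0, 0, 0, 0, 0],  # fully black-box: gap must be 0
]
summary = icd_by_transparency_bin(masks, group, [0.0, 0.25, 0.75, 1.0])
```

In the toy data, only the middle bin can show a nonzero gap, and the two models in it have the same overall transparency but allocate it differently across groups.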

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper defines ICD to measure unequal routing to interpretable components in hybrid models and shows mitigation works on benchmarks, but the disparities may be dataset artifacts.

The main takeaway is that this paper treats the routing decision in hybrid interpretable models as its own fairness object. The authors define Interpretability Coverage Disparity (ICD) as a demographic-parity-style measure of which groups get the transparent component, evaluate it across four methods and three standard datasets, and report that ICD appears in the intermediate transparency range, while simple coverage constraints reduce it with only marginal effects on accuracy and sparsity.

What is new is the explicit focus on allocation of interpretability itself rather than just predictive performance. The experiments document the pattern and demonstrate a workable fix for exact hybrid learners.

The soft spot is the data. These benchmarks have sensitive attributes but supply no ground truth on when an example should receive an interpretable decision. The measured disparities could therefore be side effects of optimizing accuracy and sparsity on those particular distributions instead of a general procedural fairness issue. If real deployments route based on domain factors uncorrelated with the benchmark labels, the ICD numbers and the mitigation results would not carry over. That limitation is real and worth checking in review.

This is for researchers working on hybrid models or fairness auditing. It deserves peer review because the angle is distinct and the proposed constraints are straightforward to implement.

Referee Report

2 major / 2 minor

Summary. The paper introduces Interpretability Coverage Disparity (ICD), a demographic-parity-style metric on the routing decisions of hybrid interpretable models that combine a transparent component with a black-box model. Across four hybrid learning methods, three standard fairness benchmarks, and multiple sensitive attributes, the authors report substantial ICD in intermediate transparency regimes and demonstrate that simple coverage-disparity constraints can reduce ICD in exact hybrid methods with only marginal effects on accuracy and sparsity; in some cases the constraints also improve standard fairness metrics.

Significance. If the empirical findings hold, the work identifies a distinct procedural fairness issue in hybrid models and supplies both a formalization and a practical mitigation approach. The multi-method, multi-dataset experimental design and the observation that ICD mitigation can co-occur with gains in predictive fairness are constructive contributions that could prompt routine auditing of interpretability allocation alongside accuracy-based fairness checks.

major comments (2)

[§4] Experiments: The routing decisions are produced by optimizing accuracy and sparsity on standard benchmarks (Adult, COMPAS, etc.) that supply sensitive attributes but contain no ground-truth labels for when an instance should receive an interpretable versus black-box decision. Consequently the reported group disparities in ICD may be optimization artifacts rather than evidence of a general procedural fairness phenomenon; real-world routing often depends on domain-specific factors uncorrelated with the benchmark labels.
[§5.2] Mitigation results: The coverage-disparity constraints are shown to reduce ICD, yet the manuscript does not report whether the constrained solutions remain within the same intermediate transparency regimes or whether the reduction in ICD is statistically significant after multiple-testing correction across the four methods and multiple attributes.

minor comments (2)

[§3] Notation for the routing function and the exact definition of ICD (Eq. 3) could be cross-referenced more explicitly in the experimental tables so readers can map reported numbers directly to the formal measure.
[Figure 3] The caption should state the number of random seeds used for the error bars; without this the visual comparison of ICD before and after mitigation is harder to interpret.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond to each major point below, indicating planned revisions where appropriate.

Referee: [§4] Experiments: The routing decisions are produced by optimizing accuracy and sparsity on standard benchmarks (Adult, COMPAS, etc.) that supply sensitive attributes but contain no ground-truth labels for when an instance should receive an interpretable versus black-box decision. Consequently the reported group disparities in ICD may be optimization artifacts rather than evidence of a general procedural fairness phenomenon; real-world routing often depends on domain-specific factors uncorrelated with the benchmark labels.

Authors: We agree that the benchmarks lack ground-truth routing labels, so the observed ICD reflects disparities arising under standard accuracy/sparsity optimization rather than 'correct' allocations. This is a limitation of the experimental design. However, the core claim is that hybrid models trained via common objectives on widely used fairness benchmarks produce unequal interpretability allocation; this procedural disparity is itself a fairness issue worth auditing, independent of ground truth. Real-world routing may incorporate additional factors, but the benchmark results still demonstrate that ICD can emerge without explicit fairness considerations in the routing. We will add a dedicated limitations paragraph in §4 and the discussion section acknowledging the lack of ground-truth labels and the need for future work on domain-specific routing.
Revision: partial

Referee: [§5.2] Mitigation results: The coverage-disparity constraints are shown to reduce ICD, yet the manuscript does not report whether the constrained solutions remain within the same intermediate transparency regimes or whether the reduction in ICD is statistically significant after multiple-testing correction across the four methods and multiple attributes.

Authors: The constrained solutions do remain in intermediate transparency regimes (coverage typically 30-70% interpretable), as the disparity constraints are applied on top of the original accuracy/sparsity objectives without altering target coverage levels; this is visible in the reported coverage values but was not explicitly stated. For statistical significance, we will revise §5.2 to include p-values computed with Bonferroni correction across the four methods and attributes, confirming that ICD reductions remain significant. We will also add an explicit statement that intermediate regimes are preserved under the constraints.
Revision: yes
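
The Bonferroni correction mentioned in the rebuttal is simple enough to sketch. With m tests, each raw p-value is multiplied by m (capped at 1) and compared against the chosen significance level. The p-values below are illustrative placeholders, not results from the paper, and the function name and `alpha` default are assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and per-test rejection decisions."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Placeholder p-values, e.g. one per (method, sensitive attribute) comparison.
p_values = [0.001, 0.004, 0.02, 0.30]
adjusted, reject = bonferroni(p_values)
```

Bonferroni is conservative when many comparisons are made, so a result that is significant before correction (the third placeholder value here) may no longer be significant afterwards.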

Circularity Check

0 steps flagged

No circularity: empirical evaluation of newly defined ICD measure

The paper defines Interpretability Coverage Disparity (ICD) as a demographic-parity-style measure on routing decisions in hybrid models, then reports experimental results on its prevalence and mitigation across four methods and three benchmarks. No equations or claims reduce a prediction to a fitted input by construction, no self-citations bear load on the central empirical findings, and the derivation chain consists of independent formalization followed by external evaluation on standard datasets. This is a standard non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Central claims rest on the definition of ICD as a demographic-parity measure applied to routing and on the existence of the reported experimental patterns; no free parameters, standard axioms, or invented physical entities are stated in the abstract.

invented entities (1)

Interpretability Coverage Disparity (ICD): no independent evidence
Purpose: Demographic-parity-style measure of unequal routing to the interpretable component.
Newly formalized in the paper to capture the procedural fairness issue described.
