Pith · machine review for the scientific record

arXiv: 2605.09852 · v1 · submitted 2026-05-11 · 💻 cs.AI · cs.CE · cs.CY · cs.LG

Recognition: 2 theorem links


Fairness of Explanations in Artificial Intelligence (AI): A Unifying Framework, Axioms, and Future Direction toward Responsible AI

Gideon Popoola, John Sheppard

Authors on Pith · no claims yet

Pith reviewed 2026-05-12 04:59 UTC · model grok-4.3

classification 💻 cs.AI · cs.CE · cs.CY · cs.LG
keywords explanation fairness · algorithmic fairness · explainable AI · procedural bias · conditional invariance · unifying framework · responsible AI · post-hoc explainers

The pith

Explanation fairness requires that AI explanations stay invariant to protected attributes when relevant features are fixed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a model can meet every standard fairness rule on its outputs yet still produce systematically different explanations depending on protected attributes such as race or gender. It formalizes explanation fairness through a single conditional invariance requirement: explanations must follow the same distribution whenever the task-relevant inputs are held fixed, regardless of the protected attribute's value. All prior explanation-fairness metrics are shown to arise as special cases or partial checks of this principle. The work supplies a seven-dimensional taxonomy of the problem, three mechanisms that generate explanation inequity, and a six-step audit workflow for practical use. A sympathetic reader cares because high-stakes decisions need both fair outcomes and fair reasoning processes; without the latter, explanations can mask or justify bias even when predictions look equitable.

Core claim

The central claim is that explanation fairness is captured by the conditional invariance condition P(E(X) ∈ · | X_rel = x_rel, A = a) = P(E(X) ∈ · | X_rel = x_rel, A = b) for every task-relevant value x_rel. This single axiom unifies the literature by showing that existing metrics are incomplete operationalizations of the same invariance requirement. The framework also isolates three generative sources of inequity (representation-driven, explanation-model mismatch, and actionability-driven) and prescribes a canonical six-step evaluation workflow to audit any post-hoc explainer.

What carries the argument

The conditional invariance condition, which demands that the distribution of an explanation remain unchanged when protected attributes vary but task-relevant features are fixed.

If this is right

  • Existing explanation fairness metrics emerge as partial checks of one underlying invariance principle rather than competing alternatives.
  • A model can satisfy every output fairness criterion while still exhibiting procedural bias in its explanations.
  • Three distinct mechanisms (representation-driven, explanation-model mismatch, actionability-driven) can each produce inequitable explanations.
  • A six-step workflow provides a repeatable method for auditing any post-hoc explainer in practice.
  • Fairness must be assessed separately for the reasoning process, not only for the final prediction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The invariance lens could be applied directly to other interpretability techniques, such as attention maps or prototype-based explanations, not only to post-hoc methods.
  • Regulatory standards for responsible AI might eventually mandate explicit checks that explanations satisfy the conditional invariance condition.
  • Developers could test the framework by generating synthetic datasets where relevant features are controlled and protected attributes are varied, then measuring explanation divergence.
  • If the invariance condition holds, it would imply that explanation fairness audits could be performed on black-box models without retraining or access to training data.
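The synthetic-data test sketched in these bullets can be made concrete. The sketch below is our illustration, not the paper's code: the data-generating process, the linear model, and the weight-times-input explainer are all assumptions chosen for simplicity. A nonzero attribution gap with the relevant feature held fixed is exactly a violation of the conditional invariance equality.

```python
# Hypothetical audit sketch: hold the task-relevant feature fixed, vary the
# protected attribute A, and measure the gap between explanation distributions.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

a = rng.integers(0, 2, size=n)                # protected attribute A ∈ {0, 1}
x_rel = rng.normal(size=n)                    # task-relevant feature
proxy = 0.9 * a + 0.1 * rng.normal(size=n)    # nuisance feature leaking A

# A linear "model" whose score uses the proxy: outputs can still look fair
# per group, but the explanations will load on the proxy.
w = np.array([1.0, 0.8])                      # weights for (x_rel, proxy)
X = np.column_stack([x_rel, proxy])

def explain(X, w):
    """Toy attribution: weight × input (a stand-in for LIME/SHAP values)."""
    return X * w

E = explain(X, w)

# Condition on a thin slice of x_rel to approximate X_rel = x_rel fixed.
slice_mask = np.abs(x_rel) < 0.05
gap = np.abs(E[slice_mask & (a == 1), 1].mean() -
             E[slice_mask & (a == 0), 1].mean())
print(f"mean proxy-attribution gap with x_rel fixed: {gap:.3f}")
```

With the proxy leaking A this strongly, the gap comes out far from zero, so the conditional invariance equality fails even though x_rel was held (approximately) constant.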

Load-bearing premise

Post-hoc explainers can be judged for fairness independently of the model's training process, and the invariance equality can be checked without further assumptions about how the explainer produces its output.

What would settle it

An observed case in which, for two inputs that share identical values on all task-relevant features but differ in a protected attribute, the generated explanations differ in a measurable way that violates the equality of their distributions.

Figures

Figures reproduced from arXiv: 2605.09852 by Gideon Popoola, John Sheppard.

Figure 1. Hypothetical illustration of procedural unfairness: two hypothetical loan applicants receive identical outcomes, but …
Figure 2. Conceptual pipeline of explanation fairness. Bias enters at multiple stages: historical bias in training data, proxy leakage …
read the original abstract

Machine learning algorithms are being used in high-stakes decisions, including those in criminal justice, healthcare, credit, and employment. The research community has responded with two largely independent research fields: \emph{algorithmic fairness}, which targets equitable outcomes, and \emph{explainable AI} (XAI), which targets interpretable reasoning. This survey identifies and maps a novel blind spot at their intersection, which is a model that can satisfy every standard fairness criterion in its outputs while being profoundly unfair in its \emph{reasoning process}. We refer to this as the procedural bias, and mitigating it requires treating the fairness of explanations as a distinct object of scientific study. To our knowledge, we provide the first unified theoretical and literature review of this emerging field and elucidate the drawbacks of post-hoc explainers in certifying explanation fairness. Our central contribution is a \emph{conditional invariance framework} formalizing explanation fairness as the requirement that explanations should be indifferent regardless of the protected attributes $ P(E(X) \in \cdot \mid X_\text{rel} = x_\text{rel},\, A = a) = P(E(X) \in \cdot \mid X_\text{rel} = x_\text{rel},\, A = b)$ for all task-relevant $x$, a single principle from which all existing explanation fairness metrics emerge as partial operationalizations. We introduce a seven-dimensional taxonomy, identify three generative mechanisms of explanation inequity (representation-driven, explanation-model mismatch, actionability-driven), and propose a canonical six-step evaluation workflow for operationalizing explanation fairness audits in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript surveys the intersection of algorithmic fairness and explainable AI, identifying 'procedural bias' as the gap where models satisfy output fairness criteria yet produce unfair reasoning via explanations. It proposes a conditional invariance framework defining explanation fairness via P(E(X) ∈ · | X_rel = x_rel, A = a) = P(E(X) ∈ · | X_rel = x_rel, A = b) for task-relevant x, from which all existing explanation fairness metrics are claimed to emerge as partial operationalizations. The work introduces a seven-dimensional taxonomy of explanation fairness, three generative mechanisms of inequity (representation-driven, explanation-model mismatch, actionability-driven), and a canonical six-step evaluation workflow for practical audits.

Significance. If the invariance framework can be operationalized without additional unstated assumptions, it would offer a unifying theoretical lens for explanation fairness that is independent of model training, filling a documented blind spot between fairness and XAI research. The survey component usefully maps an emerging literature, and the proposed workflow provides a concrete path for audits in high-stakes domains; however, the significance is tempered by the need to demonstrate how the central principle applies to deterministic post-hoc methods.

major comments (2)
  1. [Abstract] Abstract (central contribution paragraph): the conditional invariance P(E(X) ∈ · | X_rel = x_rel, A = a) = P(E(X) ∈ · | X_rel = x_rel, A = b) is only well-defined if a probability measure over the space of explanations E is specified. Standard post-hoc explainers (LIME, SHAP) are deterministic functions of the model and input; without an explicit generative model, sampling distribution, or perturbation mechanism for E, the conditional distributions reduce to Dirac deltas, making the equality either vacuous or trivially true. The three generative mechanisms and discussion of post-hoc drawbacks do not supply the required concrete construction.
  2. [Generative mechanisms section] Section introducing the three generative mechanisms: while representation-driven, explanation-model mismatch, and actionability-driven mechanisms are posited as sources of inequity, no derivation or mapping is provided showing how any of them induces a non-degenerate distribution over E that would render the invariance condition both non-trivial and computable from a trained model alone. This leaves the claim that the framework unifies existing metrics without supporting operational details.
minor comments (2)
  1. [Taxonomy section] The seven-dimensional taxonomy would benefit from at least one concrete example per dimension to illustrate distinctions from existing fairness taxonomies.
  2. [Evaluation workflow section] Clarify the relationship between the six-step workflow and the invariance principle; currently the workflow appears procedural but does not explicitly reference how each step enforces or checks the conditional equality.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address each major comment below with clarifications on the conditional invariance framework, its probability measure, and operational details for the generative mechanisms. These points will be incorporated into the revised version to strengthen the presentation.

read point-by-point responses
  1. Referee: [Abstract] Abstract (central contribution paragraph): the conditional invariance P(E(X) ∈ · | X_rel = x_rel, A = a) = P(E(X) ∈ · | X_rel = x_rel, A = b) is only well-defined if a probability measure over the space of explanations E is specified. Standard post-hoc explainers (LIME, SHAP) are deterministic functions of the model and input; without an explicit generative model, sampling distribution, or perturbation mechanism for E, the conditional distributions reduce to Dirac deltas, making the equality either vacuous or trivially true. The three generative mechanisms and discussion of post-hoc drawbacks do not supply the required concrete construction.

    Authors: We appreciate this observation on the need for a well-defined probability measure. In the conditional invariance framework, the measure over E is induced by the data-generating process: specifically, P(E(X) ∈ · | X_rel = x_rel, A = a) is the pushforward of the conditional distribution P(X | X_rel = x_rel, A = a) through the (possibly deterministic) explainer E. This accounts for variability in non-task-relevant features even when E is a fixed function of X, rendering the distributions non-degenerate and the invariance condition non-trivial. For methods like LIME, internal perturbation sampling adds further stochasticity. We will revise the abstract and framework section to explicitly state this construction, including its applicability to deterministic post-hoc explainers, and link it to how the generative mechanisms affect the induced distributions. revision: yes

  2. Referee: [Generative mechanisms section] Section introducing the three generative mechanisms: while representation-driven, explanation-model mismatch, and actionability-driven mechanisms are posited as sources of inequity, no derivation or mapping is provided showing how any of them induces a non-degenerate distribution over E that would render the invariance condition both non-trivial and computable from a trained model alone. This leaves the claim that the framework unifies existing metrics without supporting operational details.

    Authors: We agree that explicit mappings would enhance the operational clarity of the generative mechanisms. Representation-driven inequity alters P(X | X_rel, A) via feature correlations, yielding distinct pushforward measures on E(X). Explanation-model mismatch arises when surrogate approximations in post-hoc methods introduce A-dependent biases in the computed E. Actionability-driven inequity affects how explanations translate to decisions differing by A. We will add a new subsection with derivations, toy examples, and computational procedures showing how each mechanism produces non-degenerate conditional distributions on E and how invariance can be audited from a trained model and dataset alone. This will also provide concrete links to the unification of existing metrics. revision: yes
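The rebuttal's pushforward construction can be illustrated with a toy sketch (our reading, not the authors' code; the conditional sampler and the linear attribution are hypothetical): even a deterministic explainer E induces a non-degenerate distribution over explanations once the non-task-relevant features vary, and that induced distribution can depend on A.

```python
# Pushforward sketch: E is deterministic, yet P(E(X) ∈ · | X_rel, A) is a
# genuine distribution — the pushforward of P(X | X_rel = x_rel, A = a)
# through E, with variability supplied by the non-task-relevant feature.
import numpy as np

rng = np.random.default_rng(1)

def sample_conditional(x_rel, a, n):
    """Sample X = (x_rel, x_other) from P(X | X_rel = x_rel, A = a).
    The nuisance feature's distribution is allowed to depend on A."""
    x_other = rng.normal(loc=0.5 * a, scale=1.0, size=n)
    return np.column_stack([np.full(n, x_rel), x_other])

def explainer(X):
    """Deterministic attribution (gradient × input of a fixed linear score)."""
    w = np.array([1.0, 0.6])
    return X * w

# Pushforward samples of E(X) under A = 0 and A = 1, with X_rel fixed at 0.3.
E0 = explainer(sample_conditional(0.3, a=0, n=50_000))
E1 = explainer(sample_conditional(0.3, a=1, n=50_000))

# Non-degenerate: the nuisance-feature attribution has spread ...
print("attribution std under A=0:", E0[:, 1].std())
# ... and the invariance equality fails: the pushforward means differ by A.
print("mean shift A=1 vs A=0:", E1[:, 1].mean() - E0[:, 1].mean())
```

This is the construction under which the invariance condition becomes non-trivial for deterministic explainers: the Dirac-delta objection dissolves because the randomness lives in X, not in E.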

Circularity Check

0 steps flagged

No significant circularity in proposed unifying framework

full rationale

The paper proposes a conditional invariance condition as its central contribution, defining explanation fairness via P(E(X) ∈ · | X_rel = x_rel, A = a) = P(E(X) ∈ · | X_rel = x_rel, A = b) and stating that existing metrics emerge as partial operationalizations. This is presented as an organizational and axiomatic unification in a survey identifying a blind spot between fairness and XAI. The derivation chain contains no equations or steps that reduce the framework back to its inputs by construction, no fitted parameters renamed as predictions, and no load-bearing self-citations; it stands as a self-contained first-principles formalization, independent of model training details, with no evidence of the specific reductions required to flag circularity under the analysis criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on standard probabilistic conditioning and the separation of protected attributes from task-relevant features; it introduces no free parameters, and its only new entity is the coined term "procedural bias."

axioms (2)
  • domain assumption Explanations E(X) can be treated as a random variable whose distribution can be conditioned on protected attributes A and relevant features X_rel independently of the prediction model.
    Invoked when stating the conditional invariance equality in the abstract.
  • domain assumption Protected attributes A are distinct from task-relevant features X_rel.
    Required for the invariance condition to be meaningful.
invented entities (1)
  • procedural bias (no independent evidence)
    purpose: Names the phenomenon of unfair reasoning processes despite fair outcomes.
    New term coined to highlight the identified blind spot between fairness and XAI.

pith-pipeline@v0.9.0 · 5609 in / 1452 out tokens · 46811 ms · 2026-05-12T04:59:39.277959+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages · 2 internal anchors
