Tuning Derivatives for Causal Fairness in Machine Learning
Pith reviewed 2026-05-08 05:27 UTC · model grok-4.3
The pith
Path-specific partial derivatives formalize causal fairness for continuous protected attributes in machine learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a framework for fairness in structural causal models tailored to continuous protected attributes by formalizing statistical parity and predictive parity through path-specific partial derivatives. We establish conditions under which these criteria align with prior causal definitions and characterize the existence of a fair predictor that satisfies statistical parity along disallowed paths while achieving predictive parity along allowed paths. We propose a fair tuning algorithm that constructs such a predictor or permits a trade-off between the criteria.
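The SP/PP construction in the core claim can be illustrated with a deliberately simple sketch. In a toy linear SCM (all names, coefficients, and the penalty-based update below are illustrative assumptions, not the paper's algorithm), "fair tuning" amounts to driving the predictor's derivative along the disallowed direct path to zero while keeping its derivative along the allowed mediated path equal to that of the true outcome:

```python
# Hypothetical sketch, not the paper's method. Toy SCM:
#   M = alpha * A (allowed path), Y = beta * M + gamma * A (disallowed direct path).
# Linear predictor yhat = w_a * a + w_m * m; tune weights by gradient descent on
# a fairness-penalized objective.
alpha, beta, gamma = 0.8, 0.5, 0.3

w_a, w_m = gamma, beta   # start from the outcome model's own weights
lr, lam = 0.1, 1.0       # learning rate, weight of the SP penalty

for _ in range(200):
    # SP on the disallowed direct path: penalize (d yhat / d a)|_direct = w_a.
    grad_w_a = 2 * lam * w_a
    # PP on the allowed path A -> M -> yhat: keep w_m * alpha == beta * alpha,
    # i.e. pull w_m toward beta.
    grad_w_m = 2 * (w_m - beta)
    w_a -= lr * grad_w_a
    w_m -= lr * grad_w_m

print(round(w_a, 4), round(w_m, 4))  # → 0.0 0.5
```

In this linear case a predictor satisfying both criteria exists (set the direct weight to zero, keep the mediated weight), so the penalty converges to it; the paper's contribution is characterizing when such a predictor exists in general and what trade-off remains when it does not.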
What carries the argument
Path-specific partial derivatives, which quantify the direct effect of a continuous protected attribute on the prediction along particular causal paths, allowing the separation of allowed from disallowed influences.
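As a concrete reading of this machinery, consider a toy linear SCM (coefficients and variable names are illustrative assumptions, not taken from the paper): the protected attribute A affects a mediator M on an allowed path and the prediction directly on a disallowed path. The path-specific partial derivatives are then chain-rule products, and in the linear case the total effect is their sum:

```python
# Toy linear SCM, purely illustrative:
#   M    = alpha * A             (allowed path A -> M -> Yhat)
#   Yhat = beta * M + gamma * A  (disallowed direct path A -> Yhat)
alpha, beta, gamma = 0.8, 0.5, 0.3

def direct_partial():
    # Derivative of Yhat w.r.t. A holding M fixed: the disallowed direct effect.
    return gamma

def indirect_partial():
    # Chain rule along A -> M -> Yhat: (dYhat/dM) * (dM/dA).
    return beta * alpha

def total_derivative():
    # In a linear SCM the total effect decomposes as the sum over paths.
    return direct_partial() + indirect_partial()

print(round(total_derivative(), 3))  # → 0.7
```

Separating the 0.3 (disallowed) from the 0.4 (allowed) contribution is exactly the decomposition the fairness criteria act on.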
Load-bearing premise
The structural causal model must be known or accurately specified to allow computation of the path-specific partial derivatives for the continuous protected attribute.
What would settle it
A counterexample where, in a fully specified causal model with a continuous protected attribute, the path-specific partial derivatives fail to identify the correct allowed and disallowed paths, or the tuning algorithm produces a predictor that violates the intended fairness criteria.
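A closely related failure mode, worth keeping in mind when hunting for such a counterexample, is mechanism misspecification: if a nonlinear mechanism is fitted with the wrong functional form, the computed path-specific derivative is wrong at most points even when the graph is correct. A minimal illustration (all functions and numbers hypothetical):

```python
# Illustrative only: true mechanism M = g(A) is quadratic, but suppose it were
# fitted as linear with slope 1. The computed derivative along A -> M -> Yhat
# then disagrees with the true one almost everywhere.
beta = 0.5            # true effect of M on Yhat

def g(a):             # true nonlinear mechanism
    return a ** 2

def g_prime(a):
    return 2 * a

linear_slope = 1.0    # hypothetical misspecified linear fit of g

a = 0.1
true_path_deriv = beta * g_prime(a)     # chain rule under the true mechanism
model_path_deriv = beta * linear_slope  # chain rule under the misspecified fit
print(true_path_deriv, model_path_deriv)  # → 0.1 0.5
```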
Original abstract
Artificial-intelligence systems are becoming ubiquitous in society, yet their predictions typically inherit biases with respect to protected attributes such as race, gender, or age. Classical fairness notions, most notably Statistical Parity (SP), demand that predictions be independent of the protected attributes, but are overly restrictive when these attributes influence mediating variables that are considered business necessities. Recent causal formulations relax SP by distinguishing allowed from not-allowed causal paths and by complementing SP with Predictive Parity (PP), requiring the predictor to replicate the legitimate influence of business-necessities. Existing path-based definitions are mainly practical when applied to categorical attributes. This paper introduces a new framework for fairness in structural causal models that is tailored to continuous protected attributes. We formalize SP and PP through path-specific partial derivatives, establish conditions under which these criteria coincide with prior causal definitions, and characterize when a fair predictor, one that satisfies SP along not-allowed paths while achieving PP along allowed paths, exists. Building on this theory, we propose a fair tuning algorithm that either constructs such a predictor or, when not possible, allows for a trade-off between SP and PP. We present experiments on simulated and real data to evaluate our proposal, compare it with previously proposed methods, and show that it performs better when PP is considered.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a framework for causal fairness tailored to continuous protected attributes in structural causal models. It formalizes Statistical Parity (SP) and Predictive Parity (PP) via path-specific partial derivatives, derives conditions under which these coincide with prior causal definitions, characterizes existence of predictors satisfying SP on disallowed paths and PP on allowed paths, and proposes a fair tuning algorithm that constructs such predictors or trades off the criteria when impossible. Experiments on simulated and real data compare the method to prior approaches and claim superior performance when PP is considered.
Significance. If the theoretical derivations hold and the algorithm performs as claimed under known SCMs, the work meaningfully extends causal fairness beyond categorical attributes to continuous ones, a common practical case. The path-specific derivative formalization and existence characterization provide a clean mathematical handle on the SP/PP trade-off, and the tuning procedure offers a concrete implementation route. Credit is due for including both simulated (where SCM is known by construction) and real-data experiments with comparisons.
Major comments (2)
- [§5] §5 (real-data experiments): The guarantees for the tuning algorithm and the path-specific derivatives rest on an exactly known differentiable SCM (graph plus functional forms). The manuscript does not specify how the causal graph or the functional mechanisms are obtained or validated on the real datasets, nor does it report sensitivity to graph misspecification or estimation error. This is load-bearing because the central claim that the method “performs better when PP is considered” on real data cannot be assessed without evidence that the computed partial derivatives remain reliable under realistic estimation.
- [§3/§4] Theorem/Proposition on existence of fair predictors (likely §3 or §4): The characterization assumes the SCM is fully specified and differentiable along the relevant paths. It is unclear whether the stated conditions remain sufficient (or necessary) when the SCM must be learned from data, which is the regime of the real-data experiments. A concrete counter-example or robustness statement under small perturbations to the functional forms would strengthen the claim.
Minor comments (2)
- [§3] Notation for path-specific partial derivatives is introduced without an explicit running example that shows how the derivative is computed from the SCM equations; adding one early in §3 would improve readability.
- [Introduction] The abstract and introduction refer to “business necessities” without a precise mapping to allowed paths in the SCM; a short clarifying sentence would avoid ambiguity.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which correctly identify the distinction between the theoretical setting (known SCM) and the practical setting (estimated SCM) used in our real-data experiments. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
Point-by-point responses
-
Referee: [§5] The guarantees for the tuning algorithm and the path-specific derivatives rest on an exactly known differentiable SCM. The manuscript does not specify how the causal graph or the functional mechanisms are obtained or validated on the real datasets, nor does it report sensitivity to graph misspecification or estimation error. This is load-bearing because the central claim that the method “performs better when PP is considered” on real data cannot be assessed without evidence that the computed partial derivatives remain reliable under realistic estimation.
Authors: We agree that the theoretical guarantees assume a known SCM, while real-data experiments use an estimated SCM. The current manuscript describes the estimation procedure (domain knowledge for the graph combined with regression fits for the mechanisms) in Section 5 and the appendix, but does not include sensitivity checks. In revision we will: (i) expand the description of how the graph and mechanisms were obtained and validated for each real dataset, (ii) add a sensitivity analysis that perturbs the estimated functional forms (e.g., by adding controlled noise to regression coefficients) and reports the resulting variation in fairness metrics and predictor performance, and (iii) qualify the comparative claims as empirical results obtained under the estimated model. revision: yes
-
Referee: [§3/§4] The characterization assumes the SCM is fully specified and differentiable along the relevant paths. It is unclear whether the stated conditions remain sufficient (or necessary) when the SCM must be learned from data, which is the regime of the real-data experiments. A concrete counter-example or robustness statement under small perturbations to the functional forms would strengthen the claim.
Authors: The existence characterization (Theorem/Proposition in §3–4) is derived under the assumption of a fully specified, differentiable SCM, as stated in the theoretical sections. When the SCM is estimated from data the conditions become approximate. In the revision we will add an explicit remark clarifying this scope and include a short robustness subsection that examines the effect of small perturbations to the functional forms (e.g., additive Gaussian noise on structural coefficients). This will contain a concrete illustration of how the existence conditions and the tuned predictor change under such perturbations, directly addressing the request for a robustness statement. revision: yes
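The perturbation check the authors promise can be sketched in a few lines: re-sample the estimated structural coefficients under assumed Gaussian estimation noise and observe how the implied allowed-path derivative moves (coefficients and noise scale here are illustrative assumptions, not values from the paper):

```python
import random

random.seed(0)
alpha_hat, beta_hat = 0.8, 0.5  # illustrative estimated structural coefficients
sigma = 0.05                    # assumed standard error of each estimate

# Re-sample both coefficients and record the implied derivative along the
# allowed path A -> M -> Yhat, i.e. beta * alpha under each draw.
draws = [(alpha_hat + random.gauss(0, sigma)) * (beta_hat + random.gauss(0, sigma))
         for _ in range(10_000)]
mean = sum(draws) / len(draws)
spread = (sum((d - mean) ** 2 for d in draws) / len(draws)) ** 0.5
print(f"indirect derivative ~ {mean:.3f} +/- {spread:.3f}")
```

If the fairness metrics and the tuned predictor are stable across such draws, the gap between the known-SCM theory and the estimated-SCM experiments narrows; if not, the comparative claims need the qualification the referee asks for.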
Circularity Check
No significant circularity: the derivation is self-contained and does not presuppose the external causal definitions it is compared against.
Full rationale
The paper defines path-specific partial derivatives as a new formalization for SP/PP on continuous attributes, then derives coincidence conditions and existence characterizations directly from the SCM structure and differentiability assumptions. These steps do not reduce to fitted parameters renamed as predictions, nor to self-citations whose content is unverified or load-bearing for the central claims. The tuning algorithm is presented as a constructive procedure built on the derived conditions rather than an ansatz smuggled in or a renaming of known empirical patterns. Experiments on simulated and real data serve validation only and do not enter the theoretical derivation chain. This matches the default expectation of a non-circular paper.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: A structural causal model exists with identifiable path-specific effects for continuous protected attributes.