Tuning Derivatives for Causal Fairness in Machine Learning
Pith reviewed 2026-05-08 05:27 UTC · model grok-4.3
The pith
Path-specific partial derivatives formalize causal fairness for continuous protected attributes in machine learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a framework for fairness in structural causal models tailored to continuous protected attributes by formalizing statistical parity and predictive parity through path-specific partial derivatives. We establish conditions under which these criteria align with prior causal definitions and characterize the existence of a fair predictor that satisfies statistical parity along disallowed paths while achieving predictive parity along allowed paths. We propose a fair tuning algorithm that constructs such a predictor or permits a trade-off between the criteria.
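The SP/PP construction in the core claim can be illustrated with a deliberately simple sketch. In a toy linear SCM (all names, coefficients, and the penalty-based update below are illustrative assumptions, not the paper's algorithm), "fair tuning" amounts to driving the predictor's derivative along the disallowed direct path to zero while keeping its derivative along the allowed mediated path equal to that of the true outcome:

```python
# Hypothetical sketch, not the paper's method. Toy SCM:
#   M = alpha * A (allowed path), Y = beta * M + gamma * A (disallowed direct path).
# Linear predictor yhat = w_a * a + w_m * m; tune weights by gradient descent on
# a fairness-penalized objective.
alpha, beta, gamma = 0.8, 0.5, 0.3

w_a, w_m = gamma, beta   # start from the outcome model's own weights
lr, lam = 0.1, 1.0       # learning rate, weight of the SP penalty

for _ in range(200):
    # SP on the disallowed direct path: penalize (d yhat / d a)|_direct = w_a.
    grad_w_a = 2 * lam * w_a
    # PP on the allowed path A -> M -> yhat: keep w_m * alpha == beta * alpha,
    # i.e. pull w_m toward beta.
    grad_w_m = 2 * (w_m - beta)
    w_a -= lr * grad_w_a
    w_m -= lr * grad_w_m

print(round(w_a, 4), round(w_m, 4))  # → 0.0 0.5
```

In this linear case a predictor satisfying both criteria exists (set the direct weight to zero, keep the mediated weight), so the penalty converges to it; the paper's contribution is characterizing when such a predictor exists in general and what trade-off remains when it does not.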
What carries the argument
Path-specific partial derivatives, which quantify the direct effect of a continuous protected attribute on the prediction along particular causal paths, allowing the separation of allowed from disallowed influences.
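As a concrete reading of this machinery, consider a toy linear SCM (coefficients and variable names are illustrative assumptions, not taken from the paper): the protected attribute A affects a mediator M on an allowed path and the prediction directly on a disallowed path. The path-specific partial derivatives are then chain-rule products, and in the linear case the total effect is their sum:

```python
# Toy linear SCM, purely illustrative:
#   M    = alpha * A             (allowed path A -> M -> Yhat)
#   Yhat = beta * M + gamma * A  (disallowed direct path A -> Yhat)
alpha, beta, gamma = 0.8, 0.5, 0.3

def direct_partial():
    # Derivative of Yhat w.r.t. A holding M fixed: the disallowed direct effect.
    return gamma

def indirect_partial():
    # Chain rule along A -> M -> Yhat: (dYhat/dM) * (dM/dA).
    return beta * alpha

def total_derivative():
    # In a linear SCM the total effect decomposes as the sum over paths.
    return direct_partial() + indirect_partial()

print(round(total_derivative(), 3))  # → 0.7
```

Separating the 0.3 (disallowed) from the 0.4 (allowed) contribution is exactly the decomposition the fairness criteria act on.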
Load-bearing premise
The structural causal model must be known or accurately specified to allow computation of the path-specific partial derivatives for the continuous protected attribute.
What would settle it
A counterexample where, in a fully specified causal model with a continuous protected attribute, the path-specific partial derivatives fail to identify the correct allowed and disallowed paths, or the tuning algorithm produces a predictor that violates the intended fairness criteria.
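A closely related failure mode, worth keeping in mind when hunting for such a counterexample, is mechanism misspecification: if a nonlinear mechanism is fitted with the wrong functional form, the computed path-specific derivative is wrong at most points even when the graph is correct. A minimal illustration (all functions and numbers hypothetical):

```python
# Illustrative only: true mechanism M = g(A) is quadratic, but suppose it were
# fitted as linear with slope 1. The computed derivative along A -> M -> Yhat
# then disagrees with the true one almost everywhere.
beta = 0.5            # true effect of M on Yhat

def g(a):             # true nonlinear mechanism
    return a ** 2

def g_prime(a):
    return 2 * a

linear_slope = 1.0    # hypothetical misspecified linear fit of g

a = 0.1
true_path_deriv = beta * g_prime(a)     # chain rule under the true mechanism
model_path_deriv = beta * linear_slope  # chain rule under the misspecified fit
print(true_path_deriv, model_path_deriv)  # → 0.1 0.5
```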
Original abstract
Artificial-intelligence systems are becoming ubiquitous in society, yet their predictions typically inherit biases with respect to protected attributes such as race, gender, or age. Classical fairness notions, most notably Statistical Parity (SP), demand that predictions be independent of the protected attributes, but are overly restrictive when these attributes influence mediating variables that are considered business necessities. Recent causal formulations relax SP by distinguishing allowed from not-allowed causal paths and by complementing SP with Predictive Parity (PP), requiring the predictor to replicate the legitimate influence of business-necessities. Existing path-based definitions are mainly practical when applied to categorical attributes. This paper introduces a new framework for fairness in structural causal models that is tailored to continuous protected attributes. We formalize SP and PP through path-specific partial derivatives, establish conditions under which these criteria coincide with prior causal definitions, and characterize when a fair predictor, one that satisfies SP along not-allowed paths while achieving PP along allowed paths, exists. Building on this theory, we propose a fair tuning algorithm that either constructs such a predictor or, when not possible, allows for a trade-off between SP and PP. We present experiments on simulated and real data to evaluate our proposal, compare it with previously proposed methods, and show that it performs better when PP is considered.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a framework for causal fairness tailored to continuous protected attributes in structural causal models. It formalizes Statistical Parity (SP) and Predictive Parity (PP) via path-specific partial derivatives, derives conditions under which these coincide with prior causal definitions, characterizes existence of predictors satisfying SP on disallowed paths and PP on allowed paths, and proposes a fair tuning algorithm that constructs such predictors or trades off the criteria when impossible. Experiments on simulated and real data compare the method to prior approaches and claim superior performance when PP is considered.
Significance. If the theoretical derivations hold and the algorithm performs as claimed under known SCMs, the work meaningfully extends causal fairness beyond categorical attributes to continuous ones, a common practical case. The path-specific derivative formalization and existence characterization provide a clean mathematical handle on the SP/PP trade-off, and the tuning procedure offers a concrete implementation route. Credit is due for including both simulated (where SCM is known by construction) and real-data experiments with comparisons.
Major comments (2)
- [§5] §5 (real-data experiments): The guarantees for the tuning algorithm and the path-specific derivatives rest on an exactly known differentiable SCM (graph plus functional forms). The manuscript does not specify how the causal graph or the functional mechanisms are obtained or validated on the real datasets, nor does it report sensitivity to graph misspecification or estimation error. This is load-bearing because the central claim that the method “performs better when PP is considered” on real data cannot be assessed without evidence that the computed partial derivatives remain reliable under realistic estimation.
- [§3/§4] Theorem/Proposition on existence of fair predictors (likely §3 or §4): The characterization assumes the SCM is fully specified and differentiable along the relevant paths. It is unclear whether the stated conditions remain sufficient (or necessary) when the SCM must be learned from data, which is the regime of the real-data experiments. A concrete counter-example or robustness statement under small perturbations to the functional forms would strengthen the claim.
Minor comments (2)
- [§3] Notation for path-specific partial derivatives is introduced without an explicit running example that shows how the derivative is computed from the SCM equations; adding one early in §3 would improve readability.
- [Introduction] The abstract and introduction refer to “business necessities” without a precise mapping to allowed paths in the SCM; a short clarifying sentence would avoid ambiguity.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which correctly identify the distinction between the theoretical setting (known SCM) and the practical setting (estimated SCM) used in our real-data experiments. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
Point-by-point responses
-
Referee: [§5] The guarantees for the tuning algorithm and the path-specific derivatives rest on an exactly known differentiable SCM. The manuscript does not specify how the causal graph or the functional mechanisms are obtained or validated on the real datasets, nor does it report sensitivity to graph misspecification or estimation error. This is load-bearing because the central claim that the method “performs better when PP is considered” on real data cannot be assessed without evidence that the computed partial derivatives remain reliable under realistic estimation.
Authors: We agree that the theoretical guarantees assume a known SCM, while real-data experiments use an estimated SCM. The current manuscript describes the estimation procedure (domain knowledge for the graph combined with regression fits for the mechanisms) in Section 5 and the appendix, but does not include sensitivity checks. In revision we will: (i) expand the description of how the graph and mechanisms were obtained and validated for each real dataset, (ii) add a sensitivity analysis that perturbs the estimated functional forms (e.g., by adding controlled noise to regression coefficients) and reports the resulting variation in fairness metrics and predictor performance, and (iii) qualify the comparative claims as empirical results obtained under the estimated model. revision: yes
-
Referee: [§3/§4] The characterization assumes the SCM is fully specified and differentiable along the relevant paths. It is unclear whether the stated conditions remain sufficient (or necessary) when the SCM must be learned from data, which is the regime of the real-data experiments. A concrete counter-example or robustness statement under small perturbations to the functional forms would strengthen the claim.
Authors: The existence characterization (Theorem/Proposition in §3–4) is derived under the assumption of a fully specified, differentiable SCM, as stated in the theoretical sections. When the SCM is estimated from data the conditions become approximate. In the revision we will add an explicit remark clarifying this scope and include a short robustness subsection that examines the effect of small perturbations to the functional forms (e.g., additive Gaussian noise on structural coefficients). This will contain a concrete illustration of how the existence conditions and the tuned predictor change under such perturbations, directly addressing the request for a robustness statement. revision: yes
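The perturbation check the authors promise can be sketched in a few lines: re-sample the estimated structural coefficients under assumed Gaussian estimation noise and observe how the implied allowed-path derivative moves (coefficients and noise scale here are illustrative assumptions, not values from the paper):

```python
import random

random.seed(0)
alpha_hat, beta_hat = 0.8, 0.5  # illustrative estimated structural coefficients
sigma = 0.05                    # assumed standard error of each estimate

# Re-sample both coefficients and record the implied derivative along the
# allowed path A -> M -> Yhat, i.e. beta * alpha under each draw.
draws = [(alpha_hat + random.gauss(0, sigma)) * (beta_hat + random.gauss(0, sigma))
         for _ in range(10_000)]
mean = sum(draws) / len(draws)
spread = (sum((d - mean) ** 2 for d in draws) / len(draws)) ** 0.5
print(f"indirect derivative ~ {mean:.3f} +/- {spread:.3f}")
```

If the fairness metrics and the tuned predictor are stable across such draws, the gap between the known-SCM theory and the estimated-SCM experiments narrows; if not, the comparative claims need the qualification the referee asks for.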
Circularity Check
No significant circularity: the derivation is self-contained and does not presuppose the external causal definitions it is compared against.
Full rationale
The paper defines path-specific partial derivatives as a new formalization for SP/PP on continuous attributes, then derives coincidence conditions and existence characterizations directly from the SCM structure and differentiability assumptions. These steps do not reduce to fitted parameters renamed as predictions, nor to self-citations whose content is unverified or load-bearing for the central claims. The tuning algorithm is presented as a constructive procedure built on the derived conditions rather than an ansatz smuggled in or a renaming of known empirical patterns. Experiments on simulated and real data serve validation only and do not enter the theoretical derivation chain. This matches the default expectation of a non-circular paper.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: A structural causal model exists with identifiable path-specific effects for continuous protected attributes.