Counterfactual Reasoning for Fair Clinical Risk Prediction
Pith reviewed 2026-05-24 21:25 UTC · model grok-4.3
The pith
The proposed criterion enforces individual fairness in clinical risk models by requiring identical predictions for a patient and their counterfactual version whenever the factual and counterfactual outcomes match.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop an augmented counterfactual fairness criterion to extend the group fairness criterion of equalized odds to an individual level by requiring that the same prediction be made for a patient, and a counterfactual patient resulting from changing a sensitive attribute, if the factual and counterfactual outcomes do not differ. We use a variational autoencoder to perform counterfactual inference in the context of an assumed causal graph.
What carries the argument
The augmented counterfactual fairness criterion, which demands equal model output for factual and counterfactual patients whenever their outcomes are identical, implemented through variational-autoencoder counterfactual generation under a fixed causal graph.
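One way to write the criterion down, offered as a hedged formalization rather than the paper's own notation: let A be the sensitive attribute, X the patient's features, Y the outcome, and f the risk model, with the subscript A ← a' denoting the counterfactual obtained by setting A to a'. All of these symbols are assumptions introduced here for illustration, and whether f also takes A as an input is left open.

```latex
% Notation assumed for illustration; not taken from the paper.
Y_{A \leftarrow a'} = Y
\;\Longrightarrow\;
f\!\left(X_{A \leftarrow a'}\right) = f\!\left(X\right)
\quad \text{for every alternative value } a' \text{ of } A .
```

Read this way, the constraint is conditional: pairs whose counterfactual outcome differs from the factual one are not required to receive the same prediction.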
If this is right
- Models trained this way can satisfy an individual-level fairness constraint while still producing usable risk scores for prolonged inpatient stay and mortality.
- The method supplies a tunable trade-off between the strength of the fairness constraint and retained predictive accuracy, both assessed within the learned generative model.
- The same construction can be applied to any clinical prediction task whose data admit an assumed causal graph and a variational autoencoder fit.
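The trade-off between fairness and accuracy mentioned in the list above can be illustrated with a penalized objective. This summary does not state the paper's actual training loss, so the sketch below is only one plausible construction: binary cross-entropy plus a weight `lam` times the mean prediction gap on pairs whose factual and counterfactual outcomes agree. The function name, its arguments, and `lam` are all hypothetical, and the counterfactual predictions and outcomes are assumed to come from some generative model fitted elsewhere.

```python
import math

def penalized_loss(y, p_factual, p_counterfactual, y_counterfactual, lam):
    """Cross-entropy plus lam times the mean gap on outcome-matched pairs.

    Hypothetical sketch: not the paper's stated objective.
    """
    eps = 1e-12  # keeps log() away from 0 and 1

    # Standard binary cross-entropy on the factual predictions.
    bce = 0.0
    for yi, pi in zip(y, p_factual):
        pi = min(max(pi, eps), 1.0 - eps)
        bce -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    bce /= len(y)

    # The constraint only binds where factual and counterfactual outcomes agree.
    gaps = [
        abs(pi - pci)
        for yi, yci, pi, pci in zip(y, y_counterfactual, p_factual, p_counterfactual)
        if yi == yci
    ]
    penalty = sum(gaps) / len(gaps) if gaps else 0.0

    return bce + lam * penalty
```

Setting `lam` to zero recovers the unconstrained loss; larger values push predictions on matched pairs together, presumably at some cost in accuracy.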
Where Pith is reading between the lines
- If the assumed causal graph is misspecified, the fairness guarantee collapses even if the numerical criterion is met on generated data.
- The approach could be tested by checking whether real interventions that alter a sensitive attribute (while holding other factors fixed) produce the same model output as predicted by the counterfactuals.
- The criterion may be combined with other causal fairness notions that operate at the group level rather than replacing them.
Load-bearing premise
The variational autoencoder and chosen causal graph correctly recover the data-generating process so that generated counterfactuals are valid.
What would settle it
Train the model under the criterion and then generate a set of factual-counterfactual pairs whose true outcomes are known to be identical; if the model outputs differ on any pair, the criterion is not satisfied.
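The check described above can be sketched as a small function. This is a minimal illustration under stated assumptions: the predictions and outcomes for each factual-counterfactual pair are supplied as plain lists, the function names are hypothetical, and the tolerance `tol` is an added assumption because exact equality of floating-point scores is rarely practical.

```python
def max_violation(p_factual, p_counterfactual, y_factual, y_counterfactual):
    """Largest prediction gap among pairs whose outcomes are identical."""
    gaps = [
        abs(pf - pc)
        for pf, pc, yf, yc in zip(p_factual, p_counterfactual, y_factual, y_counterfactual)
        if yf == yc  # only outcome-matched pairs are constrained
    ]
    return max(gaps) if gaps else 0.0

def criterion_satisfied(p_factual, p_counterfactual, y_factual, y_counterfactual, tol=1e-6):
    """True if no outcome-matched pair differs by more than tol (tol is an assumption)."""
    return max_violation(p_factual, p_counterfactual, y_factual, y_counterfactual) <= tol
```

A pair whose outcomes differ is ignored by construction, so a large gap on such a pair does not count as a violation.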
Original abstract
The use of machine learning systems to support decision making in healthcare raises questions as to what extent these systems may introduce or exacerbate disparities in care for historically underrepresented and mistreated groups, due to biases implicitly embedded in observational data in electronic health records. To address this problem in the context of clinical risk prediction models, we develop an augmented counterfactual fairness criteria to extend the group fairness criteria of equalized odds to an individual level. We do so by requiring that the same prediction be made for a patient, and a counterfactual patient resulting from changing a sensitive attribute, if the factual and counterfactual outcomes do not differ. We investigate the extent to which the augmented counterfactual fairness criteria may be applied to develop fair models for prolonged inpatient length of stay and mortality with observational electronic health records data. As the fairness criteria is ill-defined without knowledge of the data generating process, we use a variational autoencoder to perform counterfactual inference in the context of an assumed causal graph. While our technique provides a means to trade off maintenance of fairness with reduction in predictive performance in the context of a learned generative model, further work is needed to assess the generality of this approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an augmented counterfactual fairness criterion extending equalized odds to the individual level by requiring identical predictions for a factual patient and their counterfactual (obtained by intervening on a sensitive attribute) whenever the factual and counterfactual outcomes match. The criterion is applied to clinical risk prediction tasks (prolonged inpatient length of stay and mortality) on observational EHR data; because the criterion is ill-defined without the data-generating process, a variational autoencoder is used to perform counterfactual inference under an assumed causal graph, with the method positioned as enabling a controllable trade-off between fairness maintenance and predictive performance.
Significance. If the VAE recovers accurate counterfactual distributions under a correctly specified graph, the approach would supply a concrete mechanism for individual-level fairness in healthcare ML that is grounded in causal reasoning rather than purely observational group metrics, addressing a recognized limitation of standard fairness criteria in clinical settings.
major comments (2)
- [Abstract] The augmented individual-level fairness criterion requires that predictions remain identical for factual and counterfactual patients when outcomes match; this guarantee is only well-defined given accurate counterfactual inference, yet the manuscript supplies no validation, sensitivity analysis, or recovery metrics demonstrating that the VAE under the assumed causal graph recovers the true counterfactual distributions.
- [Abstract] The claimed ability to trade off fairness against predictive performance is described only at a high level; without reported quantitative results, error analysis, or ablation on the VAE or graph assumptions, it is impossible to evaluate whether the fairness guarantee is achieved at acceptable performance cost or whether graph misspecification breaks the criterion.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below.
- Referee: [Abstract] The augmented individual-level fairness criterion requires that predictions remain identical for factual and counterfactual patients when outcomes match; this guarantee is only well-defined given accurate counterfactual inference, yet the manuscript supplies no validation, sensitivity analysis, or recovery metrics demonstrating that the VAE under the assumed causal graph recovers the true counterfactual distributions.
  Authors: We agree that the fairness criterion depends on accurate counterfactual inference from the VAE under the assumed graph. In observational EHR data, ground-truth counterfactuals are unavailable, so direct recovery metrics cannot be computed. We will add sensitivity analyses with respect to the causal graph and VAE parameters in the revised manuscript.
  Revision: yes
- Referee: [Abstract] The claimed ability to trade off fairness against predictive performance is described only at a high level; without reported quantitative results, error analysis, or ablation on the VAE or graph assumptions, it is impossible to evaluate whether the fairness guarantee is achieved at acceptable performance cost or whether graph misspecification breaks the criterion.
  Authors: We acknowledge that the trade-off is described at a high level. The manuscript demonstrates the approach on the two tasks but lacks detailed ablations and error analysis. We will expand the experiments with quantitative results, error analysis, and ablations on the VAE and graph assumptions in the revision.
  Revision: yes
Circularity Check
No circularity: fairness criterion and VAE inference remain independent of fitted outputs
The paper explicitly defines the augmented counterfactual fairness criterion as an extension of equalized odds requiring identical predictions for factual and counterfactual patients when outcomes match. It states that the criterion is ill-defined without the data generating process and therefore adopts an assumed causal graph plus VAE for inference, without any equation showing that the resulting predictions or fairness metric reduce by construction to quantities defined solely by the fitted VAE parameters. No self-citation is load-bearing for the central claim, no uniqueness theorem is imported from the authors' prior work, and no ansatz is smuggled via citation. The derivation is therefore self-contained against external benchmarks of the assumed graph and generative model.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: an assumed causal graph for the data-generating process.