arxiv: 2604.07009 · v1 · submitted 2026-04-08 · 💻 cs.AI · cs.LG

CAFP: A Post-Processing Framework for Group Fairness via Counterfactual Model Averaging

Pith reviewed 2026-05-10 17:23 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords group fairness · post-processing · counterfactual averaging · demographic parity · equalized odds · machine learning · fairness intervention

The pith

Averaging a model's predictions on factual and counterfactual inputs eliminates direct dependence on the protected attribute.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes CAFP, a post-processing framework that improves group fairness in any existing machine learning model by averaging its predictions on an input and on a copy of that input in which the protected attribute has been flipped. The technique requires no changes to model training or architecture and is applied at inference time. A sympathetic reader would care if they need to deploy a pre-trained classifier in a fairness-sensitive domain but cannot retrain it or access protected attributes during training. The authors provide theoretical guarantees that this averaging removes direct dependence on the sensitive attribute and achieves certain fairness metrics under mild assumptions.

Core claim

CAFP operates by generating counterfactual versions of each input in which the sensitive attribute is flipped, and then averaging the model's predictions across factual and counterfactual instances. This eliminates direct dependence on the protected attribute, reduces mutual information between predictions and sensitive attributes, and provably bounds the distortion introduced relative to the original model. Under mild assumptions, CAFP achieves perfect demographic parity and reduces the equalized odds gap by at least half the average counterfactual bias.
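The averaging step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a binary protected attribute stored as 0/1 in a known column, and the names `cafp_predict`, `toy_model`, and `protected_col` are hypothetical.

```python
import numpy as np

def cafp_predict(predict_fn, X, protected_col):
    """Average predict_fn on X and on a copy of X with the binary
    protected column flipped (0 <-> 1). Assumes 0/1 encoding."""
    X = np.asarray(X, dtype=float)
    X_cf = X.copy()
    X_cf[:, protected_col] = 1.0 - X_cf[:, protected_col]
    return 0.5 * (predict_fn(X) + predict_fn(X_cf))

# Hypothetical base model: a logistic score that depends directly
# on column 0, which plays the role of the protected attribute.
def toy_model(X):
    z = 1.5 * X[:, 0] + 0.8 * X[:, 1] - 0.5
    return 1.0 / (1.0 + np.exp(-z))

# Two individuals who differ only in the protected attribute.
X = np.array([[0.0, 0.2],
              [1.0, 0.2]])

base_scores = toy_model(X)                            # differ across the two rows
fair_scores = cafp_predict(toy_model, X, protected_col=0)  # identical across the two rows
```

The base model scores the two rows differently, while the averaged predictor gives them the same score, which is the sense in which direct dependence on the protected attribute is removed.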

What carries the argument

Counterfactual model averaging: averaging the original model's output on the factual input with its output on the input after flipping the value of the protected attribute.

Load-bearing premise

Realistic counterfactual inputs can be created by simply flipping the protected attribute value and the original model can be queried on these at inference time.

What would settle it

Test the averaged model on a dataset where flipping the protected attribute produces inputs that are out of distribution or implausible, and check whether the demographic parity or equalized odds guarantees still hold.
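Such a check needs the two group metrics computed on held-out predictions. The sketch below shows one common way to measure them for hard 0/1 predictions and a binary protected attribute; definitions of the equalized odds gap vary (maximum versus sum over label values), so the choice of maximum here is an assumption, and the function names and toy arrays are illustrative only.

```python
import numpy as np

def demographic_parity_gap(y_pred, a):
    """Absolute difference in positive-prediction rate between groups."""
    y_pred, a = np.asarray(y_pred, dtype=float), np.asarray(a)
    return abs(y_pred[a == 1].mean() - y_pred[a == 0].mean())

def equalized_odds_gap(y_pred, y_true, a):
    """Largest between-group difference in positive-prediction rate,
    taken over the two true-label strata (y=0 gives FPR, y=1 gives TPR)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true, a = np.asarray(y_true), np.asarray(a)
    gaps = []
    for y in (0, 1):
        rate_1 = y_pred[(a == 1) & (y_true == y)].mean()
        rate_0 = y_pred[(a == 0) & (y_true == y)].mean()
        gaps.append(abs(rate_1 - rate_0))
    return max(gaps)

# Tiny hand-made example (not data from the paper).
y_pred = [1, 0, 1, 1]
y_true = [1, 0, 1, 0]
a      = [0, 0, 1, 1]

dp_gap = demographic_parity_gap(y_pred, a)
eo_gap = equalized_odds_gap(y_pred, y_true, a)
```

Running both functions on the base model and on the averaged model, over a dataset where the flipped inputs are implausible, would show whether the claimed guarantees survive.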

Original abstract

Ensuring fairness in machine learning predictions is a critical challenge, especially when models are deployed in sensitive domains such as credit scoring, healthcare, and criminal justice. While many fairness interventions rely on data preprocessing or algorithmic constraints during training, these approaches often require full control over the model architecture and access to protected attribute information, which may not be feasible in real-world systems. In this paper, we propose Counterfactual Averaging for Fair Predictions (CAFP), a model-agnostic post-processing method that mitigates unfair influence from protected attributes without retraining or modifying the original classifier. CAFP operates by generating counterfactual versions of each input in which the sensitive attribute is flipped, and then averaging the model's predictions across factual and counterfactual instances. We provide a theoretical analysis of CAFP, showing that it eliminates direct dependence on the protected attribute, reduces mutual information between predictions and sensitive attributes, and provably bounds the distortion introduced relative to the original model. Under mild assumptions, we further show that CAFP achieves perfect demographic parity and reduces the equalized odds gap by at least half the average counterfactual bias.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes CAFP, a model-agnostic post-processing framework for group fairness. For each test input, it generates a counterfactual by flipping the protected attribute A and averages the original model's predictions on the factual and counterfactual versions. The paper claims this eliminates direct dependence on A, reduces mutual information between predictions and A, bounds distortion relative to the original model, and—under mild assumptions—achieves perfect demographic parity while reducing the equalized odds gap by at least half the average counterfactual bias.

Significance. If the theoretical claims are rigorously established and the assumptions hold in practice, CAFP would supply a simple, training-free post-processing technique applicable to any black-box classifier. This could be useful in deployment settings where retraining is infeasible. The post-processing design and the explicit bounds on distortion and information leakage are potentially valuable, but their impact is limited by the requirement for protected-attribute access at inference.

major comments (3)
  1. [Abstract / Method] Abstract and Method section: The perfect demographic parity guarantee is obtained only by averaging f(x, A) and f(x, 1-A) at inference time. This construction mathematically forces identical output distributions across groups solely when A is observed and the flipped input is a valid query to the original model. No alternative procedure is supplied for the common case in which A is withheld at deployment; the fairness claims therefore do not hold under that realistic constraint.
  2. [Theoretical Analysis] Theoretical Analysis section: The stated bounds on mutual-information reduction and distortion, as well as the 'at least half' reduction in the equalized-odds gap, appear to follow directly from the averaging definition itself rather than from an independent derivation. Explicit equations, proof sketches, and the precise 'mild assumptions' must be provided to demonstrate that the results are not tautological with the method's construction.
  3. [Method] Method section: The assumption that simply flipping the value of A produces realistic counterfactuals is load-bearing for all fairness guarantees. When features are correlated with A, the counterfactual (x, 1-A) may lie far outside the data distribution, rendering the averaged prediction meaningless and invalidating the claimed bounds.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'mild assumptions' is used without enumeration; these assumptions should be stated explicitly so readers can assess their realism.
  2. [Experiments] Throughout: No empirical results, tables, or figures are referenced in the provided abstract or summary; if experiments exist, they should be summarized to illustrate that the theoretical reductions materialize on real data.
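The concern in major comment 3 can be made concrete with a synthetic simulation. This is an illustration constructed for this page, not an experiment from the paper: the data-generating process, the coefficients, and the names `base_model` and `group_gap` are all assumptions. A non-protected feature `x1` is made to correlate with the protected attribute `a`, and the gap in mean score between groups is compared before and after averaging.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Binary protected attribute and a proxy feature whose mean shifts with it.
a = rng.integers(0, 2, size=n)
x1 = rng.normal(loc=1.0 * a, scale=1.0, size=n)

def base_model(a, x1):
    """Hypothetical logistic scorer using both the attribute and the proxy."""
    return 1.0 / (1.0 + np.exp(-(1.0 * a + 1.0 * x1 - 0.5)))

def group_gap(scores, a):
    """Absolute difference in mean score between the two groups."""
    return abs(scores[a == 1].mean() - scores[a == 0].mean())

base = base_model(a, x1)
# Counterfactual averaging: factual input plus input with a flipped.
cafp = 0.5 * (base_model(a, x1) + base_model(1 - a, x1))

gap_base = group_gap(base, a)
gap_cafp = group_gap(cafp, a)
```

In this setup the averaged score no longer depends on `a` for a fixed `x1`, so `gap_cafp` is smaller than `gap_base`, yet it stays well above zero because `x1` is distributed differently in the two groups. This suggests the "mild assumptions" must constrain how the remaining features relate to A, which is why the referee asks for them to be enumerated.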

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, indicating planned revisions to the manuscript where appropriate.

  1. Referee: [Abstract / Method] Abstract and Method section: The perfect demographic parity guarantee is obtained only by averaging f(x, A) and f(x, 1-A) at inference time. This construction mathematically forces identical output distributions across groups solely when A is observed and the flipped input is a valid query to the original model. No alternative procedure is supplied for the common case in which A is withheld at deployment; the fairness claims therefore do not hold under that realistic constraint.

    Authors: We agree that the exact demographic parity guarantee requires access to the protected attribute A at inference time to construct and query the counterfactual input. This is a core aspect of the post-processing design presented in the Method section. We will revise the Abstract to explicitly state the inference-time access requirement and add a dedicated paragraph in the Method section discussing deployment scenarios where A is unavailable. In those cases, CAFP cannot be applied directly, and we will note that alternative approaches (such as those relying on proxies or training-time interventions) would be needed instead. revision: yes

  2. Referee: [Theoretical Analysis] Theoretical Analysis section: The stated bounds on mutual-information reduction and distortion, as well as the 'at least half' reduction in the equalized-odds gap, appear to follow directly from the averaging definition itself rather than from an independent derivation. Explicit equations, proof sketches, and the precise 'mild assumptions' must be provided to demonstrate that the results are not tautological with the method's construction.

    Authors: The properties do derive from the averaging construction, but the Theoretical Analysis section presents them as formal theorems obtained via probabilistic arguments applied to the averaged predictor. We will expand this section substantially by inserting the explicit equations (e.g., the mutual-information bound I(Ŷ;A) ≤ ½ I(f(X,A);A), the distortion bound in terms of total variation, and the equalized-odds gap reduction), step-by-step proof sketches, and a precise list of the mild assumptions (including that the base model is defined on the augmented feature space and that averaging is performed exactly). These additions will clarify the derivations and show they are not merely restatements of the method. revision: yes

  3. Referee: [Method] Method section: The assumption that simply flipping the value of A produces realistic counterfactuals is load-bearing for all fairness guarantees. When features are correlated with A, the counterfactual (x, 1-A) may lie far outside the data distribution, rendering the averaged prediction meaningless and invalidating the claimed bounds.

    Authors: We acknowledge that the realism of the counterfactual inputs is a key assumption underlying the guarantees. The paper treats this as one of the 'mild assumptions' under which the bounds hold, without requiring the counterfactual to lie in the training distribution—only that the original model can evaluate it. When features are strongly correlated with A, the averaged output may indeed be less interpretable. We will revise the Method section to state the assumption more explicitly and add a new Limitations paragraph that discusses this issue, references related work on counterfactual generation, and notes that more advanced causal models could be used to improve counterfactual quality in practice. revision: partial
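The explicit equations promised in response 2 are not reproduced on this page. As a hedged sketch, assuming a binary attribute $a \in \{0,1\}$ and writing $\hat{f}$ for the averaged predictor (notation chosen here, not taken from the paper), two elementary identities follow from the averaging definition alone:

```latex
\begin{align}
\hat{f}(x,a) &= \tfrac{1}{2}\bigl[f(x,a) + f(x,1-a)\bigr]
              = \tfrac{1}{2}\bigl[f(x,0) + f(x,1)\bigr], \\
\hat{f}(x,a) - f(x,a) &= \tfrac{1}{2}\bigl[f(x,1-a) - f(x,a)\bigr], \\
\mathbb{E}\,\bigl|\hat{f}(X,A) - f(X,A)\bigr|
  &= \tfrac{1}{2}\,\mathbb{E}\,\bigl|f(X,1-A) - f(X,A)\bigr|.
\end{align}
```

The first line shows that $\hat{f}(x,a)$ takes the same value for both values of $a$, and the last shows that the expected distortion equals half the mean gap between factual and counterfactual predictions. Whether that gap matches the paper's "average counterfactual bias" is an assumption; both identities are immediate from the construction, which is the tautology concern the referee raises.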

Circularity Check

0 steps flagged

No significant circularity in CAFP derivation chain

The paper defines CAFP via counterfactual averaging of model outputs and separately provides a theoretical analysis deriving fairness properties (elimination of direct dependence, MI reduction, distortion bounds, perfect DP and EO gap reduction under mild assumptions). No equations or steps are exhibited where a claimed result reduces exactly to the input definition by construction, no self-citations load-bear the central claims, and no fitted parameters are relabeled as predictions. The analysis is presented as independent derivation from the averaging operator plus stated assumptions, making the chain self-contained rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard fairness assumptions about counterfactual generation and model access at inference plus unspecified 'mild assumptions' that enable the perfect demographic parity result.

axioms (2)
  • domain assumption: Mild assumptions that enable perfect demographic parity after averaging
    Invoked in the abstract to support the claim of achieving perfect demographic parity.
  • domain assumption: Counterfactual inputs can be generated by flipping the protected attribute value
    Implicit in the method description; required for the averaging step to be well-defined.

pith-pipeline@v0.9.0 · 5491 in / 1410 out tokens · 42420 ms · 2026-05-10T17:23:58.943530+00:00 · methodology

