
arXiv: 2312.08413 · v1 · submitted 2023-12-13 · cs.LG · cs.CR · cs.CY

Privacy Constrained Fairness Estimation for Decision Trees

Pith reviewed 2026-05-24 04:58 UTC · model grok-4.3

classification cs.LG · cs.CR · cs.CY
keywords differential privacy · fairness · decision trees · statistical parity · privacy preserving · interpretable models · PAFER

The pith

A trusted third party adds Laplacian noise to sensitive attributes so statistical parity of decision trees can be estimated with low error while preserving privacy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the tension between measuring fairness in AI, protecting sensitive personal data, and keeping models interpretable. It proposes the PAFER method to estimate statistical parity for decision trees under differential privacy constraints. A third-party legal entity holds the sensitive attributes and adds noise before the fairness calculation occurs. Experiments indicate that, among the DP mechanisms compared, the Laplacian mechanism keeps estimation error low while protecting individuals' privacy with high certainty. The method also performs better on decision trees that humans generally find easier to interpret.

Core claim

PAFER estimates statistical parity in decision trees by routing sensitive attribute data through a trusted third-party legal entity that applies differential privacy noise, specifically the Laplacian mechanism, before any fairness computation. This produces low-error parity estimates while providing high-certainty privacy for individuals. Both experiments and theory show the estimates are more accurate when the underlying decision trees are the ones humans generally find easier to interpret.

What carries the argument

Privacy-Aware Fairness Estimation of Rules (PAFER), which applies the Laplacian differential privacy mechanism to sensitive attributes held by a third party before computing statistical parity on decision tree rules.
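The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a binary sensitive attribute, assumes the third party releases per-leaf group counts with Laplace noise of scale 1/epsilon (sensitivity 1), assumes group sizes are known, and measures statistical parity as a difference in favourable-outcome rates. The function names `laplace_counts`, `statistical_parity_difference`, and `estimate_sp` are hypothetical.

```python
import numpy as np


def laplace_counts(true_counts, epsilon, rng):
    """Add Laplace(0, 1/epsilon) noise to each count (assumes sensitivity 1)."""
    true_counts = np.asarray(true_counts, dtype=float)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=true_counts.shape)
    # Clipping at zero is post-processing, so it does not weaken the DP guarantee.
    return np.clip(true_counts + noise, 0.0, None)


def statistical_parity_difference(pos_counts, group_sizes):
    """Favourable-outcome rate of group 0 minus that of group 1."""
    rates = np.asarray(pos_counts, dtype=float) / np.asarray(group_sizes, dtype=float)
    return rates[0] - rates[1]


def estimate_sp(leaf_group_counts, leaf_is_positive, group_sizes, epsilon, seed=0):
    """Estimate statistical parity from noised per-leaf group counts.

    leaf_group_counts: shape (n_leaves, 2), individuals of each group per leaf.
    leaf_is_positive:  shape (n_leaves,), True where the leaf predicts the
                       favourable outcome.
    group_sizes:       total individuals per group (assumed known here).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(leaf_group_counts, dtype=float)
    positive = np.asarray(leaf_is_positive, dtype=bool)
    # Only the favourable leaves are queried, and each cell is noised separately.
    noisy = laplace_counts(counts[positive], epsilon, rng)
    noisy_positive_per_group = noisy.sum(axis=0)
    return statistical_parity_difference(noisy_positive_per_group, group_sizes)
```

In this sketch a smaller `epsilon` means stronger privacy and a noisier estimate; the model developer only ever sees the noised counts, never the raw sensitive attributes.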

Load-bearing premise

A trusted third-party legal entity exists that securely holds the sensitive attribute data and applies the differential privacy noise before fairness estimation occurs.

What would settle it

A benchmark dataset experiment in which the Laplacian mechanism produces statistical parity error rates substantially above the levels reported in the paper or allows re-identification of individuals despite the added noise.

Original abstract

The protection of sensitive data becomes more vital, as data increases in value and potency. Furthermore, the pressure increases from regulators and society on model developers to make their Artificial Intelligence (AI) models non-discriminatory. To boot, there is a need for interpretable, transparent AI models for high-stakes tasks. In general, measuring the fairness of any AI model requires the sensitive attributes of the individuals in the dataset, thus raising privacy concerns. In this work, the trade-offs between fairness, privacy and interpretability are further explored. We specifically examine the Statistical Parity (SP) of Decision Trees (DTs) with Differential Privacy (DP), that are each popular methods in their respective subfield. We propose a novel method, dubbed Privacy-Aware Fairness Estimation of Rules (PAFER), that can estimate SP in a DP-aware manner for DTs. DP, making use of a third-party legal entity that securely holds this sensitive data, guarantees privacy by adding noise to the sensitive data. We experimentally compare several DP mechanisms. We show that using the Laplacian mechanism, the method is able to estimate SP with low error while guaranteeing the privacy of the individuals in the dataset with high certainty. We further show experimentally and theoretically that the method performs better for DTs that humans generally find easier to interpret.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes PAFER, a method for estimating statistical parity (SP) of decision trees under differential privacy. It relies on a trusted third-party legal entity to hold sensitive attribute data and apply the Laplacian mechanism to add noise before fairness estimation. Experiments compare DP mechanisms and claim that the Laplacian approach yields low SP estimation error while providing high privacy certainty; the method is also claimed to perform better (experimentally and theoretically) on DTs that humans find easier to interpret.

Significance. If the error bounds and privacy guarantees can be realized without an idealized trusted entity and if the interpretability claim is rigorously supported, the work would address a relevant intersection of privacy, fairness, and model interpretability. The explicit use of an external DP mechanism rather than self-referential fitting is a methodological strength, but the overall contribution is limited by the load-bearing trust assumption.

major comments (2)
  1. [Abstract] The claim that PAFER 'guarantees the privacy of the individuals in the dataset with high certainty' is predicated on the existence of a 'third-party legal entity that securely holds this sensitive data' and applies the Laplacian mechanism. No trust model, threat analysis, or alternative (e.g., local DP or secure multi-party computation) is provided, so the central privacy claim does not apply in the absence of this entity.
  2. [Abstract] The claim that the method 'performs better for DTs that humans generally find easier to interpret' is said to have both experimental and theoretical support, yet no definition or metric for 'easier to interpret' is given, nor is the theoretical argument sketched. This weakens the cross-claim linking interpretability to estimation quality.
minor comments (1)
  1. The abstract states that 'several DP mechanisms' are compared experimentally but only names the Laplacian result; a table or section listing all mechanisms and their error/privacy metrics would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments on our manuscript. We address each major comment below, proposing revisions to strengthen the presentation of our assumptions and claims while preserving the core contribution of PAFER.

Point-by-point responses
  1. Referee: [Abstract] The claim that PAFER 'guarantees the privacy of the individuals in the dataset with high certainty' is predicated on the existence of a 'third-party legal entity that securely holds this sensitive data' and applies the Laplacian mechanism. No trust model, threat analysis, or alternative (e.g., local DP or secure multi-party computation) is provided, so the central privacy claim does not apply in the absence of this entity.

    Authors: The PAFER approach is formulated under an explicit central DP model that assumes a trusted third-party legal entity holds the sensitive attributes and applies the Laplacian mechanism. This assumption is stated in the problem setup and abstract. We agree that the manuscript would benefit from greater clarity on the trust model. In revision, we will add a subsection detailing the trust assumptions, a brief threat analysis consistent with the central model, and a note on why local DP or MPC alternatives fall outside the current scope (as they would alter the mechanism and error analysis). This clarification does not alter the technical results but addresses applicability.

    Revision: yes

  2. Referee: [Abstract] The claim that the method 'performs better for DTs that humans generally find easier to interpret' is said to have both experimental and theoretical support, yet no definition or metric for 'easier to interpret' is given, nor is the theoretical argument sketched. This weakens the cross-claim linking interpretability to estimation quality.

    Authors: The manuscript links better performance to simpler decision trees via experiments on trees of varying depth and rule count, with a theoretical argument that lower complexity reduces the relative impact of Laplacian noise on the SP estimate. We agree that the abstract claim requires an explicit definition and sketched argument. In revision, we will define 'easier to interpret' in terms of tree depth and number of leaves (standard proxies in the interpretability literature), include the theoretical sketch in the main text, and tie the experimental results directly to this metric.

    Revision: yes
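The complexity argument in the second response can be made concrete with a short calculation. This is a sketch under assumptions that are not taken from the paper: the third party noises each of the k favourable-leaf counts c_{a,j} for group a independently with Laplace noise of scale 1/ε, no clipping is applied, and the group size n_a is known exactly.

```latex
\tilde{c}_{a} = \sum_{j=1}^{k} \left( c_{a,j} + \eta_j \right),
\qquad \eta_j \sim \mathrm{Lap}(0,\, 1/\varepsilon) \ \text{i.i.d.}

\operatorname{Var}(\tilde{c}_{a})
  = \sum_{j=1}^{k} \operatorname{Var}(\eta_j)
  = \frac{2k}{\varepsilon^{2}}

\operatorname{sd}\!\left( \frac{\tilde{c}_{a}}{n_a} \right)
  = \frac{\sqrt{2k}}{\varepsilon \, n_a}
```

Under these assumptions the noise in a group's estimated favourable-outcome rate grows with the square root of the number of noised leaves, which points in the same direction as the claim that trees with fewer leaves yield more accurate estimates.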

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard DP mechanisms

Full rationale

The paper introduces PAFER as a method to estimate statistical parity for decision trees under differential privacy by adding Laplacian noise via a trusted third-party entity. This setup draws on established DP properties and experimental validation rather than any self-definitional loop, fitted parameter renamed as prediction, or self-citation chain that reduces the central claim to its own inputs. The privacy guarantee and error bounds follow directly from the external Laplacian mechanism without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the method appears to rest on standard differential privacy primitives and the existence of a trusted third party.

pith-pipeline@v0.9.0 · 5767 in / 1132 out tokens · 20993 ms · 2026-05-24T04:58:20.006982+00:00 · methodology

