
arxiv: 2112.11279 · v4 · submitted 2021-12-21 · 💻 cs.LG

Differential Parity: Relative Fairness Between Two Sets of Decisions

Pith reviewed 2026-05-24 12:03 UTC · model grok-4.3

classification 💻 cs.LG
keywords differential parity · relative fairness · group fairness · sensitive attribute · decision making · bias detection · machine learning

The pith

Differential parity defines relative fairness as the independence of decision differences from a sensitive attribute.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes differential parity to evaluate fairness between two sets of decisions by checking whether their difference depends on a protected attribute such as race or gender. This approach sidesteps the problem of conflicting absolute fairness definitions by focusing on relative comparisons instead. When one decision set serves as a reliable reference, the measure functions as a group fairness criterion comparable to separation or sufficiency. Even without a reference, it exposes systematic preferences or biases between the two decision processes. A machine learning model is introduced to estimate the metric when the two sets cover different individuals.

Core claim

Differential parity holds that the difference between two decision sets should be statistically independent of a sensitive attribute; when a reference set of ground-truth or trusted decisions exists, this independence supplies a new group fairness condition distinct from separation and sufficiency, while in the absence of any reference it directly quantifies relative bias between the two sets.

What carries the argument

Differential parity, the statistical independence between the difference of two decision vectors and a sensitive attribute.
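
As a rough illustration of the definition, the sketch below measures how far the mean of the decision difference varies across groups when both decision sets cover the same subjects. The function names (`mean_difference_by_group`, `parity_gap`) and the toy data are hypothetical, and the mean-gap statistic is an assumption made for illustration, not necessarily the estimator used in the paper. A zero gap is necessary for independence but does not prove it.

```python
from collections import defaultdict
from statistics import mean

def mean_difference_by_group(d1, d2, a):
    """Return {group: mean(d1 - d2)} for decisions made on the same subjects."""
    if not (len(d1) == len(d2) == len(a)):
        raise ValueError("d1, d2 and a must have the same length")
    diffs = defaultdict(list)
    for x, y, g in zip(d1, d2, a):
        diffs[g].append(x - y)
    return {g: mean(v) for g, v in diffs.items()}

def parity_gap(d1, d2, a):
    """Largest gap between group means of d1 - d2 (0 means no mean dependence on a)."""
    means = mean_difference_by_group(d1, d2, a)
    return max(means.values()) - min(means.values())

# Toy data (made up): binary decisions from two sources on the same six subjects.
d1 = [1, 1, 0, 1, 0, 0]
d2 = [1, 0, 0, 1, 1, 0]
a = ["x", "x", "x", "y", "y", "y"]
gap = parity_gap(d1, d2, a)  # group "x" mean is 1/3, group "y" mean is -1/3
```

Here the first source is more favourable than the second for group "x" and less favourable for group "y", so the gap is nonzero even though both sources approve the same number of people overall.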

If this is right

  • When a reference decision set exists, differential parity supplies an additional group fairness test that can be checked alongside existing criteria.
  • Without any reference, the measure still identifies which of two decision processes exhibits greater dependence on the sensitive attribute.
  • The same framework applies to any pair of decision sources, including human versus algorithmic outputs or two different models.
  • The bridging model extends the test to populations that never overlap, removing the requirement that both sets act on identical individuals.
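
One generic way to turn the first two points into a checkable test is a permutation test on the group gap of the decision difference. This is a sketch under stated assumptions: the statistic (gap between group means), the function names, and the default of 2000 permutations are all illustrative choices, not the paper's procedure.

```python
import random
from statistics import mean

def group_gap(diffs, groups):
    """Gap between the largest and smallest group mean of the differences."""
    labels = sorted(set(groups))
    means = [mean([d for d, g in zip(diffs, groups) if g == lab]) for lab in labels]
    return max(means) - min(means)

def permutation_p_value(d1, d2, a, n_perm=2000, seed=0):
    """Share of label shufflings whose gap is at least the observed gap."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(d1, d2)]
    observed = group_gap(diffs, a)
    shuffled = list(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between differences and groups
        if group_gap(diffs, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy data (made up): the two sources disagree only on subjects in group "y".
d1 = [1, 1, 0, 0, 1, 1, 0, 0]
d2 = [1, 1, 0, 0, 0, 0, 0, 0]
a = ["x", "x", "x", "x", "y", "y", "y", "y"]
p_value = permutation_p_value(d1, d2, a)
```

A small p-value suggests the difference between the two decision sets depends on the attribute; a large one only means this statistic found no evidence of dependence.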

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The measure could be used to audit whether an updated model reduces or increases bias relative to its predecessor on new data.
  • Repeated application over time would track whether fairness between successive decision systems is improving or drifting.
  • The independence test could be adapted to multiple sensitive attributes simultaneously by checking joint independence.

Load-bearing premise

A machine learning model can be trained to predict what decisions would have been made on the other set's subjects with enough accuracy to estimate the true differential parity.

What would settle it

Compute differential parity directly on an overlapping population where both decision sets are observed, then compare that value to the value obtained after replacing one set with model predictions; a large discrepancy would show the bridging step fails.
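
The check described above can be sketched as follows, assuming an overlapping population where both decision sets are observed and a list of model predictions standing in for the second set. The names (`parity_gap`, `bridging_discrepancy`), the mean-gap statistic, and the toy values are hypothetical; any bridging model could supply `d2_predicted`.

```python
from statistics import mean

def parity_gap(d1, d2, a):
    """Gap between the largest and smallest group mean of d1 - d2."""
    labels = sorted(set(a))
    means = [mean([x - y for x, y, g in zip(d1, d2, a) if g == lab]) for lab in labels]
    return max(means) - min(means)

def bridging_discrepancy(d1, d2_observed, d2_predicted, a):
    """Compare the gap from observed decisions with the gap from predictions."""
    direct = parity_gap(d1, d2_observed, a)
    bridged = parity_gap(d1, d2_predicted, a)
    return direct, bridged, abs(direct - bridged)

# Toy data (made up): the model mispredicts one subject in group "y".
d1 = [1, 1, 0, 1, 0, 0]
d2_observed = [1, 0, 0, 1, 1, 0]
d2_predicted = [1, 0, 0, 1, 0, 0]
a = ["x", "x", "x", "y", "y", "y"]
direct, bridged, discrepancy = bridging_discrepancy(d1, d2_observed, d2_predicted, a)
```

In this toy case a single group-specific prediction error halves the estimated gap, which is the kind of discrepancy that would indicate the bridging step is unreliable.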

Figures

Figures reproduced from arXiv: 2112.11279 by Pranam Prakash Shetty, Xiaoyin Xi, Zhe Yu.

Figure 1: Demonstration of the proposed human decision fairness detection framework and a proof-of-concept experiment.
Figure 2: Class distributions within each group for the five datasets.
Original abstract

With AI systems widely applied to assist humans in decision-making processes such as talent hiring, school admission, and loan approval, there is an increasing need to ensure that the decisions made are fair. One major challenge for analyzing fairness in decisions is that the standards are highly subjective and contextual: there is no consensus on what absolute fairness means for every scenario. Moreover, different fairness standards often conflict with each other. To bypass this issue, this work aims to test relative fairness in decisions. That is, instead of defining what "absolutely" fair decisions are, we propose to test the relative fairness of one decision set against another with differential parity: the difference between two sets of decisions should be independent of a certain sensitive attribute. This proposed notion of differential parity fairness has the following benefits: (1) it avoids the ambiguous and contradictory definition of what absolutely fair decisions are; (2) when a reference set (of ground truth or reliable fair decisions) is available, differential parity can serve as a new group fairness notion (similar to but different from separation and sufficiency); (3) even when no reference set is available, it reveals the relative preference or bias between different decision sets. One limitation of differential parity is that it requires the two sets of decisions under comparison to be made on the same data subjects. To overcome this limitation, we propose to utilize a machine learning model to bridge the gap between the two sets of decisions made on different data and estimate the differential parity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes differential parity as a relative fairness metric: the difference between two sets of decisions should be independent of a sensitive attribute. It claims this bypasses debates over absolute fairness, serves as a group fairness notion (distinct from separation/sufficiency) when a reference set exists, reveals relative bias otherwise, and can be estimated via an ML model when the two decision sets apply to different subjects.

Significance. If the ML bridging construction can be made rigorous, differential parity would supply a practical, reference-based alternative to standard group fairness definitions for comparing decision systems. The manuscript's explicit acknowledgment of the same-subject limitation and attempt to address it via imputation is a constructive step.

major comments (1)
  1. [Abstract] Abstract (limitation paragraph): the claim that differential parity remains usable when decision sets are made on different subjects rests on training an ML model to impute missing decisions. For the resulting statistic (difference independent of A) to retain its interpretation, the imputation error must itself be independent of A. The abstract provides no training objective, validation procedure against A, or error-independence guarantee; without this property the estimated parity is confounded by the auxiliary model rather than reflecting the original decisions.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by an explicit mathematical statement of differential parity (e.g., P(D1 - D2 | A) = P(D1 - D2)) before describing its benefits.
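
One way to write the statement the referee asks for, keeping the referee's symbols D1 and D2 for the two decision sets and A for the sensitive attribute, and assuming discrete decisions. The quantifiers and the mean-level consequence are a standard reading of statistical independence, not text taken from the paper.

```latex
(D_1 - D_2) \perp A
\;\Longleftrightarrow\;
P(D_1 - D_2 = \delta \mid A = a) = P(D_1 - D_2 = \delta)
\quad \text{for all } \delta \text{ and all } a \text{ with } P(A = a) > 0 .

% Weaker, mean-level consequence (necessary, not sufficient):
\mathbb{E}[D_1 - D_2 \mid A = a] = \mathbb{E}[D_1 - D_2]
\quad \text{for all } a \text{ with } P(A = a) > 0 .
```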

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract (limitation paragraph): the claim that differential parity remains usable when decision sets are made on different subjects rests on training an ML model to impute missing decisions. For the resulting statistic (difference independent of A) to retain its interpretation, the imputation error must itself be independent of A. The abstract provides no training objective, validation procedure against A, or error-independence guarantee; without this property the estimated parity is confounded by the auxiliary model rather than reflecting the original decisions.

    Authors: We agree that the abstract's treatment of the ML imputation approach is insufficiently precise. The manuscript acknowledges the same-subject limitation and proposes an ML bridge, but does not articulate a training objective, validation against A, or error-independence condition in the abstract. Without such a guarantee the imputed differential parity can indeed be confounded. We will revise the abstract (and, if space permits, the limitation paragraph) to state explicitly that the bridging construction requires the auxiliary model's errors to be independent of A (or to be validated as such) and that this remains an assumption rather than a proven property of the current proposal.
    Revision: yes
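
The error-independence condition discussed in this exchange can be probed on a validation set where the true decisions are known. Writing the imputed difference as D1 - D2_pred = (D1 - D2) + (D2 - D2_pred), any group-dependent mean of the signed error D2 - D2_pred shifts the estimated gap directly. The sketch below computes that signed error per group; the function names and toy data are hypothetical, and this is one possible validation check rather than a procedure the paper specifies.

```python
from statistics import mean

def signed_error_by_group(d_true, d_pred, a):
    """Return {group: mean(d_true - d_pred)} on a labelled validation set."""
    if not (len(d_true) == len(d_pred) == len(a)):
        raise ValueError("inputs must have the same length")
    result = {}
    for lab in sorted(set(a)):
        errs = [t - p for t, p, g in zip(d_true, d_pred, a) if g == lab]
        result[lab] = mean(errs)
    return result

def error_gap(d_true, d_pred, a):
    """Gap between group means of the signed error (0 is the desired value)."""
    errs = signed_error_by_group(d_true, d_pred, a)
    return max(errs.values()) - min(errs.values())

# Toy data (made up): the model under-predicts one positive decision in group "y".
d_true = [1, 0, 0, 1, 1, 0]
d_pred = [1, 0, 0, 1, 0, 0]
a = ["x", "x", "x", "y", "y", "y"]
gap = error_gap(d_true, d_pred, a)
```

A gap near zero supports, but does not establish, the assumption that imputation errors are independent of A; overall accuracy alone says nothing about it, since the same number of errors can be spread evenly or concentrated in one group.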

Circularity Check

0 steps flagged

No circularity: definition is a direct statistical independence condition with no self-referential reduction.

full rationale

The paper defines differential parity explicitly as the requirement that the difference between two decision sets is independent of the sensitive attribute. This is a primitive statistical notion introduced without reference to fitted parameters, prior self-citations, or any construction that would make the output equivalent to its inputs by definition. The ML bridging proposal for mismatched subjects is presented as an estimation technique rather than a load-bearing derivation step; no equations are supplied that would allow the estimated parity to reduce tautologically to the model outputs themselves. The central claim therefore remains self-contained and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

No free parameters or invented entities appear in the abstract. The central proposal rests on the domain assumption that decisions can be compared or bridged across subjects.

axioms (1)
  • Domain assumption: two sets of decisions can be compared directly or estimated via ML when made on different subjects.
    Explicitly stated as a limitation that the ML proposal is intended to overcome.


