Differential Parity: Relative Fairness Between Two Sets of Decisions

Pranam Prakash Shetty; Xiaoyin Xi; Zhe Yu

arxiv: 2112.11279 · v4 · submitted 2021-12-21 · 💻 cs.LG

Pith reviewed 2026-05-24 12:03 UTC · model grok-4.3

classification 💻 cs.LG

keywords differential parity, relative fairness, group fairness, sensitive attribute, decision making, bias detection, machine learning

The pith

Differential parity defines relative fairness as the independence of decision differences from a sensitive attribute.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes differential parity to evaluate fairness between two sets of decisions by checking whether their difference depends on a protected attribute such as race or gender. This approach sidesteps the problem of conflicting absolute fairness definitions by focusing on relative comparisons instead. When one decision set serves as a reliable reference, the measure functions as a group fairness criterion comparable to separation or sufficiency. Even without a reference, it exposes systematic preferences or biases between the two decision processes. A machine learning model is introduced to estimate the metric when the two sets cover different individuals.

Core claim

Differential parity holds that the difference between two decision sets should be statistically independent of a sensitive attribute; when a reference set of ground-truth or trusted decisions exists, this independence supplies a new group fairness condition distinct from separation and sufficiency, while in the absence of any reference it directly quantifies relative bias between the two sets.
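To make "distinct from separation and sufficiency" concrete, here is a short derivation added for illustration. It is not taken from the paper. It assumes binary candidate decisions $D$, a binary reference $Y$ standing in for the reference set, and a sensitive attribute $A$, and it reads differential parity against the reference as $(D - Y) \perp A$:

$$\text{separation: } D \perp A \mid Y, \qquad \text{sufficiency: } Y \perp A \mid D, \qquad \text{differential parity: } (D - Y) \perp A$$

$$P(D - Y = 1 \mid A = a) = P(D = 1, Y = 0 \mid A = a) = P(D = 1 \mid Y = 0, A = a)\,P(Y = 0 \mid A = a)$$

$$P(D - Y = -1 \mid A = a) = P(D = 0, Y = 1 \mid A = a) = P(D = 0 \mid Y = 1, A = a)\,P(Y = 1 \mid A = a)$$

Under this reading, differential parity constrains the joint rates of false positives and false negatives in each group, while separation constrains the rates conditional on $Y$. When base rates $P(Y = 1 \mid A = a)$ differ across groups, both conditions can hold together only in the degenerate case $D = Y$, so in general neither implies the other.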

What carries the argument

Differential parity, the statistical independence between the difference of two decision vectors and a sensitive attribute.
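A minimal sketch of the same-subject case follows. It assumes binary decisions encoded as 0/1 and summarizes dependence by the per-group mean of d1 - d2. The function name `differential_parity_gaps` and the max-minus-min gap are illustrative choices, not the paper's estimator. Equal group means are necessary for independence but do not establish it.

```python
from collections import defaultdict


def differential_parity_gaps(d1, d2, groups):
    """Per-group mean of d1 - d2 for two binary decision sets on the SAME subjects.

    d1, d2: sequences of 0/1 decisions, aligned by subject.
    groups: sequence of hashable sensitive-attribute values, aligned by subject.
    Returns (per-group means, max gap between groups).
    """
    if not (len(d1) == len(d2) == len(groups)):
        raise ValueError("d1, d2 and groups must be aligned by subject")
    sums = defaultdict(int)
    counts = defaultdict(int)
    for a, b, g in zip(d1, d2, groups):
        sums[g] += a - b  # per-subject difference, one of -1, 0, 1
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in counts}
    # A gap near 0 is consistent with independence in the mean only.
    return means, max(means.values()) - min(means.values())


# Illustrative data (made up): human vs. model decisions on the same 8 applicants.
human = [1, 1, 0, 1, 0, 0, 1, 0]
model = [1, 0, 0, 1, 0, 1, 1, 0]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
means, gap = differential_parity_gaps(human, model, sex)
print(means, gap)  # {'F': 0.25, 'M': -0.25} 0.5
```

A positive mean for a group means set 1 (here `human`) is more favorable to that group than set 2. The sign pattern carries the relative-bias reading when no reference set exists. The same call works for comparing an updated model against its predecessor on identical inputs.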

If this is right

When a reference decision set exists, differential parity supplies an additional group fairness test that can be checked alongside existing criteria.
Without any reference, the measure still reveals the relative preference or bias of one decision process toward a group, compared with the other.
The same framework applies to any pair of decision sources, including human versus algorithmic outputs or two different models.
The bridging model extends the test to populations that never overlap, removing the requirement that both sets act on identical individuals.
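The bridging idea can be sketched as follows, under stated assumptions. The paper's actual model, features, and training objective are not described in this summary. A hypothetical majority-vote lookup table keyed on a single discrete feature stands in for "a machine learning model". It is trained on set 2's subjects and used to impute set-2 decisions for set 1's subjects. The imputed decisions are then compared with set 1 group by group.

```python
from collections import Counter, defaultdict


def fit_lookup_model(features, decisions):
    """Hypothetical stand-in for the bridging model: majority decision per feature value.

    Unseen feature values fall back to the overall majority decision.
    """
    votes = defaultdict(Counter)
    for x, d in zip(features, decisions):
        votes[x][d] += 1
    table = {x: c.most_common(1)[0][0] for x, c in votes.items()}
    fallback = Counter(decisions).most_common(1)[0][0]
    return lambda x: table.get(x, fallback)


def bridged_gap(x1, d1, g1, x2, d2):
    """Estimate the per-group mean of d1 - d2 when set 2 covers different subjects.

    x1, d1, g1: features, decisions and groups of set 1's subjects.
    x2, d2: features and decisions of set 2's (different) subjects.
    """
    model = fit_lookup_model(x2, d2)  # learn set 2's decision rule
    d2_hat = [model(x) for x in x1]  # impute set-2 decisions for set 1's subjects
    sums, counts = defaultdict(int), defaultdict(int)
    for a, b, g in zip(d1, d2_hat, g1):
        sums[g] += a - b
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in counts}
    return means, max(means.values()) - min(means.values())


# Made-up example: a coarse score band is the only feature.
x2 = ["low", "low", "mid", "mid", "mid", "high", "high"]
d2 = [0, 0, 1, 1, 0, 1, 1]
x1 = ["low", "mid", "high", "low", "mid", "high"]
d1 = [0, 1, 1, 1, 1, 1]
g1 = ["F", "F", "F", "M", "M", "M"]
means, gap = bridged_gap(x1, d1, g1, x2, d2)
print(means, round(gap, 3))  # {'F': 0.0, 'M': 0.3333333333333333} 0.333
```

The estimate is only as good as the imputation. If the stand-in model errs more often for one group, that error flows directly into the gap. This is the load-bearing premise discussed below.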

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The measure could be used to audit whether an updated model reduces or increases bias relative to its predecessor on new data.
Repeated application over time would track whether fairness between successive decision systems is improving or drifting.
The independence test could be adapted to multiple sensitive attributes simultaneously by checking joint independence.
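For the joint-independence extension, one standard option is Pearson's chi-square test on the contingency table of group label versus the per-subject difference d1 - d2. This choice is ours; the paper does not specify it. Passing tuples such as (sex, race) as group labels tests joint independence over intersectional groups. The hypothetical helper below returns only the statistic and degrees of freedom. A p-value would come from the chi-square distribution, for example via `scipy.stats.chi2.sf(stat, dof)`.

```python
from collections import Counter


def chi_square_independence(diffs, groups):
    """Pearson chi-square statistic for independence between diffs and groups.

    diffs: per-subject values of d1 - d2 (e.g. -1, 0, 1).
    groups: per-subject hashable labels, given as a sequence (not a one-shot
            iterator); tuples such as (sex, race) give a joint test over
            intersectional groups.
    Returns (statistic, degrees of freedom).
    """
    n = len(diffs)
    joint = Counter(zip(groups, diffs))
    group_totals = Counter(groups)
    diff_totals = Counter(diffs)
    stat = 0.0
    for g, g_count in group_totals.items():
        for d, d_count in diff_totals.items():
            expected = g_count * d_count / n  # expected cell count under independence
            stat += (joint[(g, d)] - expected) ** 2 / expected
    dof = (len(group_totals) - 1) * (len(diff_totals) - 1)
    return stat, dof


# Made-up decisions for six subjects with two sensitive attributes.
d1 = [1, 1, 0, 0, 1, 0]
d2 = [1, 0, 0, 0, 0, 0]
sex = ["F", "F", "M", "M", "F", "M"]
race = ["A", "B", "A", "B", "A", "B"]
diffs = [a - b for a, b in zip(d1, d2)]
stat, dof = chi_square_independence(diffs, list(zip(sex, race)))
print(stat, dof)
```

With many intersectional groups, cells become sparse and the chi-square approximation becomes unreliable. Exact or permutation tests may be preferable in that regime.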

Load-bearing premise

A machine learning model can be trained to predict what decisions would have been made on the other set's subjects with enough accuracy to estimate the true differential parity.

What would settle it

Compute differential parity directly on an overlapping population where both decision sets are observed, then compare that value to the value obtained after replacing one set with model predictions; a large discrepancy would show the bridging step fails.
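This check can be sketched directly, assuming binary decisions and an overlapping population where both decision sets are observed. `mean_gap` and `bridging_discrepancy` are hypothetical helpers. What counts as a "large" discrepancy is not defined here. A threshold would need to account for sampling noise, for example via bootstrap resampling.

```python
def mean_gap(d1, d2, groups):
    """Max minus min, over groups, of the mean of d1 - d2 (binary decisions)."""
    sums, counts = {}, {}
    for a, b, g in zip(d1, d2, groups):
        sums[g] = sums.get(g, 0) + (a - b)
        counts[g] = counts.get(g, 0) + 1
    means = [sums[g] / counts[g] for g in counts]
    return max(means) - min(means)


def bridging_discrepancy(d1, d2_true, d2_pred, groups):
    """Differential-parity gap with observed vs. imputed set-2 decisions.

    All sequences are aligned over an overlapping population where both
    decision sets were actually observed; d2_pred holds the bridging
    model's predictions for those same subjects.
    Returns (direct gap, bridged gap, absolute discrepancy).
    """
    direct = mean_gap(d1, d2_true, groups)
    bridged = mean_gap(d1, d2_pred, groups)
    return direct, bridged, abs(direct - bridged)


# Made-up overlap of four subjects; the imputation is wrong for one subject in group M.
d1 = [1, 0, 1, 0]
d2_true = [1, 0, 0, 0]
d2_pred = [1, 0, 1, 0]
groups = ["F", "F", "M", "M"]
print(bridging_discrepancy(d1, d2_true, d2_pred, groups))  # (0.5, 0.0, 0.5)
```

In this example the bridging model is wrong on a single subject. Because that error falls in group M, it erases the entire measured gap.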

Figures

Figures reproduced from arXiv: 2112.11279 by Pranam Prakash Shetty, Xiaoyin Xi, Zhe Yu.

**Figure 1.** Demonstration of the proposed human decision fairness detection framework and a proof-of-concept experiment.

**Figure 2.** Class distributions within each group for the five datasets.

Original abstract

With AI systems widely applied to assist humans in decision-making processes such as talent hiring, school admission, and loan approval, there is an increasing need to ensure that the decisions made are fair. One major challenge for analyzing fairness in decisions is that the standards are highly subjective and contextual: there is no consensus on what absolute fairness means in every scenario. Moreover, different fairness standards often conflict with each other. To bypass this issue, this work aims to test relative fairness in decisions. That is, instead of defining what "absolutely" fair decisions are, we propose to test the relative fairness of one decision set against another with differential parity: the difference between two sets of decisions should be independent of a certain sensitive attribute. This proposed notion of differential parity fairness has the following benefits: (1) it avoids the ambiguous and contradictory definition of what absolutely fair decisions are; (2) when a reference set (of ground truth or reliable fair decisions) is available, differential parity can serve as a new group fairness notion (similar to but different from separation and sufficiency); (3) even when no reference set is available, it reveals the relative preference or bias between different decision sets. One limitation of differential parity is that it requires the two sets of decisions under comparison to be made on the same data subjects. To overcome this limitation, we propose to utilize a machine learning model to bridge the gap between the two sets of decisions made on different data and estimate the differential parity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

Differential parity is a clean relative fairness framing but the ML imputation step for mismatched subjects needs an explicit error-independence guarantee or the whole thing is confounded.

The main takeaway is that this paper defines differential parity as the requirement that the difference between two decision sets be independent of a sensitive attribute. That is a straightforward statistical notion, and it does give a way to talk about relative bias without picking an absolute fairness standard. When a reference set exists, it can function as a group fairness criterion distinct from separation or sufficiency; without one, it still surfaces which decision maker is more or less biased relative to the other. The motivation from hiring, admissions, and lending is stated plainly, and the limitation about needing the same subjects is acknowledged up front. That part is useful framing for people who already work on fairness metrics.

The soft spot is the proposed fix for different subjects. The abstract says an ML model can be trained to bridge the gap, but it supplies no training objective, no validation that the imputation error is independent of the sensitive attribute, and no argument that the resulting parity statistic remains interpretable. If the auxiliary model’s errors correlate with the attribute, the estimated differential parity simply inherits that correlation and no longer measures the original decisions. That assumption is load-bearing and currently unexamined.

The paper is therefore for fairness researchers who want to explore comparative rather than absolute metrics. A reader who already knows the separation/sufficiency literature will see the distinction quickly, but anyone hoping for a ready-to-use auditing tool will need the full methods and experiments to judge whether the estimation step can be made reliable. It is coherent enough on its own terms to deserve referee time, so that the authors can either strengthen the bridging argument or restrict the claim to matched subjects.

Referee Report

1 major / 1 minor

Summary. The paper proposes differential parity as a relative fairness metric: the difference between two sets of decisions should be independent of a sensitive attribute. It claims this bypasses debates over absolute fairness, serves as a group fairness notion (distinct from separation/sufficiency) when a reference set exists, reveals relative bias otherwise, and can be estimated via an ML model when the two decision sets apply to different subjects.

Significance. If the ML bridging construction can be made rigorous, differential parity would supply a practical, reference-based alternative to standard group fairness definitions for comparing decision systems. The manuscript's explicit acknowledgment of the same-subject limitation and attempt to address it via imputation is a constructive step.

major comments (1)

[Abstract] Abstract (limitation paragraph): the claim that differential parity remains usable when decision sets are made on different subjects rests on training an ML model to impute missing decisions. For the resulting statistic (difference independent of A) to retain its interpretation, the imputation error must itself be independent of A. The abstract provides no training objective, validation procedure against A, or error-independence guarantee; without this property the estimated parity is confounded by the auxiliary model rather than reflecting the original decisions.
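The referee's condition can be checked empirically on any subjects where the imputed set's true decisions are known, assuming binary decisions. The identity d1 - d2_pred = (d1 - d2_true) - (d2_pred - d2_true) shows why it matters. Within each group, the bridged mean difference is shifted by exactly that group's mean signed imputation error, so unequal signed errors across groups bias the estimated parity. The helper `imputation_error_by_group` is hypothetical. Passing this check on validation subjects does not guarantee that the same holds on the subjects actually being imputed.

```python
from collections import defaultdict


def imputation_error_by_group(d2_true, d2_pred, groups):
    """Per-group (mean signed error, error rate) of a bridging model.

    Because d1 - d2_pred = (d1 - d2_true) - (d2_pred - d2_true), each group's
    bridged mean difference is shifted by exactly that group's mean signed
    error; equal signed errors across groups leave the between-group gap
    unchanged, unequal ones bias it.
    """
    signed = defaultdict(int)
    wrong = defaultdict(int)
    counts = defaultdict(int)
    for t, p, g in zip(d2_true, d2_pred, groups):
        signed[g] += p - t  # +1 false positive, -1 false negative
        wrong[g] += int(p != t)
        counts[g] += 1
    return {g: (signed[g] / counts[g], wrong[g] / counts[g]) for g in counts}


# Made-up validation subjects where set 2's true decisions are known.
print(imputation_error_by_group([1, 0, 0, 0], [1, 0, 1, 0], ["F", "F", "M", "M"]))
# {'F': (0.0, 0.0), 'M': (0.5, 0.5)}
```

Reporting the error rate alongside the signed error matters because false positives and false negatives can cancel in the signed mean while still being concentrated in one group.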

minor comments (1)

[Abstract] The abstract would be strengthened by an explicit mathematical statement of differential parity (e.g., P(D1 - D2 | A) = P(D1 - D2)) before describing its benefits.
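Written out in the referee's notation, for binary decisions $D_1, D_2 \in \{0, 1\}$ on the same subjects and a sensitive attribute $A$, the condition and a weaker mean-based consequence could read as follows. The paper's own symbols may differ.

$$P(D_1 - D_2 = d \mid A = a) = P(D_1 - D_2 = d) \quad \text{for all } d \in \{-1, 0, 1\} \text{ and all groups } a$$

$$E[D_1 - D_2 \mid A = a] = P(D_1 = 1 \mid A = a) - P(D_2 = 1 \mid A = a) = E[D_1 - D_2] \quad \text{for all groups } a$$

The second line is necessary but not sufficient for the first. It says the two decision sets have the same demographic-parity gap between any pair of groups, which is why a mean-based estimate alone cannot establish full independence.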

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below and will revise the manuscript accordingly.

Referee: [Abstract] Abstract (limitation paragraph): the claim that differential parity remains usable when decision sets are made on different subjects rests on training an ML model to impute missing decisions. For the resulting statistic (difference independent of A) to retain its interpretation, the imputation error must itself be independent of A. The abstract provides no training objective, validation procedure against A, or error-independence guarantee; without this property the estimated parity is confounded by the auxiliary model rather than reflecting the original decisions.

Authors: We agree that the abstract's treatment of the ML imputation approach is insufficiently precise. The manuscript acknowledges the same-subject limitation and proposes an ML bridge, but does not articulate a training objective, validation against A, or error-independence condition in the abstract. Without such a guarantee the imputed differential parity can indeed be confounded. We will revise the abstract (and, if space permits, the limitation paragraph) to state explicitly that the bridging construction requires the auxiliary model's errors to be independent of A (or to be validated as such) and that this remains an assumption rather than a proven property of the current proposal. revision: yes

Circularity Check

0 steps flagged

No circularity: definition is a direct statistical independence condition with no self-referential reduction.

The paper defines differential parity explicitly as the requirement that the difference between two decision sets is independent of the sensitive attribute. This is a primitive statistical notion introduced without reference to fitted parameters, prior self-citations, or any construction that would make the output equivalent to its inputs by definition. The ML bridging proposal for mismatched subjects is presented as an estimation technique rather than a load-bearing derivation step; no equations are supplied that would allow the estimated parity to reduce tautologically to the model outputs themselves. The central claim therefore remains self-contained and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

No free parameters or invented entities appear in the abstract. The central proposal rests on the domain assumption that decisions can be compared or bridged across subjects.

axioms (1)

domain assumption Two sets of decisions can be compared directly or estimated via ML when made on different subjects.
Explicitly stated as a limitation that the ML proposal is intended to overcome.
