Fairer and more accurate, but for whom?

Alexandra Chouldechova , Max G'Sell

Authors on Pith no claims yet

classification 📊 stat.AP cs.CYstat.ML

keywords modelsmodelapproachhumanidentifyingmetricsoverallperformance

read the original abstract

Complex statistical machine learning models are increasingly being used or considered for use in high-stakes decision-making pipelines in domains such as financial services, health care, criminal justice and human services. These models are often investigated as possible improvements over more classical tools such as regression models or human judgement. While the modeling approach may be new, the practice of using some form of risk assessment to inform decisions is not. When determining whether a new model should be adopted, it is therefore essential to be able to compare the proposed model to the existing approach across a range of task-relevant accuracy and fairness metrics. Looking at overall performance metrics, however, may be misleading. Even when two models have comparable overall performance, they may nevertheless disagree in their classifications on a considerable fraction of cases. In this paper we introduce a model comparison framework for automatically identifying subgroups in which the differences between models are most pronounced. Our primary focus is on identifying subgroups where the models differ in terms of fairness-related quantities such as racial or gender disparities. We present experimental results from a recidivism prediction task and a hypothetical lending example.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FairTree: Subgroup Fairness Auditing of Machine Learning Models with Bias-Variance Decomposition
cs.LG 2026-04 unverdicted novelty 7.0

FairTree audits ML models for subgroup fairness by decomposing performance disparities into systematic bias and variance using permutation-based and fluctuation tests adapted from psychometric methods.