Geometry of Relaxed Fair Regression: A Unified Framework for Aware and Unaware Settings

M. Generali Lince; P. Loiseau; R. Flamary; S. Gaucher; V. Divol

arxiv: 2605.28233 · v1 · pith:RMS6QLWAnew · submitted 2026-05-27 · 📊 stat.ML · cs.CY · cs.LG


Pith reviewed 2026-06-29 10:07 UTC · model grok-4.3

classification 📊 stat.ML · cs.CY · cs.LG

keywords: fair regression, demographic parity, optimal transport, Wasserstein distance, total variation, unaware fairness, fairness constraints, relaxed fairness


The pith

Fair regression under demographic parity can be reformulated exactly as an optimal transport problem that works in both aware and unaware settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that adding a demographic parity penalty to a regression objective turns the problem into an optimal transport problem, whose maps send the group-conditional distributions of the unconstrained predictions to the distributions of the fair predictions. This single formulation covers both the aware case, where the sensitive attribute is available at training and test time, and the unaware case, where it is unavailable at prediction time. The authors show that, for two concrete penalties (the squared Wasserstein-2 distance and total variation), the optimal predictors are given precisely by these transport maps. Each penalty encodes a different notion of fairness: one spreads the adjustment across the whole population, while the other meets the parity constraint exactly on a subset of points. From this geometry they derive a practical algorithm that is easy to implement and competitive on standard benchmarks.

Core claim

Formulating regression under a demographic parity penalty as an optimal transport problem unifies both the aware and unaware settings and characterizes optimal prediction functions via optimal transport maps, under both squared Wasserstein-2 and Total Variation penalties. These results reveal that the Wasserstein penalty induces a smooth, population-wide compromise, while Total Variation enforces exact parity for a subset of individuals.

What carries the argument

the optimal transport formulation of the demographic parity penalty, whose solutions are the optimal transport maps between the relevant conditional distributions

If this is right

The optimal fair predictors are exactly the optimal transport maps for the chosen penalty.
The Wasserstein-2 penalty produces a smooth adjustment across the entire population.
The total variation penalty meets demographic parity exactly for a subset of individuals.
A simple algorithm derived from the transport characterization is computationally efficient and matches or exceeds prior methods on real benchmarks.
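The contrast between the two penalties can be seen on a single coupled pair of unconstrained predictions (h1, h2), one from each group. The sketch below is a minimal illustration, not the paper's algorithm. It assumes pointwise objectives of the form (y1 − h1)²/a1 + (y2 − h2)²/a2 plus either λ(y1 − y2)² for W2 or λ·1[y1 ≠ y2] for TV, with positive weights a1, a2; this is our reading of the appendix excerpts quoted in the reference graph further down, which do not define the weights. The W2 closed form is the one excerpted there; the TV rule (merge the pair at its weighted average if (h1 − h2)²/(a1 + a2) < λ, otherwise leave it unchanged) follows from comparing the only two candidate minimizers. The function names and numbers are hypothetical.

```python
def w2_pair(h1: float, h2: float, a1: float, a2: float, lam: float) -> tuple[float, float]:
    """Minimiser of (y1-h1)^2/a1 + (y2-h2)^2/a2 + lam*(y1-y2)^2 (assumed W2 objective)."""
    denom = 1.0 + lam * (a1 + a2)
    y1 = (h1 * (1.0 + lam * a2) + lam * h2 * a1) / denom
    y2 = (h2 * (1.0 + lam * a1) + lam * h1 * a2) / denom
    return y1, y2


def tv_pair(h1: float, h2: float, a1: float, a2: float, lam: float) -> tuple[float, float]:
    """Minimiser of (y1-h1)^2/a1 + (y2-h2)^2/a2 + lam*1[y1 != y2] (assumed TV objective).

    Either keep the pair apart (no quadratic cost, penalty lam) or merge it at the
    weighted average, whose quadratic cost is (h1-h2)^2/(a1+a2). Ties keep the pair apart.
    """
    merge_cost = (h1 - h2) ** 2 / (a1 + a2)
    if merge_cost < lam:
        y = (a2 * h1 + a1 * h2) / (a1 + a2)
        return y, y
    return h1, h2


# Hypothetical pair and weights; sweep the penalty strength lam.
h1, h2, a1, a2 = 1.0, 0.0, 1.0, 1.0
for lam in [0.0, 0.25, 1.0, 4.0]:
    w2 = w2_pair(h1, h2, a1, a2, lam)
    tv = tv_pair(h1, h2, a1, a2, lam)
    print(f"lam={lam:4.2f}  W2 -> ({w2[0]:.3f}, {w2[1]:.3f})  TV -> ({tv[0]:.3f}, {tv[1]:.3f})")
```

With these numbers the W2 gap y1 − y2 equals 1/(1 + 2λ), so every increase in λ moves both predictions, whereas TV returns the pair unchanged for λ ≤ 0.5 and collapses it to 0.5 once λ exceeds 0.5, which is the threshold behaviour described for Figure 2.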

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The transport view could guide the design of new fairness penalties based on other cost functions between distributions.
The same geometry might extend to fairness notions beyond demographic parity once they are expressed as transport costs.
In practice, the choice between penalties could be driven by whether a smooth or a hard fairness guarantee is preferred on a given dataset.

Load-bearing premise

The demographic parity penalty admits an exact optimal transport formulation whose solutions are precisely the OT maps for the chosen penalties, without further restrictions on the regressor class or data distributions.

What would settle it

A concrete counter-example distribution and loss where the minimizer of the fairness-penalized regression objective is not an optimal transport map under either the Wasserstein-2 or total variation penalty.

Figures

Figures reproduced from arXiv: 2605.28233 by M. Generali Lince, P. Loiseau, R. Flamary, S. Gaucher, V. Divol.

**Figure 1.** Geometry of the OT maps. Left (Aware): 1D-to-1D transport mapping group-conditional distributions to almost-fair predictions. Right (Unaware): 2D-to-1D transport mapping the joint distributions of $(\eta(X), \Delta(X))$ under $\mu^+_{\eta,\Delta}$ and $\mu^-_{\eta,\Delta}$ to almost-fair predictions. Objectives displayed in the panels: unaware setting, $\mathrm{OT}_{c_u}(\mu^+_{\eta,\Delta}, \nu^+) + \mathrm{OT}_{c_u}(\mu^-_{\eta,\Delta}, \nu^-) + \lambda\,\mathrm{OT}_{c_D}(\nu^+, \nu^-)$; aware setting, $p^+\,\mathrm{OT}_{c_2}(\mu^+_\eta, \nu^+) + p^-\,\mathrm{OT}_{c_2}(\mu^-_\eta, \nu^-)$ … (truncated in the source).

**Figure 2.** Individual prediction trajectories for W2 vs. TV relaxation. Evolution of the marginal distributions and of a subset of individual prediction paths, from Unfair (Bayes) to strictly Fair (from 100% to 0% of the initial unfairness). The lines show that W2 induces a proportional shift for all individuals, while TV leaves predictions unchanged (horizontal lines) until a hard threshold triggers abrupt merging.

**Figure 3.** Accuracy-fairness trade-offs on the Law School dataset. Relaxation trajectories from the unconstrained ERM (black dot) to the exact fair models (stars). Error bars indicate the standard deviation across 10 random data splits. MSE of the constant predictor is 10.1 × 10⁻³. Panels: MSE vs KS, MSE vs W2, MSE vs TV; the legend includes FairReg and OT-U W2 (further entries truncated in the source).

**Figure 4.** Accuracy-fairness trade-offs on the Communities dataset. Relaxation trajectories from the unconstrained ERM to the exact fair models. MSE of the constant predictor is 5.5 × 10⁻².

**Figure 5.** Unaware relaxations. Comparison of the Bayes predictor against the optimal relaxed unaware predictors under W2 and TV penalties, matched to an 80% unfairness budget.

**Figure 6.** Aware relaxations. Comparison of the base ERM predictor against the optimal relaxed aware predictors under W2 and TV penalties. While W2 yields a smooth shift, the TV penalty applies exact hard thresholding, creating a discontinuous effect where predictions either merge exactly or revert to the unfair ERM.

**Figure 7.** Evolution of the aware marginal distributions. Evolution of the predicted marginal distributions from Unfair to Fair. The discontinuous nature of the aware TV hard thresholding creates fragmented intermediate distributions, in contrast with the smooth, continuous shift of the W2 penalty.

**Figure 8.** Accuracy-fairness trade-offs on the synthetic dataset (γ = 0.5). Relaxation trajectories from the unconstrained empirical risk minimizer (ERM, black dot) to the strict fair models (stars). The unconstrained baseline naturally converges to the theoretical noise-variance limit (MSE = 0.25). Our exact optimal transport mappings (OT-U W2 and OT-U TV) recover the optimal aware Pareto frontier and outperform bot…

**Figure 9.** Impact of confounding strength (Absolute). Evolution of the MSE (left) and W2 unfairness (right) under strict DP. As γ → 1, OT Unaware converges to the Aware model, tightly bounding the MSE degradation. FairReg suffers from optimization instability, triggering severe drops in predictive utility. Legend: ERM, OT-U W2, OT-A W2, FairReg, and further entries truncated in the source.

**Figure 10.** Impact of confounding strength (Relative). Evolution of the MSE and W2 unfairness expressed relative to the unconstrained ERM baseline. The relative view highlights FairReg’s rapid deterioration in utility at moderate confounding (γ ≈ 0.1), whereas OT Unaware remains stable and accurately tracks the theoretical aware bound.

**Figure 11.** Within-group variance analysis. Evolution of the MSE (left) and within-group prediction variance (right) across confounding strengths (γ). At low γ, OT Unaware conservatively collapses the variance, acting as a safe constant predictor because the estimated density ratio $\hat{\Delta}(x)$ is uninformative. As structural bias increases, it resolves the 2D geometry and recovers the variance profile of the opti…

**Figure 12.** Accuracy-fairness trade-offs on the Adult dataset. Relaxation trajectories from the unconstrained empirical risk minimizer (ERM, black dot) to the exact fair models (stars). Error bars indicate the standard deviation across 10 random data splits. MSE of the constant predictor is 1.69 × 10⁻².

Original abstract

Fairness-accuracy trade-offs are a central concern in the deployment of fairness-aware machine learning methods. When sensitive attributes are unavailable at inference time (the so-called unawareness setting), principled methods for obtaining accurate predictions under relaxed fairness constraints are largely missing. In this work, we address this gap by formulating regression under a demographic parity penalty as an optimal transport problem. Our framework unifies both the aware and unaware settings and characterizes optimal prediction functions via optimal transport maps, under both squared Wasserstein-2 and Total Variation penalties. These results reveal that the choice of penalty reflects fundamentally different fairness philosophies: the Wasserstein penalty induces a smooth, population-wide compromise, while Total Variation enforces exact parity for a subset of individuals. Building on these theoretical characterizations, we propose an algorithm that is simple to implement, computationally efficient, and consistently matches or outperforms state-of-the-art baselines on real-world benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper unifies aware and unaware fair regression via an OT formulation of the demographic parity penalty, but the unaware case needs explicit verification that the maps depend on X alone.


The paper casts regression under a demographic parity penalty as an optimal transport problem. This unifies the aware and unaware settings and gives characterizations of the optimal predictors as OT maps under squared Wasserstein-2 and total variation penalties.

What stands out is the explicit treatment of the unaware setting, where S is unavailable at test time. The authors connect the penalty choice to different fairness ideas, with W2 producing a smooth population-wide compromise and TV enforcing exact parity on a subset of points. They derive a simple algorithm from the theory and report that it matches or beats baselines on real benchmarks.

The soft spot is whether the unaware derivation properly restricts to functions of X alone. The OT problem as usually stated can yield maps that depend on S. If the paper solves the unconstrained OT problem and then projects, the achieved penalty may differ from the claimed optimum. The abstract asserts that OT maps characterize the solutions in both settings, so the unaware proof must show how the measurability constraint is enforced without extra cost. If that step is missing or approximate, the unification claim weakens.

The math rests on standard OT results, which is fine. No circularity appears in the claims. The experiments are described only at a high level, so their strength is hard to judge without the details.

This paper is for researchers in fair ML who work on regression and want a transport-based view of parity penalties. A reader focused on theoretical characterizations would find the penalty distinctions useful. It deserves a serious referee because it targets a stated practical gap with a new formulation, even if the unaware constraint handling requires checking.

Referee Report

2 major / 2 minor

Summary. The paper formulates regression under a demographic parity penalty as an optimal transport problem. It unifies the aware and unaware settings by characterizing optimal prediction functions via OT maps, for both squared Wasserstein-2 and Total Variation penalties. The work distinguishes the fairness philosophies induced by each penalty and proposes a simple, efficient algorithm that matches or exceeds baselines on real-world data.

Significance. If the OT characterizations are rigorous, the paper supplies a geometric lens on relaxed fairness constraints that directly yields practical algorithms for both aware and unaware regimes. The explicit link between penalty choice and population-wide versus subset-exact parity is a useful conceptual contribution, and the algorithm's reported performance suggests immediate applicability.

major comments (2)

§4 (unaware case): The derivation equates the demographic parity penalty to an OT problem whose solutions are OT maps, but does not explicitly enforce that the regressor must be measurable with respect to σ(X) alone. If the resulting map depends on S, the claimed unification requires an additional conditional-expectation or projection step whose effect on the attained penalty value is not shown to be zero; this step is load-bearing for the unaware claim.
Theorem 3.1 / Eq. (12): The statement that the Wasserstein-2 penalty induces a 'smooth, population-wide compromise' is presented as following directly from the OT map, yet the proof sketch does not quantify how the map behaves under the demographic parity constraint when the regressor class is restricted; a counter-example or explicit bound would strengthen the claim.

minor comments (2)

Notation for the sensitive attribute S and the predictor f is introduced inconsistently between the aware and unaware sections; a single table of symbols would improve readability.
The experimental section reports aggregate metrics but omits per-dataset variance or statistical significance tests against the strongest baseline; adding these would make the performance claims easier to assess.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the planned revisions.


Referee: §4 (unaware case): The derivation equates the demographic parity penalty to an OT problem whose solutions are OT maps, but does not explicitly enforce that the regressor must be measurable with respect to σ(X) alone. If the resulting map depends on S, the claimed unification requires an additional conditional-expectation or projection step whose effect on the attained penalty value is not shown to be zero; this step is load-bearing for the unaware claim.

Authors: We agree that the unaware case requires an explicit treatment of the σ(X)-measurability constraint. The OT map is initially derived on the joint law of (X,S,Y), but the unaware regressor must be a function of X alone. In the revision we will insert a dedicated paragraph in §4 that applies the conditional-expectation projection onto σ(X)-measurable functions and proves that this projection leaves the demographic-parity penalty unchanged: the penalty is a functional of the marginal push-forward measure of the predictions, which is invariant under conditioning on X. This step therefore does not alter the attained penalty value and preserves the claimed unification. revision: yes
Referee: Theorem 3.1 / Eq. (12): The statement that the Wasserstein-2 penalty induces a 'smooth, population-wide compromise' is presented as following directly from the OT map, yet the proof sketch does not quantify how the map behaves under the demographic parity constraint when the regressor class is restricted; a counter-example or explicit bound would strengthen the claim.

Authors: Theorem 3.1 characterizes the unrestricted optimal map, which illustrates the population-level geometry induced by the Wasserstein penalty. When the function class is restricted, the result serves as an ideal benchmark. We will revise the discussion following Eq. (12) to state this distinction explicitly and add a quantitative bound on the sub-optimality gap that arises from the restriction, obtained from the Lipschitz continuity of the Wasserstein-2 map with respect to the marginal constraint. This addition will make the scope of the claim precise. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard OT theory

full rationale

The paper derives its unification of aware and unaware fair regression by mapping the demographic parity penalty to an optimal transport problem and characterizing solutions via OT maps under W2 and TV penalties. This equivalence is presented as following from the mathematical definition of the penalties and standard OT results, without any reduction of predictions to fitted parameters by construction, self-definitional loops, or load-bearing self-citations. The unaware setting constraint (regressors measurable w.r.t. X only) is addressed as part of the framework setup rather than smuggled in via prior author work. No steps match the enumerated circularity patterns, and the central claims remain independent of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only; full paper would list explicit assumptions on loss functions, support of distributions, and existence of transport maps. No free parameters or invented entities are identifiable from the abstract.

axioms (1)

domain assumption: The demographic parity penalty can be encoded exactly as a cost in an optimal transport problem whose solutions are OT maps.
Core modeling step that enables the unification and characterizations.

pith-pipeline@v0.9.1-grok · 5712 in / 1244 out tokens · 42399 ms · 2026-06-29T10:07:16.350002+00:00 · methodology


Reference graph

Works this paper leans on

43 extracted references · 12 canonical work pages · 1 internal anchor

[1]

Fair regression: Quantitative definitions and reduction-based algorithms

Alekh Agarwal, Miroslav Dudik, and Zhiwei Steven Wu. Fair regression: Quantitative definitions and reduction-based algorithms. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 120–129. PMLR, 09–15 Jun 2019. URL https:/...

2019
[2]

Infinite Dimensional Analysis: A Hitchhiker’s Guide

Charalambos D. Aliprantis and Kim C. Border. Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer Berlin Heidelberg, 3rd edition, 2006. ISBN 978-3-540-32696-0. doi: 10.1007/3-540-29587-9

2006
[3]

Equalized odds postprocessing under imperfect group information

Pranjal Awasthi, Matthäus Kleindessner, and Jamie Morgenstern. Equalized odds postprocessing under imperfect group information. In Silvia Chiappa and Roberto Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 1770–1780. PMLR, 2...

2020
[4]

Fairness and Machine Learning: Limitations and Opportunities

Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning: Limitations and Opportunities. MIT Press, 2023

2023
[5]

A Convex Framework for Fair Regression

Richard A. Berk, Hoda Heidari, Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. A convex framework for fair regression. In Proceedings of the 4th Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML), ICML Workshop, August 2017. URL https://arxiv.org/abs/1706.02409. Co-located with...

2017
[6]

Fairness under unawareness: Assessing disparity when protected class is unobserved

Jiahao Chen, Nathan Kallus, Xiaojie Mao, Geoffry Svacha, and Madeleine Udell. Fairness under unawareness: Assessing disparity when protected class is unobserved. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, pages 339–348. ACM, January 2019. doi: 10.1145/3287560.3287594. URL http://dx.doi.org/10.1145/3287560.3287594

2019
[7]

A general approach to fairness with optimal transport

Silvia Chiappa, Ray Jiang, Tom Stepleton, Aldo Pacchiano, Heinrich Jiang, and John Aslanides. A general approach to fairness with optimal transport. Proceedings of the AAAI Conference on Artificial Intelligence, 34(04):3633–3640, Apr. 2020. doi: 10.1609/aaai.v34i04. URL https://ojs.aaai.org/index.php/AAAI/article/view/5771

2020
[9]

A minimax framework for quantifying risk-fairness trade-off in regression

Evgenii Chzhen and Nicolas Schreuder. A minimax framework for quantifying risk-fairness trade-off in regression. The Annals of Statistics, 50(4):2416–2442, August 2022. doi: 10.1214/22-AOS2198. URL https://arxiv.org/abs/2007.14265

2022
[10]

Fair regression with Wasserstein barycenters

Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Luca Oneto, and Massimiliano Pontil. Fair regression with Wasserstein barycenters. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 7321–7331. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/pape...

2020
[11]

Optimization with non-differentiable constraints with applications to fairness, recall, churn, and other goals

Andrew Cotter, Heinrich Jiang, Maya Gupta, Serena Wang, Taman Narayan, Seungil You, and Karthik Sridharan. Optimization with non-differentiable constraints with applications to fairness, recall, churn, and other goals. Journal of Machine Learning Research, 20(172):1–59, 2019. URL http://jmlr.org/papers/v20/18-616.html

2019
[12]

Fairness guarantees in multi-class classification with demographic parity

Christophe Denis, Romuald Elie, Mohamed Hebiri, and François Hu. Fairness guarantees in multi-class classification with demographic parity. Journal of Machine Learning Research, 25(130):1–46, 2024. URL https://jmlr.org/papers/v25/23-0322.html.

2024
[13]

Demographic parity in regression and classification within the unawareness framework

Vincent Divol and Solenne Gaucher. Demographic parity in regression and classification within the unawareness framework, 2024. URL https://arxiv.org/abs/2409.02471

2024
[14]

Fairness through awareness

Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS ’12, pages 214–226. ACM, January 2012. doi: 10.1145/2090236.2090255. URL https://arxiv.org/abs/1104.3913

2012
[15]

Learning with minibatch Wasserstein: asymptotic and gradient properties

Kilian Fatras, Younes Zine, Rémi Flamary, Rémi Gribonval, and Nicolas Courty. Learning with minibatch Wasserstein: asymptotic and gradient properties. In AISTATS 2020 - 23rd International Conference on Artificial Intelligence and Statistics, volume 108 of PMLR, pages 1–20, Palermo, Italy, June 2020. URL https://hal.science/hal-02502329

2020
[16]

Fatou’s lemma for weakly converging probabilities

Eugene A. Feinberg, Pavlo O. Kasyanov, and Nina V. Zadoianchuk. Fatou’s lemma for weakly converging probabilities. Theory of Probability & Its Applications, 58(4):683–689, 2014

2014
[17]

POT: Python Optimal Transport

Rémi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, et al. POT: Python Optimal Transport. Journal of Machine Learning Research, 22(78):1–8, 2021

2021
[18]

Equality as a moral ideal

Harry Gordon Frankfurt. Equality as a moral ideal. Ethics, 98(1):21–43, October 1987. doi: 10.1086/292913. URL https://www.jstor.org/stable/2381290

1987
[19]

Fair learning with Wasserstein barycenters for non-decomposable performance measures

Solenne Gaucher, Nicolas Schreuder, and Evgenii Chzhen. Fair learning with Wasserstein barycenters for non-decomposable performance measures. In Francisco Ruiz, Jennifer Dy, and Jan-Willem van de Meent, editors, Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning Research,...

2023
[20]

Projection to fairness in statistical learning

Thibaut Le Gouic, Jean-Michel Loubes, and Philippe Rigollet. Projection to fairness in statistical learning, 2020. URL https://arxiv.org/abs/2005.11720

2020
[21]

Fairness without demographics in repeated loss minimization

Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. Fairness without demographics in repeated loss minimization. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1929–1938. PMLR, 10–15 Jul 2018. URL https://pr...

2018
[22]

Assessing algorithmic fairness with unobserved protected class using data combination

Nathan Kallus, Xiaojie Mao, and Angela Zhou. Assessing algorithmic fairness with unobserved protected class using data combination. Manage. Sci., 68(3):1959–1981, March. ISSN 0025-1909. doi: 10.1287/mnsc.2020.3850. URL https://doi.org/10.1287/mnsc.2020.3850
[24]

Fairness without demographics through adversarially reweighted learning

Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, and Ed Chi. Fairness without demographics through adversarially reweighted learning. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 728–740. Curran Associates, Inc.,...

2020
[25]

Fair text classification via transferable representations

Thibaud Leteno, Michael Perrot, Charlotte Laclau, Antoine Gourru, and Christophe Gravier. Fair text classification via transferable representations. Journal of Machine Learning Research, 26(239):1–47, 2025. URL http://jmlr.org/papers/v26/25-0485.html.

2025
[26]

Markov Chains and Mixing Times

D.A. Levin, Y. Peres, and E.L. Wilmer. Markov Chains and Mixing Times. American Mathematical Soc., 2009. ISBN 9780821886274. URL https://books.google.fr/books?id=6Cg5Nq5sSv4C

2009
[27]

Kernel dependence regularizers and Gaussian processes with applications to algorithmic fairness

Zhu Li, Adrian Perez-Suay, Gustau Camps-Valls, and Dino Sejdinovic. Kernel dependence regularizers and Gaussian processes with applications to algorithmic fairness, 2019. URL https://arxiv.org/abs/1911.04322

2019
[28]

Does mitigating ML's impact disparity require treatment disparity?

Zachary Lipton, Julian McAuley, and Alexandra Chouldechova. Does mitigating ML's impact disparity require treatment disparity? In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper_fi...

2018
[29]

Too relaxed to be fair

Michael Lohaus, Michaël Perrot, and Ulrike Von Luxburg. Too relaxed to be fair. In Proceedings of the 37th International Conference on Machine Learning, ICML’20. JMLR.org, 2020

2020
[30]

A Theory of Justice

John Rawls. A Theory of Justice. Harvard University Press, 1971

1971
[31]

Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling

Filippo Santambrogio. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling, volume 87 of Progress in Nonlinear Differential Equations and Their Applications. Birkhäuser Cham, 1st edition, 2015. ISBN 978-3-319-20827-5. doi: 10.1007/978-3-319-20828-2. Published 27 October 2015

2015
[32]

Computing barycentres of measures for generic transport costs

Eloi Tanguy, Julie Delon, and Nathaël Gozlan. Computing barycentres of measures for generic transport costs. arXiv preprint arXiv:2501.04016, 2024

2024
[33]

Regression under demographic parity constraints via unlabeled post-processing

Gayane Taturyan, Evgenii Chzhen, and Mohamed Hebiri. Regression under demographic parity constraints via unlabeled post-processing. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 117917–117953. Curran Associates, Inc., 2024. URL https://pr...

2024
[34]

Optimal Transport: Old and New

Cédric Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer Science & Business Media, 2008. ISBN 9783540710509

2008
[35]

Fairness constraints: A flexible approach for fair classification

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna P. Gummadi. Fairness constraints: A flexible approach for fair classification. Journal of Machine Learning Research, 20(75):1–42, 2019. URL http://jmlr.org/papers/v20/18-262.html.

2019
[36]

Then, $(\nu^{+*}, \nu^{-*}) = (y_1^*, y_2^*)_\sharp \pi^*$ is a minimizer of $(P_{\lambda,D})$

… $\colon \mathbb{R}\times\mathbb{R}\to\mathbb{R}^2$ be a measurable map such that $(y_1^*(z_1,z_2), y_2^*(z_1,z_2))$ realizes the infimum in (10) for all $(z_1,z_2)\in\mathbb{R}^2\times\mathbb{R}^2$. Then, $(\nu^{+*}, \nu^{-*}) = (y_1^*, y_2^*)_\sharp \pi^*$ is a minimizer of $(P_{\lambda,D})$. Proof of Lemma 2. To prove this equivalence, we first establish that the pointwise optimization constitutes a lower bound for the global problem, and then demonstrate that t...
[37]

We can therefore push forward $\pi^*$ via the map $(z_1,z_2)\mapsto(z_1,z_2,y_1^*,y_2^*)$ and construct a joint measure $\rho^*$

… $\colon \mathbb{R}^2\times\mathbb{R}^2\to\mathbb{R}^2$ such that $(y_1^*(z_1,z_2), y_2^*(z_1,z_2))$ is a minimizer of $\Phi_{z_1,z_2}$ for all $(z_1,z_2)\in\mathbb{R}\times\mathbb{R}$. We can therefore push forward $\pi^*$ via the map $(z_1,z_2)\mapsto(z_1,z_2,y_1^*,y_2^*)$ and construct a joint measure $\rho^*$. The marginals of $\rho^*$ over its third and fourth coordinates ($y_1$ and $y_2$) define target distributions $\nu^{+*}$ and $\nu^{-*}$. Because $\rho^*$ constitutes a valid joint tr...
[38]

On the one hand, plugging this value into the first-order conditions yields $y_1^* = \frac{h_1(1+\lambda a_2)+\lambda h_2 a_1}{1+\lambda(a_1+a_2)}$ and $y_2^* = \frac{h_2(1+\lambda a_1)+\lambda h_1 a_2}{1+\lambda(a_1+a_2)}$

From the optimality conditions we have $y_1^* - h_1 = -\lambda a_1 u$ and $y_2^* - h_2 = \lambda a_2 u$, with $u = y_1^* - y_2^*$. Subtracting these identities gives $u = (h_1-h_2) - \lambda(a_1+a_2)u$, which resolves to $u = \frac{h_1-h_2}{1+\lambda(a_1+a_2)}$. On the one hand, plugging this value into the first-order conditions yields $y_1^* = \frac{h_1(1+\lambda a_2)+\lambda h_2 a_1}{1+\lambda(a_1+a_2)}$ and $y_2^* = \frac{h_2(1+\lambda a_1)+\lambda h_1 a_2}{1+\lambda(a_1+a_2)}$. (12) On the other hand, plug...
[39]

According to Lemma 2, a solution(ν+∗, ν−∗)of (Pλ,D) is given by the marginal distributions of (y∗ 1(Z1, Z2), y∗ 2(Z1, Z2))where( Z1, Z2) ∼π ∗ and π∗ is as in(5)

= (λa1u)2 a1 + (λa2u)2 a2 +λu 2 =λu2 (1 +λ(a 1 +a 2)) =λ (h1 −h 2)2 (1 +λ(a 1 +a 2))2 (1 +λ(a 1 +a 2)) = λ 1 +λ(a 1 +a 2)(h1 −h 2)2. According to Lemma 2, a solution(ν+∗, ν−∗)of (Pλ,D) is given by the marginal distributions of (y∗ 1(Z1, Z2), y∗ 2(Z1, Z2))where( Z1, Z2) ∼π ∗ and π∗ is as in(5). Our next goal is to show that under Assumption 1, the random v...
[40]

for allz 1,ψ(z 1) = inf z2(ϕ(z2) +C λ,c2(z1, z2)); 2.π ∗ is supported on theCu λ,c2-subdifferential of a Kantorovich potentialψ, that is the set Γ ={(z 1, z2)∈ Z × Z:ψ(z 1)−ϕ(z 2) =C λ,c2(z1, z2)}. The arguments used to prove Lemma 3 in Divol and Gaucher[12] can be reproduced to show that under Assumption 1, the Kantorovich potentialψ is differentiable in...
[41]

In particular, the two output laws become arbitrarily close inW2

Since R h2 |d| dµ± η,∆(z) = R X± η(x)2 dχ(x) ≤ E[η(X)2] <∞ , we haveW2 2(ν∗ 1,λ, ν∗ 2,λ) → 0as λ→ ∞ . In particular, the two output laws become arbitrarily close inW2. Therefore the relaxed solutions asymptotically enforce a single common output distribution; if the unrelaxed barycenter is unique, this common limit must coincide with the unrelaxed barycen...
[42]

Thus Cλ,c0(z1, z2) = λ and thus y∗ 1(z1, z2) = h1 clearly only depends onz1

+λ−ϕ(z 2)≥λ. Thus Cλ,c0(z1, z2) = λ and thus y∗ 1(z1, z2) = h1 clearly only depends onz1. This shows that the mapT + λ,TV is well-definedµ + η,∆-almost everywhere. Similarly, we show that the mapT− λ,TV is well-definedµ − η,∆-almost everywhere. We then conclude using Lemma 1 to show that solutions of Equation (3) take the form described in the lemma. Let ...

2000
[43]

For the Law School dataset, the initialW2 gap is approximately0.03

between the initial biased distributions. For the Law School dataset, the initialW2 gap is approximately0.03. The exact geometric cost to perfectly repair this disparity is therefore bounded by0.032 = 0.0009. Because the targets are scaled to [−1, 1], the base ERM risk is naturally around0.010. Adding the absolute maximum fairness penalty yields an expect...

[1] Alekh Agarwal, Miroslav Dudik, and Zhiwei Steven Wu. Fair regression: Quantitative definitions and reduction-based algorithms. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 120–129. PMLR, 09–15 Jun 2019. URL https:/...

[2] Charalambos D. Aliprantis and Kim C. Border. Infinite Dimensional Analysis: A Hitchhiker's Guide. Springer Berlin Heidelberg, 3rd edition, 2006. ISBN 978-3-540-32696-0. doi: 10.1007/3-540-29587-9.

[3] Pranjal Awasthi, Matthäus Kleindessner, and Jamie Morgenstern. Equalized odds postprocessing under imperfect group information. In Silvia Chiappa and Roberto Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 1770–1780. PMLR, 2...

[4] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning: Limitations and Opportunities. MIT Press, 2023.

[5] Richard A. Berk, Hoda Heidari, Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. A convex framework for fair regression. In Proceedings of the 4th Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML), ICML Workshop, August 2017. URL https://arxiv.org/abs/1706.02409. Co-located with...

[6] Jiahao Chen, Nathan Kallus, Xiaojie Mao, Geoffry Svacha, and Madeleine Udell. Fairness under unawareness: Assessing disparity when protected class is unobserved. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* '19, pages 339–348. ACM, January 2019. doi: 10.1145/3287560.3287594. URL http://dx.doi.org/10.1145/3287560.3287594.

[7] Silvia Chiappa, Ray Jiang, Tom Stepleton, Aldo Pacchiano, Heinrich Jiang, and John Aslanides. A general approach to fairness with optimal transport. Proceedings of the AAAI Conference on Artificial Intelligence, 34(04):3633–3640, Apr. 2020. doi: 10.1609/aaai.v34i04. URL https://ojs.aaai.org/index.php/AAAI/article/view/5771.

[8] Evgenii Chzhen and Nicolas Schreuder. A minimax framework for quantifying risk-fairness trade-off in regression. The Annals of Statistics, 50(4):2416–2442, August 2022. doi: 10.1214/22-AOS2198. URL https://arxiv.org/abs/2007.14265.

[9] Evgenii Chzhen, Christophe Denis, Mohamed Hebiri, Luca Oneto, and Massimiliano Pontil. Fair regression with Wasserstein barycenters. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 7321–7331. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/pape...

[10] Andrew Cotter, Heinrich Jiang, Maya Gupta, Serena Wang, Taman Narayan, Seungil You, and Karthik Sridharan. Optimization with non-differentiable constraints with applications to fairness, recall, churn, and other goals. Journal of Machine Learning Research, 20(172):1–59, 2019. URL http://jmlr.org/papers/v20/18-616.html.

[11] Christophe Denis, Romuald Elie, Mohamed Hebiri, and François Hu. Fairness guarantees in multi-class classification with demographic parity. Journal of Machine Learning Research, 25(130):1–46, 2024. URL https://jmlr.org/papers/v25/23-0322.html.

[12] Vincent Divol and Solenne Gaucher. Demographic parity in regression and classification within the unawareness framework, 2024. URL https://arxiv.org/abs/2409.02471.

[13] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS '12, pages 214–226. ACM, January 2012. doi: 10.1145/2090236.2090255. URL https://arxiv.org/abs/1104.3913.

[14] Kilian Fatras, Younes Zine, Rémi Flamary, Rémi Gribonval, and Nicolas Courty. Learning with minibatch Wasserstein: asymptotic and gradient properties. In AISTATS 2020 - 23rd International Conference on Artificial Intelligence and Statistics, volume 108 of PMLR, pages 1–20, Palermo, Italy, June 2020. URL https://hal.science/hal-02502329.

[15] Eugene A. Feinberg, Pavlo O. Kasyanov, and Nina V. Zadoianchuk. Fatou's lemma for weakly converging probabilities. Theory of Probability & Its Applications, 58(4):683–689, 2014.

[16] Rémi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, et al. POT: Python Optimal Transport. Journal of Machine Learning Research, 22(78):1–8, 2021.

[17] Harry Gordon Frankfurt. Equality as a moral ideal. Ethics, 98(1):21–43, October 1987. doi: 10.1086/292913. URL https://www.jstor.org/stable/2381290.

[18] Solenne Gaucher, Nicolas Schreuder, and Evgenii Chzhen. Fair learning with Wasserstein barycenters for non-decomposable performance measures. In Francisco Ruiz, Jennifer Dy, and Jan-Willem van de Meent, editors, Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning Research,...

[19] Thibaut Le Gouic, Jean-Michel Loubes, and Philippe Rigollet. Projection to fairness in statistical learning, 2020. URL https://arxiv.org/abs/2005.11720.

[20] Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. Fairness without demographics in repeated loss minimization. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1929–1938. PMLR, 10–15 Jul 2018. URL https://pr...

[21] Nathan Kallus, Xiaojie Mao, and Angela Zhou. Assessing algorithmic fairness with unobserved protected class using data combination. Manage. Sci., 68(3):1959–1981, March. ISSN 0025-1909. doi: 10.1287/mnsc.2020.3850. URL https://doi.org/10.1287/mnsc.2020.3850.

[22] Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, and Ed Chi. Fairness without demographics through adversarially reweighted learning. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 728–740. Curran Associates, Inc.,...

[23] Thibaud Leteno, Michael Perrot, Charlotte Laclau, Antoine Gourru, and Christophe Gravier. Fair text classification via transferable representations. Journal of Machine Learning Research, 26(239):1–47, 2025. URL http://jmlr.org/papers/v26/25-0485.html.

[24] D.A. Levin, Y. Peres, and E.L. Wilmer. Markov Chains and Mixing Times. American Mathematical Soc., 2009. ISBN 9780821886274. URL https://books.google.fr/books?id=6Cg5Nq5sSv4C.

[25] Zhu Li, Adrian Perez-Suay, Gustau Camps-Valls, and Dino Sejdinovic. Kernel dependence regularizers and Gaussian processes with applications to algorithmic fairness, 2019. URL https://arxiv.org/abs/1911.04322.

[26] Zachary Lipton, Julian McAuley, and Alexandra Chouldechova. Does mitigating ML's impact disparity require treatment disparity? In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper_fi...

[27] Michael Lohaus, Michaël Perrot, and Ulrike Von Luxburg. Too relaxed to be fair. In Proceedings of the 37th International Conference on Machine Learning, ICML'20. JMLR.org, 2020.

[28] John Rawls. A Theory of Justice. Harvard University Press, 1971.

[29] Filippo Santambrogio. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling, volume 87 of Progress in Nonlinear Differential Equations and Their Applications. Birkhäuser Cham, 1st edition, 2015. ISBN 978-3-319-20827-5. doi: 10.1007/978-3-319-20828-2. Published 27 October 2015.

[30] Eloi Tanguy, Julie Delon, and Nathaël Gozlan. Computing barycentres of measures for generic transport costs. arXiv preprint arXiv:2501.04016, 2024.

[31] Gayane Taturyan, Evgenii Chzhen, and Mohamed Hebiri. Regression under demographic parity constraints via unlabeled post-processing. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 117917–117953. Curran Associates, Inc., 2024. URL https://pr...

[32] Cédric Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer Science & Business Media, 2008. ISBN 9783540710509.

[33] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna P. Gummadi. Fairness constraints: A flexible approach for fair classification. Journal of Machine Learning Research, 20(75):1–42, 2019. URL http://jmlr.org/papers/v20/18-262.html.

Appendix. Table of Contents: A Proofs of propositions and lemmas; A.1 Equivalence between (3) and $(P_{\lambda,D})$...

... $\colon \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$ be a measurable map such that $(y_1^*(z_1, z_2), y_2^*(z_1, z_2))$ realizes the infimum in (10) for all $(z_1, z_2) \in \mathbb{R}^2 \times \mathbb{R}^2$. Then $(\nu^{+*}, \nu^{-*}) = (y_1^*, y_2^*)_\sharp \pi^*$ is a minimizer of $(P_{\lambda,D})$.

Proof of Lemma 2. To prove this equivalence, we first establish that the pointwise optimization constitutes a lower bound for the global problem, and then demonstrate that t...

... $\colon \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^2$ such that $(y_1^*(z_1, z_2), y_2^*(z_1, z_2))$ is a minimizer of $\Phi_{z_1,z_2}$ for all $(z_1, z_2) \in \mathbb{R}^2 \times \mathbb{R}^2$. We can therefore push forward $\pi^*$ via the map $(z_1, z_2) \mapsto (z_1, z_2, y_1^*(z_1, z_2), y_2^*(z_1, z_2))$ and construct a joint measure $\rho^*$. The marginals of $\rho^*$ over its third and fourth coordinates ($y_1$ and $y_2$) define target distributions $\nu^{+*}$ and $\nu^{-*}$. Because $\rho^*$ constitutes a valid joint tr...
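The push-forward construction above can be sketched for a discrete coupling $\pi^* = \sum_k w_k \delta_{(z_1^{(k)}, z_2^{(k)})}$. This is a minimal illustration, not the authors' implementation. It assumes each point is written as $z = (h, a)$ with $a > 0$, mirroring the symbols $h_i, a_i$ of the quadratic closed form (12) from the appendix, and it uses that closed form as the pointwise minimizer. The function names are hypothetical.

```python
import numpy as np


def pointwise_minimizer(z1, z2, lam):
    """Closed-form minimizer (12) of the quadratic pointwise problem.

    Assumption (illustrative only): each point is z = (h, a) with a > 0.
    """
    h1, a1 = z1
    h2, a2 = z2
    denom = 1.0 + lam * (a1 + a2)
    y1 = (h1 * (1.0 + lam * a2) + lam * h2 * a1) / denom
    y2 = (h2 * (1.0 + lam * a1) + lam * h1 * a2) / denom
    return y1, y2


def push_forward_coupling(support, weights, lam):
    """Push sum_k w_k delta_{(z1_k, z2_k)} through (z1, z2) -> (y1*, y2*).

    Returns (atoms, weights) for the two output marginals nu_plus and nu_minus;
    both marginals inherit the weights of the coupling.
    """
    ys = np.array([pointwise_minimizer(z1, z2, lam) for z1, z2 in support])
    w = np.asarray(weights, dtype=float)
    return (ys[:, 0], w), (ys[:, 1], w)


# Synthetic coupling with two atoms of mass 1/2 each.
support = [((1.0, 0.5), (0.0, 0.5)), ((-1.0, 0.5), (0.5, 0.5))]
(y_plus, w_plus), (y_minus, w_minus) = push_forward_coupling(support, [0.5, 0.5], lam=2.0)
# y_plus ≈ [0.667, -0.5], y_minus ≈ [0.333, 0.0]
```

Because the two output laws are read off the same coupling, the pair $(y_1^*, y_2^*)$ stays jointly distributed, which is what lets later arguments bound distances between $\nu^{+*}$ and $\nu^{-*}$ through $\pi^*$.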

From the optimality conditions we have

$$y_1^* - h_1 = -\lambda a_1 u, \qquad y_2^* - h_2 = \lambda a_2 u,$$

where $u = y_1^* - y_2^*$. Subtracting these identities gives $u = (h_1 - h_2) - \lambda(a_1 + a_2)u$, which resolves to $u = \frac{h_1 - h_2}{1 + \lambda(a_1 + a_2)}$. On the one hand, plugging this value into the first-order conditions yields

$$y_1^* = \frac{h_1(1 + \lambda a_2) + \lambda h_2 a_1}{1 + \lambda(a_1 + a_2)}, \qquad y_2^* = \frac{h_2(1 + \lambda a_1) + \lambda h_1 a_2}{1 + \lambda(a_1 + a_2)}. \tag{12}$$

On the other hand, plug...
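As a numerical sanity check of (12), one can solve the two first-order conditions as a $2 \times 2$ linear system and compare with the closed form. This sketch assumes $a_1, a_2 > 0$ and $\lambda \ge 0$; the function names are hypothetical.

```python
import numpy as np


def solve_first_order_system(h1, h2, a1, a2, lam):
    """Solve the optimality conditions with u = y1 - y2:
        y1 - h1 = -lam*a1*u  ->  (1 + lam*a1) y1 - lam*a1 y2 = h1
        y2 - h2 =  lam*a2*u  ->  -lam*a2 y1 + (1 + lam*a2) y2 = h2
    The determinant equals 1 + lam*(a1 + a2) > 0, so the system is solvable.
    """
    A = np.array([[1.0 + lam * a1, -lam * a1],
                  [-lam * a2, 1.0 + lam * a2]])
    return np.linalg.solve(A, np.array([h1, h2], dtype=float))


def closed_form(h1, h2, a1, a2, lam):
    """Equation (12)."""
    denom = 1.0 + lam * (a1 + a2)
    return np.array([(h1 * (1.0 + lam * a2) + lam * h2 * a1) / denom,
                     (h2 * (1.0 + lam * a1) + lam * h1 * a2) / denom])


rng = np.random.default_rng(0)
for _ in range(1000):
    h1, h2 = rng.normal(size=2)
    a1, a2 = rng.uniform(0.1, 2.0, size=2)
    lam = rng.uniform(0.0, 50.0)
    y = solve_first_order_system(h1, h2, a1, a2, lam)
    assert np.allclose(y, closed_form(h1, h2, a1, a2, lam))
    # The gap u = y1 - y2 shrinks by the factor 1 / (1 + lam (a1 + a2)).
    assert np.isclose(y[0] - y[1], (h1 - h2) / (1.0 + lam * (a1 + a2)))
```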

...

$$= \frac{(\lambda a_1 u)^2}{a_1} + \frac{(\lambda a_2 u)^2}{a_2} + \lambda u^2 = \lambda u^2 \big(1 + \lambda(a_1 + a_2)\big) = \lambda \frac{(h_1 - h_2)^2}{\big(1 + \lambda(a_1 + a_2)\big)^2} \big(1 + \lambda(a_1 + a_2)\big) = \frac{\lambda}{1 + \lambda(a_1 + a_2)} (h_1 - h_2)^2.$$

According to Lemma 2, a solution $(\nu^{+*}, \nu^{-*})$ of $(P_{\lambda,D})$ is given by the marginal distributions of $(y_1^*(Z_1, Z_2), y_2^*(Z_1, Z_2))$, where $(Z_1, Z_2) \sim \pi^*$ and $\pi^*$ is as in (5). Our next goal is to show that under Assumption 1, the random v...
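The chain of equalities above can be checked numerically. The excerpt does not display the pointwise objective itself, so this sketch assumes $\Phi(y_1, y_2) = (y_1 - h_1)^2/a_1 + (y_2 - h_2)^2/a_2 + \lambda (y_1 - y_2)^2$, the form whose first-order conditions and optimal value match the displayed identities. It evaluates $\Phi$ at (12), compares with $\frac{\lambda}{1 + \lambda(a_1 + a_2)}(h_1 - h_2)^2$, and checks that random perturbations never lower $\Phi$.

```python
import numpy as np


def phi(y1, y2, h1, h2, a1, a2, lam):
    """Assumed pointwise objective: weighted squared deviations plus the relaxed
    parity penalty lam * (y1 - y2)^2 (reconstructed, not quoted from the paper)."""
    return (y1 - h1) ** 2 / a1 + (y2 - h2) ** 2 / a2 + lam * (y1 - y2) ** 2


def relaxed_cost(h1, h2, a1, a2, lam):
    """Optimal value lam / (1 + lam (a1 + a2)) * (h1 - h2)^2."""
    return lam / (1.0 + lam * (a1 + a2)) * (h1 - h2) ** 2


def minimizer(h1, h2, a1, a2, lam):
    """Equation (12)."""
    denom = 1.0 + lam * (a1 + a2)
    y1 = (h1 * (1.0 + lam * a2) + lam * h2 * a1) / denom
    y2 = (h2 * (1.0 + lam * a1) + lam * h1 * a2) / denom
    return y1, y2


rng = np.random.default_rng(1)
for _ in range(200):
    h1, h2 = rng.normal(size=2)
    a1, a2 = rng.uniform(0.1, 2.0, size=2)
    lam = rng.uniform(0.0, 20.0)
    y1, y2 = minimizer(h1, h2, a1, a2, lam)
    value = phi(y1, y2, h1, h2, a1, a2, lam)
    assert np.isclose(value, relaxed_cost(h1, h2, a1, a2, lam))
    # phi is a strictly convex quadratic, so no perturbation can decrease it.
    d1, d2 = rng.normal(scale=0.1, size=2)
    assert phi(y1 + d1, y2 + d2, h1, h2, a1, a2, lam) >= value - 1e-12
```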

... for all $z_1$, $\psi(z_1) = \inf_{z_2} \big(\varphi(z_2) + C_{\lambda,c_2}(z_1, z_2)\big)$;
2. $\pi^*$ is supported on the $C^{u}_{\lambda,c_2}$-subdifferential of a Kantorovich potential $\psi$, that is, the set $\Gamma = \{(z_1, z_2) \in \mathcal{Z} \times \mathcal{Z} : \psi(z_1) - \varphi(z_2) = C_{\lambda,c_2}(z_1, z_2)\}$.

The arguments used to prove Lemma 3 in Divol and Gaucher [12] can be reproduced to show that under Assumption 1, the Kantorovich potential $\psi$ is differentiable in...
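On finite supports, the transform $\psi(z_1) = \inf_{z_2}(\varphi(z_2) + C_{\lambda,c_2}(z_1, z_2))$ and the contact set $\Gamma$ become a row-wise minimum and a boolean mask. This sketch only illustrates those two definitions. It assumes $C_{\lambda,c_2}$ equals the pointwise optimal value $\frac{\lambda}{1 + \lambda(a_1 + a_2)}(h_1 - h_2)^2$ computed above with $z = (h, a)$, and it uses an arbitrary potential $\varphi$ rather than an optimal one, so nothing here certifies that $\pi^*$ concentrates on $\Gamma$. All names are hypothetical.

```python
import numpy as np


def relaxed_cost_matrix(h_plus, a_plus, h_minus, a_minus, lam):
    """C[i, j] = lam / (1 + lam (a_i + a_j)) * (h_i - h_j)^2 (assumed cost)."""
    dh = h_plus[:, None] - h_minus[None, :]
    sa = a_plus[:, None] + a_minus[None, :]
    return lam / (1.0 + lam * sa) * dh ** 2


def c_transform(varphi, C):
    """Discrete analogue of psi(z1) = inf_{z2} (varphi(z2) + C(z1, z2))."""
    return np.min(varphi[None, :] + C, axis=1)


def contact_set(psi, varphi, C, tol=1e-12):
    """Mask of pairs (i, j) with psi[i] - varphi[j] = C[i, j], up to tol."""
    return np.abs(psi[:, None] - varphi[None, :] - C) <= tol


rng = np.random.default_rng(2)
h_p, h_m = rng.normal(size=5), rng.normal(size=6)
a_p, a_m = rng.uniform(0.5, 1.5, size=5), rng.uniform(0.5, 1.5, size=6)
C = relaxed_cost_matrix(h_p, a_p, h_m, a_m, lam=3.0)
varphi = rng.normal(size=6)  # arbitrary potential, not an optimal one
psi = c_transform(varphi, C)
gamma = contact_set(psi, varphi, C)
```

By construction $\psi(z_1) - \varphi(z_2) \le C_{\lambda,c_2}(z_1, z_2)$ everywhere, with equality exactly on $\Gamma$, and every row of `gamma` contains at least one pair (its minimizer).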

Since $\int h^2 |d| \, \mathrm{d}\mu^{\pm}_{\eta,\Delta}(z) = \int_{\mathcal{X}^{\pm}} \eta(x)^2 \, \mathrm{d}\chi(x) \le \mathbb{E}[\eta(X)^2] < \infty$, we have $W_2^2(\nu^*_{1,\lambda}, \nu^*_{2,\lambda}) \to 0$ as $\lambda \to \infty$. In particular, the two output laws become arbitrarily close in $W_2$. Therefore the relaxed solutions asymptotically enforce a single common output distribution; if the unrelaxed barycenter is unique, this common limit must coincide with the unrelaxed barycen...
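One way to see the mechanism in the quadratic case: pushing $\pi^*$ through $(y_1^*, y_2^*)$ gives a coupling of the two output laws, so $\mathbb{E}_{\pi^*}[(y_1^* - y_2^*)^2]$ upper-bounds $W_2^2$ between them, and by (12) each gap equals $(h_1 - h_2)/(1 + \lambda(a_1 + a_2))$. The sketch below evaluates this bound on a synthetic discrete coupling for increasing $\lambda$. The $(h, a)$ parametrization and the data are assumptions; the proof in the excerpt relies on the integrability condition instead.

```python
import numpy as np


def coupled_gap(h1, h2, a1, a2, w, lam):
    """Weighted mean of (y1* - y2*)^2 over a discrete coupling with weights w.

    By (12), y1* - y2* = (h1 - h2) / (1 + lam (a1 + a2)); since the coupling
    pushed through (y1*, y2*) couples the two output laws, this bounds W2^2.
    """
    u = (h1 - h2) / (1.0 + lam * (a1 + a2))
    return float(np.sum(w * u ** 2))


rng = np.random.default_rng(3)
n = 50
h1, h2 = rng.normal(size=n), rng.normal(loc=0.5, size=n)
a1, a2 = rng.uniform(0.5, 1.5, size=n), rng.uniform(0.5, 1.5, size=n)
w = np.full(n, 1.0 / n)  # uniform weights on the coupling's atoms
lams = [0.0, 1.0, 10.0, 100.0, 1000.0]
gaps = [coupled_gap(h1, h2, a1, a2, w, lam) for lam in lams]
```

With $a_1 + a_2 \ge 1$ here, the bound decays at least like $(1 + \lambda)^{-2}$ times its value at $\lambda = 0$.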

... $+ \lambda - \varphi(z_2) \ge \lambda$. Thus $C_{\lambda,c_0}(z_1, z_2) = \lambda$, and thus $y_1^*(z_1, z_2) = h_1$ clearly depends only on $z_1$. This shows that the map $T^{+}_{\lambda,\mathrm{TV}}$ is well defined $\mu^{+}_{\eta,\Delta}$-almost everywhere. Similarly, we show that the map $T^{-}_{\lambda,\mathrm{TV}}$ is well defined $\mu^{-}_{\eta,\Delta}$-almost everywhere. We then conclude using Lemma 1 that solutions of Equation (3) take the form described in the lemma. Let ...

... between the initial biased distributions. For the Law School dataset, the initial $W_2$ gap is approximately 0.03. The exact geometric cost to perfectly repair this disparity is therefore bounded by $0.03^2 = 0.0009$. Because the targets are scaled to $[-1, 1]$, the base ERM risk is naturally around 0.010. Adding the absolute maximum fairness penalty yields an expect...
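The "initial $W_2$ gap" between two groups' prediction distributions can be computed in one dimension from sorted samples, since the monotone (quantile) coupling is optimal for the squared cost on $\mathbb{R}$. This is a hedged sketch on synthetic predictions, not the paper's evaluation code or the Law School data. It assumes equally many, equally weighted predictions per group; `w2_empirical_1d` is a hypothetical helper.

```python
import numpy as np


def w2_empirical_1d(x, y):
    """W2 between two empirical measures on R with equally many, equally weighted atoms.

    In 1D the sorted (quantile) coupling is optimal for the squared cost,
    so W2^2 is the mean squared difference of the sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.sqrt(np.mean((x - y) ** 2)))


# Synthetic group-wise predictions in [-1, 1]: group 1 is group 0 shifted by 0.03.
rng = np.random.default_rng(4)
preds_group0 = rng.uniform(-1.0, 1.0, size=1000)
preds_group1 = rng.permutation(preds_group0 + 0.03)
gap = w2_empirical_1d(preds_group0, preds_group1)
# gap ≈ 0.03 and gap ** 2 ≈ 0.0009, the same arithmetic as in the text
```

For unequal sample sizes or weights, one would compare quantile functions on a common grid instead of sorted arrays.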