Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

Adriana Laurindo Monteiro; Emmanuelle Claeys; Jean-Michel Loubes; Laurent Risser; Valentin Lafargue

arxiv: 2507.20708 · v3 · pith:WUV5RZN2new · submitted 2025-07-28 · 💻 cs.LG · math.OC · stat.AP

Valentin Lafargue, Adriana Laurindo Monteiro, Emmanuelle Claeys, Laurent Risser, Jean-Michel Loubes

classification 💻 cs.LG · math.OC · stat.AP

keywords fairness · distributional · auditing · manipulation · attacks · auditee · distribution · evaluate

The rapid deployment of AI systems in high-stakes domains, including those classified as high-risk under the EU AI Act (Regulation (EU) 2024/1689), has intensified the need for reliable compliance auditing. For binary classifiers, regulatory risk assessment often relies on global fairness metrics such as the Disparate Impact ratio, widely used to evaluate potential discrimination. In typical auditing settings, the auditee provides a subset of its dataset to an auditor, while a supervisory authority may verify whether this subset is representative of the full underlying distribution. In this work, we investigate to what extent a malicious auditee can construct a fairness-compliant yet representative-looking sample from a non-compliant original distribution, thereby creating an illusion of fairness. We formalize this problem as a constrained distributional projection task and introduce mathematically grounded manipulation strategies based on entropic and optimal transport projections. These constructions characterize the minimal distributional shift required to satisfy fairness constraints. To counter such attacks, we formalize representativeness through statistical tests based on distributional distances and systematically evaluate their ability to detect manipulated samples. Our analysis highlights the conditions under which fairness manipulation can remain statistically undetected and provides practical guidelines for strengthening supervisory verification. We validate our theoretical findings through experiments on standard tabular datasets for bias detection. Code is publicly available at https://github.com/ValentinLafargue/Inspection.
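
To make the audited quantity concrete, here is a minimal Python sketch of the Disparate Impact ratio on toy data, together with a deliberately naive manipulation that drops a few unfavourable rows from the submitted sample. It is not the authors' method: the paper's attacks rely on entropic and optimal transport projections, which are not reproduced here. The function name `disparate_impact`, the encoding `s = 0` for the protected group, the toy data, and the 0.8 threshold (the commonly cited four-fifths convention) are all assumptions made for illustration.

```python
from typing import Sequence


def disparate_impact(y_pred: Sequence[int], s: Sequence[int]) -> float:
    """Ratio P(y_pred = 1 | s = 0) / P(y_pred = 1 | s = 1).

    Assumed convention (not taken from the paper): s = 0 marks the
    protected group and y_pred = 1 is the favourable outcome.
    """
    if len(y_pred) != len(s):
        raise ValueError("y_pred and s must have the same length")
    counts = {0: 0, 1: 0}
    positives = {0: 0, 1: 0}
    for y, group in zip(y_pred, s):
        counts[group] += 1
        positives[group] += int(y == 1)
    if counts[0] == 0 or counts[1] == 0:
        raise ValueError("both groups must be present")
    rate_protected = positives[0] / counts[0]
    rate_reference = positives[1] / counts[1]
    if rate_reference == 0:
        raise ValueError("reference group has no favourable outcomes")
    return rate_protected / rate_reference


# Hypothetical toy population: 10 protected rows, then 10 reference rows.
s_full = [0] * 10 + [1] * 10
y_full = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4

# Naive manipulation: omit four unfavourable protected rows (indices 6..9).
keep = [i for i in range(len(y_full)) if i not in {6, 7, 8, 9}]
s_sample = [s_full[i] for i in keep]
y_sample = [y_full[i] for i in keep]

THRESHOLD = 0.8  # assumed four-fifths convention, not a value from the paper

for name, (y, s) in {"full": (y_full, s_full), "sample": (y_sample, s_sample)}.items():
    di = disparate_impact(y, s)
    print(f"{name}: DI = {di:.3f}, passes threshold: {di >= THRESHOLD}")
```

On this toy data the full population has a ratio of about 0.5, while the trimmed sample reaches about 0.83. A selection this crude also changes the group proportions noticeably, which is the kind of shift a representativeness test is meant to catch; the abstract's question is how small such a shift can be made while still satisfying the fairness constraint.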

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Evaluation Game: Beyond Static LLM Benchmarking
cs.LG 2026-05 unverdicted novelty 6.0

Presents a game-theoretic model with group actions for data augmentation in LLM adversarial evaluation, demonstrating local generalization from fine-tuning on three model families and redefining benchmarks as orbits u...

Fairness of Explanations in Artificial Intelligence (AI): A Unifying Framework, Axioms, and Future Direction toward Responsible AI
cs.AI 2026-05 unverdicted novelty 6.0

A conditional invariance framework defines explanation fairness as explanations being statistically independent of protected attributes given task-relevant features, unifying existing metrics and enabling procedural b...