Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting

Adam Tauman Kalai; Alexandra Chouldechova; Alexey Romanov; Christian Borgs; Hanna Wallach; Jennifer Chayes; Krishnaram Kenthapadi; Maria De-Arteaga; Sahin Geyik

arxiv: 1901.09451 · v1 · pith:ZYX6N3MCnew · submitted 2019-01-27 · cs.IR · cs.LG · stat.ML

classification cs.IR · cs.LG · stat.ML

keywords bias, gender, semantic, classification, explicit, imbalances, indicators, occupation

We present a large-scale study of gender bias in occupation classification, a task where the use of machine learning may lead to negative outcomes on people's lives. We analyze the potential allocation harms that can result from semantic representation bias. To do so, we study the impact on occupation classification of including explicit gender indicators, such as first names and pronouns, in different semantic representations of online biographies. Additionally, we quantify the bias that remains when these indicators are "scrubbed," and describe proxy behavior that occurs in the absence of explicit gender indicators. As we demonstrate, differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound these imbalances.
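
The quantity the abstract refers to, a per-occupation difference in true positive rates between genders, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function name `tpr_gender_gap`, the `(true_occupation, predicted_occupation, gender)` record layout, the binary `"F"`/`"M"` labels, the sign convention (female TPR minus male TPR), and the toy data. It is not the authors' code or dataset.

```python
from collections import defaultdict


def tpr_gender_gap(records):
    """Return {occupation: TPR_F - TPR_M} for a list of records.

    Each record is a hypothetical (true_occupation, predicted_occupation,
    gender) tuple with gender in {"F", "M"}. The sign convention is an
    assumption made for this sketch.
    """
    hits = defaultdict(int)    # (occupation, gender) -> correct predictions
    totals = defaultdict(int)  # (occupation, gender) -> biographies truly in occupation
    for true_occ, pred_occ, gender in records:
        totals[(true_occ, gender)] += 1
        if pred_occ == true_occ:
            hits[(true_occ, gender)] += 1

    gaps = {}
    occupations = {occ for occ, _ in totals}
    for occ in occupations:
        n_f = totals.get((occ, "F"), 0)
        n_m = totals.get((occ, "M"), 0)
        if n_f == 0 or n_m == 0:
            continue  # TPR is undefined when a gender has no true examples
        tpr_f = hits.get((occ, "F"), 0) / n_f
        tpr_m = hits.get((occ, "M"), 0) / n_m
        gaps[occ] = tpr_f - tpr_m
    return gaps


# Toy data, invented purely to exercise the function.
toy_records = [
    ("surgeon", "surgeon", "F"), ("surgeon", "nurse", "F"),
    ("surgeon", "surgeon", "M"), ("surgeon", "surgeon", "M"),
    ("nurse", "nurse", "F"), ("nurse", "nurse", "F"),
    ("nurse", "nurse", "M"), ("nurse", "surgeon", "M"),
]
gaps = tpr_gender_gap(toy_records)
```

In this toy data a negative gap for an occupation means women who truly hold it are recognized less often than men. Correlating such gaps with the share of women in each occupation is the kind of analysis the abstract describes.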

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AgentFairBench: Do LLM Agents Discriminate When They Act?
cs.AI · 2026-06 · unverdicted · novelty 6.0

AgentFairBench is a multi-domain benchmark for demographic disparity in LLM agent actions, with a pilot showing no significant effect for Claude Haiku 4.5 after arity-matched noise correction.