Evaluating Gender Bias of LLMs in Making Morality Judgements

Bajaj, Divij, Lei, Yuanyuan, Tong, Jonathan, Huang, Ruihong · 2024 · DOI 10.18653/v1/2024.findings-emnlp.928

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

The paper characterizes deductive stereotyping in LLMs and introduces Fair-GCG to discover injection phrases that improve fairness across benchmarks, reasoning, and real-world tasks.

It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO

cs.CL · 2026-06-09 · unverdicted · novelty 5.0

One-shot GRPO on a single biased example induces generalizing stereotype bias in post-trained LLMs, with susceptibility varying by initial bias likelihood.

citing papers explorer

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

cs.CL · 2026-06-30 · unverdicted · none · ref 30

It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO

cs.CL · 2026-06-09 · unverdicted · none · ref 4