Evaluating Gender Bias of LLMs in Making Morality Judgements
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
One-shot GRPO on a single biased example induces generalizing stereotype bias in post-trained LLMs, with susceptibility varying by initial bias likelihood.
citing papers explorer
- Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG
  The paper characterizes deductive stereotyping in LLMs and introduces Fair-GCG to discover injection phrases that improve fairness across benchmarks, reasoning, and real-world tasks.
- It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO
  One-shot GRPO on a single biased example induces generalizing stereotype bias in post-trained LLMs, with susceptibility varying by initial bias likelihood.