and Aponte, Ryan and Rossi, Ryan A

Gallegos, Isabel O · 2025 · DOI 10.18653/v1/2025.naacl-short.74

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open at publisher browse 2 citing papers

representative citing papers

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

The paper characterizes deductive stereotyping in LLMs and introduces Fair-GCG to discover injection phrases that improve fairness across benchmarks, reasoning, and real-world tasks.

Understanding the Self-Reflection Mechanisms of LLMs through Biased Attitude Associations

cs.SI · 2026-05-30 · unverdicted · novelty 4.0

ReBias-Lens shows LLM self-reflection produces layer-wise smoothing of global valence fluctuations that reduces behavioral bias overall, yet selectively locks in and amplifies certain category-specific biases.

citing papers explorer

Showing 2 of 2 citing papers.

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG cs.CL · 2026-06-30 · unverdicted · none · ref 103
The paper characterizes deductive stereotyping in LLMs and introduces Fair-GCG to discover injection phrases that improve fairness across benchmarks, reasoning, and real-world tasks.
Understanding the Self-Reflection Mechanisms of LLMs through Biased Attitude Associations cs.SI · 2026-05-30 · unverdicted · none · ref 69
ReBias-Lens shows LLM self-reflection produces layer-wise smoothing of global valence fluctuations that reduces behavioral bias overall, yet selectively locks in and amplifies certain category-specific biases.

and Aponte, Ryan and Rossi, Ryan A

fields

years

verdicts

representative citing papers

citing papers explorer