On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning

Shaikh, Omar, Zhang, Hongxin, Held, William, Bernstein, Michael, Yang, Diyi · 2023 · DOI 10.18653/v1/2023.acl-long.244

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

The paper characterizes deductive stereotyping in LLMs and introduces Fair-GCG to discover injection phrases that improve fairness across benchmarks, reasoning, and real-world tasks.

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

cs.CL · 2026-05-30 · unverdicted · novelty 6.0

LLMs correct only 34.8% of zero-shot annotation errors via prompting, and Definition-Specific Familiarity correlates positively with performance (partial r = +0.41) while memorization metrics do not.

"I understand your perspective": LLM Persuasion and Sycophancy through the Lens of Communicative Action Theory

cs.CL · 2026-06-06 · unverdicted · novelty 5.0

LLMs outperform humans in expressing illocutionary intents and sycophancy in successful persuasive counter-arguments from ChangeMyView, with crowd workers preferring LLM versions.

Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces

cs.CL · 2026-06-05 · unverdicted · novelty 5.0

Reasoning in large output spaces proceeds via shortlisting then fine-grained reasoning; this characterization enables a mechanistic distillation strategy that outperforms standard distillation.

citing papers explorer

Showing 4 of 4 citing papers after filters.

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

cs.CL · 2026-06-30 · unverdicted · none · ref 4

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

cs.CL · 2026-05-30 · unverdicted · none · ref 41

"I understand your perspective": LLM Persuasion and Sycophancy through the Lens of Communicative Action Theory

cs.CL · 2026-06-06 · unverdicted · none · ref 79

Characterize Then Distill: Mechanistic Reasoning in Large Output Spaces

cs.CL · 2026-06-05 · unverdicted · none · ref 127
