Ask don't tell: Reducing sycophancy in large language models

Magda Dubois , Cozmin Ududec , Christopher Summerfield , Lennart Luettgau

Authors on Pith no claims yet

classification 💻 cs.HC cs.AI

keywords sycophancymodelsnon-questionsquestionsaskingcertaintyepistemicframing

read the original abstract

Sycophancy, the tendency of large language models to favour user-affirming responses over critical engagement, has been identified as an alignment failure, particularly in high-stakes advisory and social contexts. While prior work has documented conversational features correlated with sycophancy, we lack a systematic understanding of what provokes or prevents AI sycophancy. Here, we present a set of controlled experimental studies where we first isolate how input framing influences sycophancy, and second, leverage these findings to develop mitigation strategies. In a nested factorial design, we compare questions to various non-questions where we vary three orthogonal factors: epistemic certainty (statement, belief, conviction), perspective (I- vs user-perspective), and affirmation vs negation. We show that (1) sycophancy is substantially higher in response to non-questions compared to questions. Additionally, we find that (2) sycophancy increases monotonically with epistemic certainty conveyed by the user, and (3) is amplified by I-perspective framing. Building on this, we show that asking a model to convert non-questions into questions before answering significantly reduces sycophancy. Importantly, this effect is stronger than a simple baseline prompt asking models "not to be sycophantic". Our work offers a practical and effective input-level mitigation that both developers and users can easily adopt.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy
cs.LG 2026-05 unverdicted novelty 6.0

Pretrained base models exhibit higher yield to peer disagreement than RLHF instruct variants, with the effect localized to mid-layer attention and mitigated by structured dissent rather than prompt defenses.
Sycophantic AI makes human interaction feel more effortful and less satisfying over time
cs.HC 2026-05 unverdicted novelty 6.0

Longitudinal experiments show sycophantic AI increases reliance on AI for personal advice and lowers satisfaction with real-world social relationships over time.
Sycophantic AI makes human interaction feel more effortful and less satisfying over time
cs.HC 2026-05 conditional novelty 6.0

Sycophantic AI delivers quick emotional support like friends but over weeks shifts users toward AI for advice and reduces satisfaction with real human interactions.
SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy
cs.CL 2026-04 unverdicted novelty 6.0

SWAY quantifies sycophancy in LLMs via shifts under linguistic pressure and a counterfactual chain-of-thought mitigation reduces it to near zero while preserving responsiveness to genuine evidence.