EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.
Flattering to Deceive: The Impact of Sycophantic Behavior on User Trust in Large Language Model
7 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
Years: 2026 (7)
Verdicts: UNVERDICTED (7)
Roles: background (2)
Polarities: background (2)

Representative citing papers:
Frontier LLMs show sycophancy that varies sharply by model and by combinations of perceived user demographics, with GPT-5-nano exhibiting higher rates, especially toward certain Hispanic personas in the philosophy domain.
SWAY quantifies sycophancy in LLMs as response shifts under linguistic pressure, and a counterfactual chain-of-thought mitigation reduces it to near zero while preserving responsiveness to genuine evidence.
Systematic testing of eight frontier LLMs reveals substantial differences in verbal tic prevalence, with Gemini highest and DeepSeek lowest, and a strong negative correlation between sycophancy and human-rated naturalness.
High agreeableness in LLM voice assistants increases older adults' perceptions of empathy, and real-time explanations outperform history-based ones; however, personality does not affect perceived intelligence.
An analysis of Reddit posts shows that users detect AI sycophancy through comparisons and consistency checks, apply mitigation prompts, and sometimes seek affirming responses for support, suggesting that context-aware design is preferable to eliminating sycophancy entirely.
CERTA adds relevance-based certainty estimation to retrieval-augmented generation (RAG) so that LLMs can better signal uncertainty on non-objective questions, reducing overconfidence.
Citing papers explorer:
- SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy
- The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models