Activation steering in neural networks. Emergent Mind, 2025
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models
Beacon is a new single-turn benchmark that measures latent sycophancy in LLMs, showing it decomposes into linguistic and affective sub-biases that scale with model capacity and can be modulated by prompt and activation interventions.