Activation steering in neural networks. Emergent Mind, 2025

Ahmed Hegazy, Daniel Postmus · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models

cs.CL · 2025-10-19 · unverdicted · novelty 6.0

Beacon is a new single-turn benchmark that measures latent sycophancy in LLMs, showing it decomposes into linguistic and affective sub-biases that scale with model capacity and can be modulated by prompt and activation interventions.

citing papers explorer

Showing 1 of 1 citing paper.

Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models

cs.CL · 2025-10-19 · unverdicted · none · ref 4
