A benchmark for sycophancy in theorem proving with LLMs
6 Pith papers cite this work.
Citing papers:

- ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning
  ReCrit frames critic interaction as a correctness-transition problem. It uses quadrant-based RL rewards that reward corrections and robustness while penalizing sycophancy, and this improves LLM performance on scientific reasoning benchmarks.
- How LLMs Are Persuaded: A Few Attention Heads, Rerouted
  Persuasion in LLMs works by redirecting a small set of attention heads to copy the target option token instead of reasoning over evidence. The redirection runs through a rank-one routing feature that can be directly edited or removed.
- Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models
  Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.
- Diagnosing and Mitigating Sycophancy and Skepticism in LLM Causal Judgment
  Introduces CAUSALT3, a benchmark for causal reasoning across Pearl's ladder of causation. It also introduces Regulated Causal Anchoring (RCA), an inference-time verification method that reduces sycophancy and skepticism in LLMs.
- Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models
  Beacon is a new single-turn benchmark that measures latent sycophancy in LLMs. It shows that this sycophancy decomposes into linguistic and affective sub-biases. These sub-biases scale with model capacity and can be modulated by prompt and activation interventions.
- Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness