A Benchmark for Sycophancy in Theorem Proving with LLMs

Ivo Petrov, Jasper Dekoninck, Martin Vechev · 2025 · arXiv 2510.04721

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

ReCrit frames critic interaction as a correctness-transition problem and uses quadrant-based RL rewards to improve LLM performance on scientific reasoning benchmarks by rewarding corrections and robustness while penalizing sycophancy.

How LLMs Are Persuaded: A Few Attention Heads, Rerouted

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

Persuasion in LLMs works by redirecting a small set of attention heads to copy the target option token instead of reasoning over evidence, via a rank-one routing feature that can be directly edited or removed.

Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.

Diagnosing and Mitigating Sycophancy and Skepticism in LLM Causal Judgment

cs.AI · 2026-01-13 · unverdicted · novelty 6.0

Introduces the CAUSALT3 benchmark for causal reasoning across Pearl's ladder and Regulated Causal Anchoring (RCA) to reduce sycophancy and skepticism in LLMs via inference-time verification.

Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models

cs.CL · 2025-10-19 · unverdicted · novelty 6.0

Beacon is a new single-turn benchmark that measures latent sycophancy in LLMs, showing it decomposes into linguistic and affective sub-biases that scale with model capacity and can be modulated by prompt and activation interventions.

Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness

cs.CL · 2026-05-11

citing papers explorer

Showing 6 of 6 citing papers.

ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning · cs.LG · 2026-05-11 · unverdicted · none · ref 28
How LLMs Are Persuaded: A Few Attention Heads, Rerouted · cs.AI · 2026-05-10 · unverdicted · none · ref 24
Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models · cs.CL · 2026-04-12 · unverdicted · none · ref 37
Diagnosing and Mitigating Sycophancy and Skepticism in LLM Causal Judgment · cs.AI · 2026-01-13 · unverdicted · none · ref 2
Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models · cs.CL · 2025-10-19 · unverdicted · none · ref 12
Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness · cs.CL · 2026-05-11 · unreviewed · ref 16
