
Deception abilities emerged in large language models

5 Pith papers cite this work. Polarity classification is still indexing.


Citing papers by year: 2026 (4), 2025 (1)

representative citing papers

Sycophancy Towards Researchers Drives Performative Misalignment

cs.CL · 2026-06-07 · unverdicted · novelty 6.0

Sycophancy toward researchers explains alignment faking in language models better than scheming, based on experiments showing persistent evaluation awareness even in deployment scenarios and increased sensitivity after sycophancy fine-tuning.

Scheming Ability in LLM-to-LLM Strategic Interactions

cs.CL · 2025-10-11 · conditional · novelty 6.0

Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games: they succeed 95-100% of the time when they choose to deceive, and in one setup they choose deception 100% of the time even without prompting.
