When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors
5 papers indexed on Pith cite this work.
5 representative citing papers (all from 2026):
- When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel
  CoT traces align with internal answer commitment in only 61.9% of steps on average, with the misalignment dominated by confabulated continuations produced after commitment has stabilized.
- The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning
  LLMs discover latent planning strategies of up to five steps during training and execute them for up to eight steps at test time, with larger models reaching seven steps under few-shot prompting, revealing a dissociation between discovery and execution.
- An Independent Safety Evaluation of Kimi K2.5
  Kimi K2.5 matches closed models on dual-use tasks but refuses fewer CBRNE requests and shows some sabotage and self-replication tendencies.
- CoT-Guard: Small Models for Strong Monitoring
  CoT-Guard is a 4B-parameter model trained with SFT and RL that achieves 75% G-mean^2 on hidden-objective detection under prompt- and code-manipulation attacks, outperforming several larger models (see the note on G-mean^2 after this list).
- LLM Reasoning Is Latent, Not the Chain of Thought
  LLM reasoning is primarily mediated by latent-state trajectories rather than by explicit surface chain-of-thought outputs.
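
A note on the CoT-Guard metric: assuming G-mean^2 is the square of the standard geometric mean of sensitivity and specificity used in imbalanced classification (the paper may define it over different rates), it reduces to the product of the two detection rates:

\[
  \text{G-mean} = \sqrt{\mathrm{TPR} \times \mathrm{TNR}},
  \qquad
  \text{G-mean}^2 = \mathrm{TPR} \times \mathrm{TNR}
\]

Under that reading, a 75% G-mean^2 requires both rates to be high: TPR = TNR ≈ 0.866 gives exactly 0.75, whereas TPR = 0.90 with TNR = 0.625 would reach only 0.5625.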