Early Stopping Chain-of-thoughts in Large Language Models

Bowen Yin; Minjia Mao; Xiao Fang; Yu Zhu

arxiv: 2509.14004 · v2 · pith:35IQUVYLnew · submitted 2025-09-17 · 💻 cs.CL

keywords: answer, reasoning, convergence, large, models, step, answers, chain-of-thoughts

Reasoning large language models (LLMs) have demonstrated superior capacities in solving complicated problems by generating long chain-of-thoughts (CoT), but such a lengthy CoT incurs high inference costs. Previous methods on inference-stage efficient reasoning either require white-box models to monitor the reasoning process or are not reliable through direct prompting. In response, we introduce ES-CoT, an inference-time method that shortens CoT generation by detecting answer convergence and stopping early with almost no performance loss. When observing a linguistic marker (such as "wait") in the reasoning process, we prompt the LLM to output its current final answer, denoted as a step answer. We then track the run length of consecutive identical step answers as a measure of answer convergence. We show both empirically and theoretically that step answers steadily converge to the final answer, and large run-length jumps reliably mark this convergence. Experiments on six reasoning datasets across three LLMs show that ES-CoT reduces the number of inference tokens by 16.08% on average while maintaining accuracy comparable to standard CoT.
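
The run-length tracking described in the abstract can be illustrated with a short sketch. The abstract does not state the exact stopping rule, so the rule below is an assumption: stop once the current run of identical step answers exceeds the longest earlier run by at least `min_jump`. The function names, the `min_jump` parameter, and the sample answers are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def run_lengths(step_answers: Sequence[str]) -> List[int]:
    """Length of the current run of identical step answers at each step."""
    lengths: List[int] = []
    for i, answer in enumerate(step_answers):
        if i > 0 and answer == step_answers[i - 1]:
            lengths.append(lengths[-1] + 1)  # same answer: the run continues
        else:
            lengths.append(1)  # answer changed: a new run starts
    return lengths


def early_stop_index(step_answers: Sequence[str], min_jump: int = 3) -> Optional[int]:
    """Index of the step at which to stop, or None if no jump is observed.

    Assumed rule (not from the paper): stop when the current run is at
    least `min_jump` longer than the longest run that ended earlier.
    """
    longest_previous = 0  # longest run that has already ended
    current = 0           # length of the run in progress
    for i, answer in enumerate(step_answers):
        if i > 0 and answer == step_answers[i - 1]:
            current += 1
        else:
            longest_previous = max(longest_previous, current)
            current = 1
        if current - longest_previous >= min_jump:
            return i
    return None


# Hypothetical step answers, one collected at each linguistic marker such as "wait".
answers = ["12", "15", "15", "7", "42", "42", "42", "42", "42", "42"]
print(run_lengths(answers))
print(early_stop_index(answers))
```

In this toy trace the answer "42" repeats long enough to exceed the earlier runs by the assumed margin, so generation would stop before the final step answer is requested. In a real system each step answer would come from prompting the model for its current final answer, which this sketch leaves out.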

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
cs.CL 2026-05 conditional novelty 8.0

AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning task...

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
cs.CL 2026-05 unverdicted novelty 7.0

AutoTTS discovers superior test-time scaling strategies for LLMs via cheap controller synthesis in a pre-collected trajectory environment, outperforming manual baselines on math benchmarks with low discovery cost.

Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners
cs.CL 2026-01 unverdicted novelty 7.0

Large reasoning models exhibit multilingual latent reasoning that is uneven across languages but internally consistent and English-centered.

Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models
cs.CL 2026-05 unverdicted novelty 6.0

PUMA detects reasoning-level semantic redundancy to enable early exit in chains of thought, achieving 26.2% average token reduction across five LRMs and five benchmarks while preserving accuracy and CoT quality.

When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems
stat.ML 2026-05 unverdicted novelty 6.0

A wrapper for black-box generate-verify AI pipelines that uses a conservative hard-negative reference pool and e-processes to control the probability of releasing on infeasible tasks while permitting release on feasible ones.

interwhen: A Generalizable Framework for Steering Reasoning Models with Test-time Verification
cs.LO 2026-02 unverdicted novelty 6.0

interwhen is a single-trajectory test-time verification system that polls reasoning traces, forks inference for intermediate states, synthesizes verifiers from policies including in Lean and z3, and steers models to n...

Conformal Thinking: Risk Control for Reasoning on a Compute Budget
cs.AI 2026-02 unverdicted novelty 6.0

Conformal risk control with upper and lower thresholds lets LLMs adaptively stop reasoning while guaranteeing a maximum error rate and minimizing token use.

Entropy After </Think> for reasoning model early exiting
cs.LG 2025-09 unverdicted novelty 6.0

Entropy After </Think> (EAT) enables early exiting in reasoning LLMs by tracking entropy stabilization after a </think> token, cutting token use 12-22% on MATH500 and AIME2025 with no accuracy loss.