
arxiv: 2509.14004 · v2 · pith:35IQUVYL · submitted 2025-09-17 · 💻 cs.CL

Early Stopping Chain-of-thoughts in Large Language Models

classification 💻 cs.CL
keywords: answer, reasoning, convergence, large, models, step, answers, chain-of-thoughts
abstract

Reasoning large language models (LLMs) have demonstrated superior capabilities in solving complicated problems by generating long chains of thought (CoT), but such lengthy CoT incurs high inference costs. Previous inference-stage methods for efficient reasoning either require white-box access to the model to monitor the reasoning process or are unreliable when applied through direct prompting. In response, we introduce ES-CoT, an inference-time method that shortens CoT generation by detecting answer convergence and stopping early with almost no performance loss. Whenever a linguistic marker (such as "wait") appears in the reasoning process, we prompt the LLM to output its current final answer, denoted as a step answer. We then track the run length of consecutive identical step answers as a measure of answer convergence. We show both empirically and theoretically that step answers steadily converge to the final answer, and that large jumps in run length reliably mark this convergence. Experiments on six reasoning datasets across three LLMs show that ES-CoT reduces the number of inference tokens by 16.08% on average while maintaining accuracy comparable to standard CoT.
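The run-length idea in the abstract can be illustrated with a small sketch. It assumes that a step answer has already been obtained (by prompting the model) at each "wait" marker, which is not shown here. The stopping rule is a hypothetical stand-in for "a large run-length jump": stop once the current answer has repeated at least `min_run` times and its run is at least `jump_ratio` times the longest earlier run. `RunLengthStopper`, `early_stop_index`, `min_run`, and `jump_ratio` are illustrative names and values, not the paper's exact criterion or code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class RunLengthStopper:
    """Hypothetical early-stopping rule over a stream of step answers.

    min_run and jump_ratio are illustrative parameters, not values from the paper.
    """
    min_run: int = 3         # never stop before the current answer repeats this many times
    jump_ratio: float = 2.0  # current run must be this many times the longest earlier run
    _current: Optional[str] = field(default=None, init=False)
    _run: int = field(default=0, init=False)
    _longest_prev: int = field(default=0, init=False)
    history: List[int] = field(default_factory=list, init=False)  # run length after each step

    def update(self, step_answer: str) -> bool:
        """Record one step answer and return True if generation should stop."""
        if step_answer == self._current:
            self._run += 1
        else:
            # A different answer ends the previous run, so remember its length.
            self._longest_prev = max(self._longest_prev, self._run)
            self._current = step_answer
            self._run = 1
        self.history.append(self._run)
        # Treat a run that is long and clearly longer than any earlier run as a "jump".
        return (self._run >= self.min_run
                and self._run >= self.jump_ratio * max(self._longest_prev, 1))


def early_stop_index(step_answers: Sequence[str], stopper: RunLengthStopper) -> Optional[int]:
    """Return the index of the step answer at which generation would stop, or None."""
    for i, answer in enumerate(step_answers):
        if stopper.update(answer):
            return i
    return None


if __name__ == "__main__":
    # Step answers collected at successive "wait" markers (made-up values).
    answers = ["12", "15", "15", "12", "12", "12", "12", "12", "12"]
    stopper = RunLengthStopper()
    print(early_stop_index(answers, stopper), stopper.history)
```

In this made-up trace, generation stops at index 6, after 7 of the 9 checkpoints. Comparing against the longest earlier run means that an answer the model has flip-flopped on must repeat for longer before generation halts. Any real criterion, along with its thresholds, should be taken from the paper itself.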

This paper has not been read by Pith yet.


Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

    cs.CL 2026-05 conditional novelty 8.0

    AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning task...

  2. LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

    cs.CL 2026-05 unverdicted novelty 7.0

    AutoTTS discovers superior test-time scaling strategies for LLMs via cheap controller synthesis in a pre-collected trajectory environment, outperforming manual baselines on math benchmarks with low discovery cost.

  3. Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners

    cs.CL 2026-01 unverdicted novelty 7.0

    Large reasoning models exhibit multilingual latent reasoning that is uneven across languages but internally consistent and English-centered.

  4. Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

    cs.CL 2026-05 unverdicted novelty 6.0

    PUMA detects reasoning-level semantic redundancy to enable early exit in chains of thought, achieving 26.2% average token reduction across five LRMs and five benchmarks while preserving accuracy and CoT quality.

  5. When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems

    stat.ML 2026-05 unverdicted novelty 6.0

    A wrapper for black-box generate-verify AI pipelines that uses a conservative hard-negative reference pool and e-processes to control the probability of releasing on infeasible tasks while permitting release on feasible ones.

  6. interwhen: A Generalizable Framework for Steering Reasoning Models with Test-time Verification

    cs.LO 2026-02 unverdicted novelty 6.0

    interwhen is a single-trajectory test-time verification system that polls reasoning traces, forks inference for intermediate states, synthesizes verifiers from policies including in Lean and z3, and steers models to n...

  7. Conformal Thinking: Risk Control for Reasoning on a Compute Budget

    cs.AI 2026-02 unverdicted novelty 6.0

    Conformal risk control with upper and lower thresholds lets LLMs adaptively stop reasoning while guaranteeing a maximum error rate and minimizing token use.

  8. Entropy After </Think> for reasoning model early exiting

    cs.LG 2025-09 unverdicted novelty 6.0

    Entropy After </Think> (EAT) enables early exiting in reasoning LLMs by tracking entropy stabilization after a </think> token, cutting token use 12-22% on MATH500 and AIME2025 with no accuracy loss.