Escape sky-high cost: Early-stopping self-consistency for multi-step reasoning

Escape sky-high cost: Early-stopping selfconsistency for multi-step reasoning · 2025 · arXiv 2401.10480

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

cs.CL · 2026-05-08 · conditional · novelty 8.0 · 2 refs

AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.

When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems

stat.ML · 2026-05-13 · unverdicted · novelty 6.0

A wrapper for black-box generate-verify AI pipelines that uses a conservative hard-negative reference pool and e-processes to control the probability of releasing on infeasible tasks while permitting release on feasible ones.

Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

SPEX delivers 1.2-3x speedup on ToT algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding.

Test-time Scaling over Perception: Resolving the Grounding Paradox in Thinking with Images

cs.CV · 2026-04-13 · unverdicted · novelty 5.0

TTSP resolves the Grounding Paradox by treating perception as a scalable test-time process that generates, filters, and iteratively refines multiple visual exploration traces, outperforming baselines on high-resolution and multimodal reasoning tasks.

Early Stopping Chain-of-thoughts in Large Language Models

cs.CL · 2025-09-17 · conditional · novelty 5.0

ES-CoT shortens LLM chain-of-thought generation by tracking runs of identical step answers after linguistic markers, cutting tokens 16% on average while keeping accuracy comparable to full CoT across six datasets and three models.

Exploring the System 1 Thinking Capability of Large Reasoning Models

cs.CL · 2025-04-14 · unverdicted · novelty 5.0

LRMs underperform on simple system 1 questions in both accuracy and efficiency, with problem difficulty implicitly encoded in early hidden states.

citing papers explorer

Showing 6 of 6 citing papers.

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling cs.CL · 2026-05-08 · conditional · none · ref 3 · 2 links
AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.
When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems stat.ML · 2026-05-13 · unverdicted · none · ref 11
A wrapper for black-box generate-verify AI pipelines that uses a conservative hard-negative reference pool and e-processes to control the probability of releasing on infeasible tasks while permitting release on feasible ones.
Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration cs.LG · 2026-05-11 · unverdicted · none · ref 32 · 2 links
SPEX delivers 1.2-3x speedup on ToT algorithms via speculative path selection, dynamic budget allocation, and adaptive early termination, reaching up to 4.1x when combined with token-level speculative decoding.
Test-time Scaling over Perception: Resolving the Grounding Paradox in Thinking with Images cs.CV · 2026-04-13 · unverdicted · none · ref 20
TTSP resolves the Grounding Paradox by treating perception as a scalable test-time process that generates, filters, and iteratively refines multiple visual exploration traces, outperforming baselines on high-resolution and multimodal reasoning tasks.
Early Stopping Chain-of-thoughts in Large Language Models cs.CL · 2025-09-17 · conditional · none · ref 7
ES-CoT shortens LLM chain-of-thought generation by tracking runs of identical step answers after linguistic markers, cutting tokens 16% on average while keeping accuracy comparable to full CoT across six datasets and three models.
Exploring the System 1 Thinking Capability of Large Reasoning Models cs.CL · 2025-04-14 · unverdicted · none · ref 6
LRMs underperform on simple system 1 questions in both accuracy and efficiency, with problem difficulty implicitly encoded in early hidden states.

Escape sky-high cost: Early-stopping self-consistency for multi-step reasoning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer