Learning When to Sample: Confidence-Aware Selective Sampling for Efficient Chain-of-Thought Reasoning

Avinash Baidya; Bradley Malin; Chao Yan; Congning Ni; Juming Xiong; Katherine Brown; Kevin Guo; Weixin Liu; Xiang Gao; Zhijun Yin

arxiv: 2603.08999 · v4 · pith:5FRHRIQDnew · submitted 2026-03-09 · 💻 cs.CL

Learning When to Sample: Confidence-Aware Selective Sampling for Efficient Chain-of-Thought Reasoning

Juming Xiong , Kevin Guo , Congning Ni , Weixin Liu , Chao Yan , Katherine Brown , Avinash Baidya , Xiang Gao

show 2 more authors

Bradley Malin Zhijun Yin

This is my paper

classification 💻 cs.CL

keywords reasoningsamplingaccuracyframeworkmulti-pathselectivechain-of-thoughtconfidence-aware

0 comments

read the original abstract

Large language models (LLMs) can achieve strong reasoning performance through chain-of-thought (CoT) reasoning, yet they often generate unnecessarily long reasoning paths that incur high inference cost. Self-consistency-based approaches push accuracy higher still, but they require sampling and aggregating multiple reasoning trajectories, leading to substantial computational overhead. In this paper, we introduce a confidence-aware selective sampling framework that, at inference time, analyzes a single reasoning trajectory to adaptively determine whether to rely on that trajectory alone or trigger multi-path sampling. The framework uses trajectory-level numeric features and sentence-level linguistic features extracted from reasoning states to guide selective multi-path reasoning. We train it on MedQA and evaluate it in-domain on MedQA and under calibration-only transfer on MathQA, MedMCQA, and MMLU, without further fine-tuning. Experimental results show that the proposed framework maintains comparable performance to full and efficient multi-path reasoning baselines, with accuracy changes of $-0.41 \pm 0.58$ and $-0.31 \pm 0.58$ percentage points, respectively, while reducing token usage by $71.7 \pm 5.0%$ and $36.6 \pm 9.1%$. These findings demonstrate that reasoning trajectories contain rich signals for uncertainty estimation, enabling a simple, transferable mechanism to balance accuracy and efficiency in LLM reasoning.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability
cs.LG 2026-05 unverdicted novelty 6.0

LLM reliability techniques are unified as communication channel operators, with a new cost-aware router achieving superior quality-cost tradeoffs on hard tasks.
Security Considerations for Multi-agent Systems
cs.CR 2026-03 unverdicted novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.