arxiv: 2511.02603 · v2 · pith:EULKGU3Jnew · submitted 2025-11-04 · 💻 cs.CL

CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency

classification 💻 cs.CL
keywords CGES · self-consistency · answer · average · calls · confidence-guided · early · number

Large language models (LLMs) are often queried multiple times at test time, with predictions aggregated by majority vote. While effective, this self-consistency strategy (Wang et al., 2023) requires a fixed number of calls and fails when the correct answer is infrequent. We introduce Confidence-Guided Early Stopping (CGES), a Bayesian framework that forms posteriors over candidate answers and adaptively halts sampling once one answer accumulates enough posterior mass. We prove guarantees in both an ideal calibrated regime and a realistic noisy-confidence regime under a directional drift condition. Across five reasoning benchmarks, CGES reduces the average number of calls by 58% (from 16.0 to 6.7) while staying within 0.4 percentage points of the accuracy of self-consistency.
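
The stopping rule described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the likelihood model, so everything here is an assumption made for illustration: the function name `cges_sketch`, the `sample` callback returning an `(answer, confidence)` pair, the fixed candidate-set size `num_candidates`, the uniform prior, and the rule that a sample is correct with probability equal to its confidence and otherwise uniform over the remaining candidates. It is not the authors' implementation.

```python
import math
from typing import Callable, Dict, Tuple


def cges_sketch(
    sample: Callable[[], Tuple[str, float]],  # hypothetical: one LLM call -> (answer, confidence)
    max_calls: int = 16,
    threshold: float = 0.95,
    num_candidates: int = 10,  # assumed size of the answer space (must be >= 2)
) -> Tuple[str, float, int]:
    """Sample until one answer's posterior mass reaches `threshold`.

    Assumed likelihood: if `a` is the true answer, a sample reports `a`
    with probability `conf`, and each other candidate with probability
    (1 - conf) / (num_candidates - 1). The prior is uniform.
    Returns (best answer, its posterior, number of calls used).
    """
    log_w: Dict[str, float] = {}  # unnormalized log-posterior of seen answers
    log_unseen = 0.0              # log-weight shared by each unseen candidate
    best, best_post, calls = "", 0.0, 0
    for calls in range(1, max_calls + 1):
        answer, conf = sample()
        conf = min(max(conf, 1e-6), 1.0 - 1e-6)  # keep the logs finite
        log_hit = math.log(conf)
        log_miss = math.log((1.0 - conf) / (num_candidates - 1))
        # A newly seen answer starts from the weight it had while unseen.
        log_w.setdefault(answer, log_unseen)
        for a in log_w:
            log_w[a] += log_hit if a == answer else log_miss
        log_unseen += log_miss
        # Normalize over seen answers plus the remaining unseen candidates.
        n_unseen = max(num_candidates - len(log_w), 0)
        peak = max(max(log_w.values()), log_unseen)
        total = sum(math.exp(v - peak) for v in log_w.values())
        total += n_unseen * math.exp(log_unseen - peak)
        best = max(log_w, key=log_w.get)
        best_post = math.exp(log_w[best] - peak) / total
        if best_post >= threshold:
            break  # enough posterior mass on one answer: stop early
    return best, best_post, calls
```

With consistent, confident samples the loop ends after a few calls, while conflicting or low-confidence samples push it toward the `max_calls` budget, which is the behavior that would account for a lower average number of calls than a fixed-size majority vote.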

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CITE: Anytime-Valid Statistical Inference in LLM Self-Consistency

    stat.ML · 2026-05 · unverdicted · novelty 7.0

    CITE certifies that a prespecified answer is the unique mode of an LLM response distribution with anytime-valid error control under arbitrary data-driven stopping and without prior knowledge of the answer set.