pith. machine review for the scientific record. sign in

arxiv: 2511.00751 · v2 · submitted 2025-11-02 · 💻 cs.AI · cs.CL

Recognition: unknown

Self-Consistency Is Losing Its Edge: Diminishing Returns and Rising Costs in Modern LLMs

Authors on Pith no claims yet
classification 💻 cs.AI cs.CL
keywords modelscostspathsproblemsself-consistencyalreadyfrequenthotpotqa
0
0 comments X
read the original abstract

Self-consistency -- sampling multiple reasoning paths and selecting the most frequent answer -- was designed for an era when language models made frequent, unpredictable errors. This study argues that the technique has become increasingly wasteful as models grow stronger, and may degrade performance on problems that modern models already solve reliably. Using Gemini 2.5 models on HotpotQA and MATH-500, we show that accuracy gains from increasing the number of sampled reasoning paths are minimal -- 0.4% on HotpotQA across 20 samples, and 1.6% on MATH-500 -- while token costs scale nearly linearly with sample count. Critically, performance plateaued early and in some configurations declined at high sample counts, suggesting that additional paths introduce noise rather than signal when models already solve problems reliably. As inference costs rise with model scale, indiscriminate self-consistency is difficult to justify. We recommend reserving multi-path sampling for problems that demonstrably exceed a model's single-pass reliability.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck

    cs.LG 2026-05 unverdicted novelty 7.0

    CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.