Boosting Self-Consistency with Ranking

Alexander Panchenko; Daniil Moskovskiy; Maria Marina; Mikhail Salnikov; Sergey Pletenev; Viktor Moskvoretskii

arxiv: 2606.05054 · v1 · pith:A5FO4DL4new · submitted 2026-06-03 · 💻 cs.CL

Maria Marina, Daniil Moskovskiy, Sergey Pletenev, Mikhail Salnikov, Alexander Panchenko, Viktor Moskvoretskii

Pith reviewed 2026-06-28 06:47 UTC · model grok-4.3

classification 💻 cs.CL

keywords self-consistency · ranking · language models · reasoning paths · answer selection · question answering · test-time computation

The pith

A lightweight ranking model trained on five features improves answer selection over majority voting in self-consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Self-consistency samples multiple reasoning paths from a language model and selects the answer that appears most often, yet this majority vote frequently overlooks correct answers that are already present among the samples. The paper replaces that vote with Ranking-Improved Self-Consistency (RISC), which casts answer selection as a ranking task solved by a lightweight LambdaRank model. The model scores each candidate answer using five features that together capture frequency, semantic centrality, reasoning-trace consistency and two additional signals. Across three datasets and varying sampling budgets, the ranking approach yields higher accuracy for the same computational cost than standard self-consistency or other baselines, with the clearest gains on question-answering tasks. The features turn out to be individually useful and mutually complementary, indicating that learning to combine them is more effective than relying on any single signal.

Core claim

RISC reformulates answer selection inside self-consistency as a ranking problem. A lightweight LambdaRank model scores candidate answers according to five hand-designed features that measure answer frequency, semantic centrality, reasoning-trace consistency and two further signals. When tested on three datasets under a range of test-time budgets, this ranking-based selector consistently produces a better accuracy-efficiency trade-off than majority voting and strong baselines, with particularly large gains on question-answering benchmarks.
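
As a rough illustration of how the two selectors differ, the sketch below groups sampled answers into candidates, picks the majority answer, and then picks the top-scored candidate under a feature-based scorer. Only the frequency feature is computed from the samples; the `centrality` values, the weights, and the weighted-sum `score` are hypothetical stand-ins for the paper's five features and its trained LambdaRank model, neither of which is specified in this summary.

```python
from collections import Counter


def majority_vote(answers):
    """Standard self-consistency: return the most frequent sampled answer."""
    return Counter(answers).most_common(1)[0][0]


def candidate_features(answers, centrality):
    """Build one feature vector per distinct answer.

    `centrality` is a hypothetical precomputed score per answer; the paper's
    actual feature definitions are not reproduced here.
    """
    counts = Counter(answers)
    total = len(answers)
    return {a: [counts[a] / total, centrality.get(a, 0.0)] for a in counts}


def rank_select(features, weights):
    """Return the candidate with the highest score.

    A weighted sum stands in for the trained ranker (an assumption).
    """
    def score(vec):
        return sum(w * x for w, x in zip(weights, vec))

    return max(features, key=lambda a: score(features[a]))


# Toy example: the frequent answer and the highest-scored answer disagree.
samples = ["Lyon", "Lyon", "Lyon", "Paris", "Paris", "Nice"]
centrality = {"Paris": 0.9, "Lyon": 0.2, "Nice": 0.1}  # made-up values

voted = majority_vote(samples)
ranked = rank_select(candidate_features(samples, centrality), [1.0, 1.0])
print(voted, ranked)
```

In this toy case the vote returns the most frequent answer while the scorer prefers a less frequent candidate, which is the situation the paper targets: a correct answer present among the samples but outvoted.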

What carries the argument

The Ranking-Improved Self-Consistency (RISC) procedure, which replaces majority voting with a LambdaRank model that ranks answers by a learned combination of five complementary features.
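
For orientation, the pairwise gradient that LambdaRank uses in the general learning-to-rank literature can be written as below. This is the textbook form, not a formula taken from the paper. Here $s_i$ is the ranker's score for candidate $i$, $\sigma$ is a shape parameter, and $\Delta \mathrm{NDCG}_{ij}$ is the change in NDCG from swapping candidates $i$ and $j$ in the ranked list. Treating correct answers as relevance 1 and incorrect answers as relevance 0, and using NDCG as the target metric, are assumptions made for illustration.

```latex
% For a pair where candidate i is labeled more relevant than candidate j:
\lambda_{ij} \;=\; \frac{-\sigma}{1 + e^{\sigma (s_i - s_j)}} \,
                   \bigl|\Delta \mathrm{NDCG}_{ij}\bigr|
```

The magnitude of $\lambda_{ij}$ is largest when the pair is mis-ordered ($s_i < s_j$) and swapping it would change the metric most, so training effort concentrates on the pairs that decide which answer ends up ranked first.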

If this is right

RISC delivers higher accuracy than majority voting for any fixed number of sampled reasoning paths.
The gains are largest on question-answering benchmarks and remain stable across different test-time budgets.
Each of the five features contributes useful information, yet their combination yields further improvement.
The ranking formulation requires no changes to the underlying language model; the only training involved is fitting the lightweight ranker on labeled examples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same feature set and ranking step could be applied to other sampling-based generation tasks where a single best output must be chosen from many candidates.
Because the model is lightweight, the method may transfer to new domains with only modest additional labeled examples.
Future experiments could test whether the learned ranking weights remain stable when the base language model is swapped for a different architecture.

Load-bearing premise

The five hand-designed features supply complementary signals that a small LambdaRank model can learn to combine into accurate rankings without needing large amounts of task-specific labeled data or domain tuning.

What would settle it

Evaluating RISC on a new dataset under matched sampling budgets and finding that its accuracy no longer exceeds that of majority voting would falsify the central claim.
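
A matched-budget comparison of this kind can be sketched as follows. The data layout (a list of `(sampled_answers, gold_answer)` pairs), the function names, and the `first_answer` selector are hypothetical and stand in for whatever selector is under test; nothing here is taken from the paper's evaluation code.

```python
from collections import Counter


def majority_vote(answers):
    """Baseline selector: the most frequent answer among the samples."""
    return Counter(answers).most_common(1)[0][0]


def accuracy_at_budget(dataset, selector, budget):
    """Accuracy of `selector` when it sees only the first `budget` samples."""
    correct = 0
    for answers, gold in dataset:
        correct += int(selector(answers[:budget]) == gold)
    return correct / len(dataset)


def compare(dataset, selector, budgets):
    """Map each budget to (majority-vote accuracy, selector accuracy)."""
    return {
        k: (
            accuracy_at_budget(dataset, majority_vote, k),
            accuracy_at_budget(dataset, selector, k),
        )
        for k in budgets
    }


def first_answer(answers):
    """Placeholder selector used only to exercise the harness."""
    return answers[0]


toy = [
    (["a", "b", "b"], "b"),
    (["c", "c", "d"], "c"),
    (["e", "f", "f"], "e"),
]
results = compare(toy, first_answer, budgets=[1, 3])
print(results)
```

Both selectors see exactly the same truncated sample list at each budget, so any difference in accuracy comes from the selection rule rather than from extra LLM calls.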

Figures

Figures reproduced from arXiv: 2606.05054 by Alexander Panchenko, Daniil Moskovskiy, Maria Marina, Mikhail Salnikov, Sergey Pletenev, Viktor Moskvoretskii.

**Figure 1.** Accuracy versus the number of sampled responses on PopQA for self-consistency and RISC. RISC consistently achieves higher accuracy while substantially reducing computational cost: with only 18 samples, it already surpasses the performance of self-consistency with 99 samples. It also delivers systematic accuracy gains over self-consistency across the full range of LLM-call budgets.

**Figure 2.** Comparison of RISC against Self-Consistency, Stable Rank, ReASC, and CISC on three datasets for…

**Figure 3.** Example visualization of the answer-centroid…

**Figure 4.** Example visualization of the worst-step co…

**Figure 6.** Feature ablation across three datasets. Heatmaps show the percentage change in mean accuracy, averaged over budgets 1–99 LLM calls, relative to the full ranker. Diagonal cells represent single-feature ablations, and lower-triangular cells represent two-feature ablations. Raw mean accuracies for the ablated models are shown in parentheses. Cells are colored by effect type: diagonal cells with an absolute dr…

**Figure 7.** Out-of-domain transfer. Mean ranker quality over 1–99 LLM calls for each train–test dataset pair. Diagonal cells show in-domain performance; off-diagonal cells show out-of-domain transfer. Cells are colored by relative transfer quality: green indicates performance within 10% of the target dataset’s in-domain score, while red indicates larger degradation. Transfer remains competitive on HotpotQA and PopQ…

**Figure 8.** SHAP dependence plots for selected features

**Figure 9.** SHAP feature importance plots for three datasets, illustrating the average impact of selected features on the model output across the evaluated samples.

**Figure 12.** Prompt template used for self-consistency

**Figure 11.** Prompt template used for the MATH500 dataset. The system prompt establishes the role of a mathematical assistant, while the user instructions enforce a chain-of-thought structure with a specific \boxed{} format for the final answer extraction.

E ReASC algorithm: Given an input $x$, ReASC (Kim et al., 2026) samples responses $y_i \sim p_\theta(\cdot \mid x)$ with extracted answers $a_i = \mathrm{Ans}(y_i)$ and confidence scores $s_i = S(y_i)$.…

**Figure 13.** Comparison of RISC against Self-Consistency, Stable Rank, ReASC, and CISC on three datasets for…

Original abstract

Self-consistency improves large language models by sampling multiple reasoning paths and selecting the most frequent answer, but majority voting often fails to recover correct answers that are already present among the samples. We address this limitation with Ranking-Improved Self-Consistency (RISC), which reformulates answer selection in self-consistency as a ranking problem. Instead of relying on a single uncertainty or confidence signal, RISC uses a lightweight LambdaRank model to score candidate answers with five carefully designed features that capture answer frequency, semantic centrality, and reasoning-trace consistency. We evaluate RISC on three datasets under a range of test-time budgets. Across datasets, RISC consistently achieves a better accuracy-efficiency trade-off than standard self-consistency and strong baselines, with particularly large gains on question answering benchmarks. Further analysis shows that the proposed features are individually useful and, more importantly, complementary, highlighting the value of learning to combine multiple informative signals for test-time answer selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RISC swaps majority vote for a LambdaRank model over five hand-designed features and reports better accuracy per sample on three datasets, but the gains rest on those features staying complementary when little labeled data is available.

The core move here is turning answer selection into a ranking problem instead of counting votes. They train a lightweight LambdaRank on five features—frequency, semantic centrality, reasoning-trace consistency, and two others—to score candidates from the same set of samples that self-consistency would use.

This is new enough in its application to self-consistency. The paper runs the method on three datasets across different sampling budgets and claims a better accuracy-efficiency curve than plain self-consistency and other baselines, with the largest lift on QA tasks. They also include an analysis showing the features add value individually and together.

The soft spot is exactly the one the stress-test flags. The features are hand-crafted, and the method needs some labeled data to fit the ranker. The abstract says the features are complementary, but it does not show whether that complementarity survives after training or only appears in the training distribution, nor how much labeled data is required when moving to new domains. Without those numbers it is difficult to know if the approach stays lightweight in practice.

The work is aimed at people already tuning test-time sampling for reasoning models. It has a clear method, concrete comparisons, and a testable claim, so it is worth sending to referees even if the improvements turn out modest once the training details are examined.

Referee Report

3 major / 2 minor

Summary. The paper proposes Ranking-Improved Self-Consistency (RISC), which recasts answer selection within self-consistency as a learning-to-rank task. A lightweight LambdaRank model is trained on five hand-designed features (frequency, semantic centrality, reasoning-trace consistency, and two others) to score and select among sampled reasoning paths. The central claim is that RISC yields a superior accuracy-efficiency trade-off compared with majority-vote self-consistency and other baselines across three datasets, with especially large gains on question-answering benchmarks; an additional analysis asserts that the five features are individually useful and complementary.

Significance. If the reported gains are shown to be robust to data-split choices, feature-selection leakage, and cross-dataset generalization with modest labeled data, the approach would offer a practical, model-agnostic way to improve test-time reasoning without retraining the underlying LLM. The explicit use of multiple complementary signals via a learned ranker is a natural and potentially reusable idea, though its impact hinges on whether the complementarity survives proper out-of-sample evaluation.

major comments (3)

[§4.2, Table 3] §4.2 and Table 3: the reported accuracy numbers for RISC versus self-consistency are presented without error bars, statistical significance tests, or the exact sizes of the labeled sets used to train LambdaRank on each dataset. Without these quantities it is impossible to determine whether the claimed consistent gains exceed sampling noise or require per-dataset supervision that violates the “limited labeled data, no domain-specific tuning” premise.
[§5.1] §5.1 (feature analysis): the claim that the five features are “complementary” rests on an in-sample analysis; the manuscript does not report feature-correlation matrices, ablation results under cross-validation, or performance when the ranker is trained on one dataset and tested on another. If the signals are redundant once the model is fitted, the advantage over simple frequency counting disappears.
[§3.2] §3.2 (LambdaRank training): the description of how positive/negative pairs are constructed for the ranking loss does not specify whether the same held-out test samples used for final evaluation ever leak into the ranking training set. Any such overlap would render the accuracy-efficiency comparison circular.
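
One common way to supply the error bars requested in the first major comment is a paired bootstrap over questions. The sketch below is a generic percentile bootstrap, not a procedure described in the paper; the per-question 0/1 outcome lists and all names are hypothetical.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(A) - accuracy(B).

    `correct_a[i]` and `correct_b[i]` are 0/1 outcomes of two selectors on the
    same question i, so resampling question indices keeps the pairing intact.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("outcome lists must cover the same questions")
    n = len(correct_a)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up outcomes for two selectors on ten questions.
ranker_hits = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
vote_hits = [1, 0, 0, 1, 1, 0, 0, 1, 1, 1]
print(paired_bootstrap_ci(ranker_hits, vote_hits))
```

If the resulting interval excludes zero, the accuracy gap is unlikely to be explained by which questions happened to land in the test set; sampling noise from the LLM itself would need repeated sampling runs on top of this.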

minor comments (2)

The abstract states “particularly large gains on question answering benchmarks” but never names the three datasets or the exact budgets; adding these identifiers would improve readability.
Notation for the five features is introduced only in prose; a compact table listing each feature, its mathematical definition, and its intended signal would reduce ambiguity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater statistical rigor and clearer experimental details. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Referee: [§4.2, Table 3] §4.2 and Table 3: the reported accuracy numbers for RISC versus self-consistency are presented without error bars, statistical significance tests, or the exact sizes of the labeled sets used to train LambdaRank on each dataset. Without these quantities it is impossible to determine whether the claimed consistent gains exceed sampling noise or require per-dataset supervision that violates the “limited labeled data, no domain-specific tuning” premise.

Authors: We agree that error bars, significance tests, and explicit reporting of labeled-set sizes are necessary to substantiate the gains. In the revision we will add standard errors over multiple sampling runs, paired statistical tests, and the precise sizes of the per-dataset training splits for LambdaRank (a few hundred examples each). These sizes remain modest and do not involve LLM fine-tuning, preserving the limited-supervision premise while allowing per-dataset ranker training. revision: yes
Referee: [§5.1] §5.1 (feature analysis): the claim that the five features are “complementary” rests on an in-sample analysis; the manuscript does not report feature-correlation matrices, ablation results under cross-validation, or performance when the ranker is trained on one dataset and tested on another. If the signals are redundant once the model is fitted, the advantage over simple frequency counting disappears.

Authors: The current complementarity analysis is indeed in-sample. We will revise §5.1 to include (i) feature-correlation matrices, (ii) ablation results obtained via cross-validation on each dataset, and (iii) a cross-dataset transfer experiment in which the ranker is trained on one dataset and evaluated on the others. These additions will provide out-of-sample evidence that the features remain complementary. revision: yes
Referee: [§3.2] §3.2 (LambdaRank training): the description of how positive/negative pairs are constructed for the ranking loss does not specify whether the same held-out test samples used for final evaluation ever leak into the ranking training set. Any such overlap would render the accuracy-efficiency comparison circular.

Authors: No test-set leakage occurs: LambdaRank is trained exclusively on a separate held-out labeled split whose examples are never used in the final test evaluation. We will expand §3.2 to explicitly document this partitioning and the construction of positive/negative pairs, removing any ambiguity. revision: yes
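
The partitioning described in the last response can be enforced mechanically by splitting at the question level, so that every sampled response for a question lands on the same side. The sketch below is one hypothetical way to do this with a hash of the question id; the function names, the 20% training fraction, and the id format are assumptions, not details from the paper.

```python
import hashlib


def assign_split(question_id, train_fraction=0.2):
    """Deterministically map a question id to 'train' or 'test'."""
    digest = hashlib.sha256(question_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "train" if bucket < train_fraction else "test"


def split_questions(question_ids, train_fraction=0.2):
    """Return disjoint (train, test) sets of question ids."""
    train = {q for q in question_ids if assign_split(q, train_fraction) == "train"}
    test = set(question_ids) - train
    return train, test


def check_no_leakage(train_ids, test_ids):
    """Raise if any question appears in both the ranker-training and test sets."""
    overlap = set(train_ids) & set(test_ids)
    if overlap:
        raise ValueError(f"{len(overlap)} question(s) appear in both splits")


ids = [f"q{i}" for i in range(100)]  # hypothetical question ids
train, test = split_questions(ids)
check_no_leakage(train, test)
```

Because ranking pairs are built among the candidates of a single question, keeping whole questions on one side of the split is enough to keep every training pair out of the evaluation set.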

Circularity Check

0 steps flagged

No circularity: method uses external evaluation on held-out data

The paper describes sampling reasoning paths, extracting five hand-designed features, training a LambdaRank ranker on labeled data, and evaluating accuracy on test sets. No equations, self-citations, or steps reduce a reported result to a fitted parameter or prior self-work by construction. Complementarity is presented as an empirical finding from analysis, not a definitional input. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The method implicitly assumes that a small supervised ranker can be trained on the same task distribution without introducing new entities.

pith-pipeline@v0.9.1-grok · 5701 in / 1099 out tokens · 16698 ms · 2026-06-28T06:47:22.771631+00:00 · methodology

Reference graph

Works this paper leans on

263 extracted references · 109 canonical work pages · 10 internal anchors

[2] Junseok Kim, Nakyeong Yang, Kyungmin Min, Kyomin Jung. Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning. 2026. doi:10.48550/arxiv.2601.02970
[3] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, Denny Zhou. 2022.
[4] Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes. 2017. doi:10.18653/v1/p17-1171
[5] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou. 2023.
[6] Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi. 2023. doi:10.18653/v1/2023.acl-long.546
[7] Alfred V. Aho, Jeffrey D. Ullman. 1972.
[8] Ashok K. Chandra, Dexter C. Kozen, Larry J. Stockmeyer. 1981.
[9] Galen Andrew, Jianfeng Gao. 2007.
[10] Dan Gusfield. 1997.
[11] Mohammad Sadegh Rasooli, Joel R. Tetreault. 2015.
[12] Rie Kubota Ando, Tong Zhang. 2005.
[13] Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei. 2021. doi:10.18653/v1/2021.findings-acl.188
[15] Yixuan Tang, Yi Yang. 2025. arXiv:2512.02807
[16] Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang. 2025.
[17] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik R. Narasimhan. 2024.
[18] Xinyun Chen, Renat Aksitov, Uri Alon, Jie Ren, Kefan Xiao, Pengcheng Yin, Sushant Prakash, Charles Sutton, Xuezhi Wang, Denny Zhou. 2023. arXiv:2311.17311
[19] Weiqin Wang, Yile Wang, Hui Huang. 2025. arXiv:2505.10772
[20] Patricia S. Abril, Robert Plant. 2007. doi:10.1145/1188913.1188915
[21] Sarah Cohen, Werner Nutt, Yehoshua Sagic. 2007. doi:10.1145/1219092.1219093
[22] David Kosiur. 2001.
[24] doi:10.1007/3-540-09237-4
[25] Asad Z. Spector. 1990. doi:10.1145/90417.90738
[26] Bruce P. Douglass, David Harel, Mark B. Trakhtenbrot. 1998. doi:10.1007/3-540-65193-4_29
[27] Donald E. Knuth. 1997.
[28] Donald E. Knuth. 1998.
[29] Dan Geiger, Christopher Meek. 2005.
[30] Stan W. Smith. 2010.
[31] Matthew Van Gundy, Davide Balzarotti, Giovanni Vigna. 2007.
[32] Matthew Van Gundy, Davide Balzarotti, Giovanni Vigna. 2008.
[33] Matthew Van Gundy, Davide Balzarotti, Giovanni Vigna. 2009.
[34] Sten Andler. 1979. doi:10.1145/567752.567774
[35] David Harel. 1978.
[36] David A. Anisi. 2003.
[37] Kenneth L. Clarkson. 1985.
[38] Harry Thornburg. 2001.
[39] OpenAI. OpenAI o1 System Card. 2024. doi:10.48550/arxiv.2412.16720
[40] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chen... 2025. doi:10.1038/s41586-025-09422-z
[41] Marcus J. Min, Yangruibo Ding, Luca Buratti, Saurabh Pujar, Gail E. Kaiser, Suman Jana, Baishakhi Ray. 2024.
[42] Avinash Amballa, Aditya Parashar, Aditya Vikram Singh, Jinlin Lai, Benjamin Rozonoyer. 2025.
[43] Yixuan Tang, Yi Yang. 2025. doi:10.48550/arxiv.2512.02807
[44] Richard S. Sutton. 2019.
[45] Xingyu Dang, Christina Baek, Kaiyue Wen, Zico Kolter, Aditi Raghunathan. 2025. doi:10.48550/arxiv.2504.10478
[46] Zhan Zhuang, Xiequn Wang, Zebin Chen, Feiyang Ye, Ying Wei, Kede Ma, Yu Zhang. 2026. doi:10.48550/arxiv.2603.01025
[47] Viktor Moskvoretskii, Chris Biemann, Irina Nikishina. 2025. doi:10.48550/arxiv.2503.08681
[49] Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, Karthik Narasimhan. 2023.
[50] Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar. Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. 2024. doi:10.48550/arxiv.2408.03314
[51] Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.1025
[52] Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn. 2024. doi:10.18653/v1/2024.findings-acl.297
[53] Prasann Singhal, Tanya Goyal, Jiacheng Xu, Greg Durrett. 2023. doi:10.48550/arxiv.2310.03716
[54] Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, Pascale Fung. 2023. doi:10.48550/arxiv.2310.06271
[55] Han Wang, Archiki Prasad, Elias Stengel. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2024. doi:10.18653/v1/2024.acl-short.28
[56] Michael L. Littman, Richard S. Sutton, Satinder Singh. 2001.
[57] Amir Taubenfeld, Tom Sheffer, Eran Ofek, Amir Feder, Ariel Goldstein, Zorik Gekhman, Gal Yona. 2025.
[58] Dave Novak. ACM SIGGRAPH 2003 Video Review on Animation theater Program: Part I - Vol. 145 (July 27--27, 2003). 2003. doi:10.945/woot07-S422
[59] Newton Lee. 2005. doi:10.1145/1057270.1057278
[60] Bernard Rous. 2008.
[62] Werneck, Renato and Setubal, Jo\. doi:10.1145/351827.384253
[64] Mauro Conti, Roberto Di Pietro, Luigi V. Mancini, Alessandro Mei. 2009. doi:10.1016/j.inffus.2009.01.002
[65] Cheng-Lun Li, Ayse G. Buyuktur, David K. Hutchful, Natasha B. Sant, Satyendra K. Nainwal. 2008. doi:10.1145/1358628.1358946
[66] Billy S. Hollis. 1999.
[67] Michel Goossens, S. P. Rahtz, Ross Moore, Robert S. Sutor. 1999.
[68] Jonathan F. Buss, Arnold L. Rosenberg, Judson D. Knott. 1987.
[69] 2008.
[70] Kenneth Lee Clarkson. 1985.
[71] In Proceedings of the IEEE International Conference on Web Services. 2004. doi:10.1109/icws.2004.64
[72] Charles J. Petrie. 1986.
[73] Charles J. Petrie. 1986.
[74] Donald E. Knuth. 1981.
[75] Wei-Chang Kong. 2001.
[76] Wei-Chang Kong. 2001.
[77] Wei-Chang Kong. 2002.
[78] Wei-Chang Kong. 2003.
[79] Wei-Chang Kong. 2004.
[80] Wei-Chang Kong. 2005.
[81] Wei-Chang Kong. 2006.
[82] Mehdi Saeedi, Morteza Saheb Zamani, Mehdi Sedighi. 2010.
[83] Mehdi Saeedi, Morteza Saheb Zamani, Mehdi Sedighi, Zahra Sasanian. 2010.
[84] Markus Kirschmer, John Voight. 2010. doi:10.1137/080734467
[85] C. A. R. Hoare. 1972.
[86] Jan Lee. 1981. doi:10.1145/800025.1198348

Showing first 80 references.

[1] [2]

Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning

Junseok Kim and Nakyeong Yang and Kyungmin Min and Kyomin Jung , year = 2026, journal =. doi:10.48550/ARXIV.2601.02970 , url =. 2601.02970 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2601.02970 2026

[2] [3]

Chi and Quoc V

Jason Wei and Xuezhi Wang and Dale Schuurmans and Maarten Bosma and Brian Ichter and Fei Xia and Ed H. Chi and Quoc V. Le and Denny Zhou , year = 2022, booktitle =

2022

[3] [4]

doi:10.18653/V1/P17-1171 , url =

Danqi Chen and Adam Fisch and Jason Weston and Antoine Bordes , year = 2017, booktitle =. doi:10.18653/V1/P17-1171 , url =

work page doi:10.18653/v1/p17-1171 2017

[4] [5]

Le and Ed H

Xuezhi Wang and Jason Wei and Dale Schuurmans and Quoc V. Le and Ed H. Chi and Sharan Narang and Aakanksha Chowdhery and Denny Zhou , year = 2023, booktitle =

2023

[5] [6]

doi: 10.18653/v1/2023.acl-long.546

Alex Mallen and Akari Asai and Victor Zhong and Rajarshi Das and Daniel Khashabi and Hannaneh Hajishirzi , year = 2023, booktitle =. doi:10.18653/V1/2023.ACL-LONG.546 , url =

work page doi:10.18653/v1/2023.acl-long.546 2023

[6] [7]

Aho and Jeffrey D

Alfred V. Aho and Jeffrey D. Ullman , year = 1972, publisher =

1972

[7] [8]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = 1981, journal =

1981

[8] [9]

Andrew, Galen and Gao, Jianfeng , year = 2007, booktitle =

2007

[9] [10]

Dan Gusfield , year = 1997, publisher =

1997

[10] [11]

Tetreault , year = 2015, journal =

Mohammad Sadegh Rasooli and Joel R. Tetreault , year = 2015, journal =

2015

[11] [12]

Ando, Rie Kubota and Zhang, Tong , year = 2005, month = dec, journal =

2005



