When Can Voting Help, Hurt, or Change Course? Exact Structure of Binary Test-Time Aggregation
Pith reviewed 2026-05-09 15:44 UTC · model grok-4.3
The pith
The complete odd-budget voting curve is equivalent to a signed voting signature that records, at each binomial variance scale, the excess latent mass above the majority threshold.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the de Finetti representation for exchangeable repeated correctness, the odd-budget voting success curve is equivalent to the signed voting signature: the curve's increments are signed Hausdorff moments, and the full curve recovers the signature uniquely. This object records, at each binomial variance scale, the excess latent mass above rather than below the majority threshold.
What carries the argument
The signed voting signature: at each binomial variance scale it records the excess latent mass above rather than below the majority threshold, and it is recovered from the odd-budget curve via signed Hausdorff moments.
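To fix notation, here is a minimal reconstruction consistent with the abstract (the paper's exact normalization may differ): the odd-budget curve written as an integral against the latent law Q, together with the classical majority-vote increment identity.

```latex
% Odd-budget voting curve under the latent law Q on [0,1]:
\[
V_{2k+1}(Q) = \int_0^1 \Pr\big[\mathrm{Bin}(2k+1, p) \ge k+1\big]\, dQ(p).
\]
% Classical increment identity (verifiable directly for k = 0:
% V_3 - V_1 = \int p(1-p)(2p-1)\, dQ(p)):
\[
V_{2k+3}(Q) - V_{2k+1}(Q)
  = \binom{2k+1}{k} \int_0^1 \big[p(1-p)\big]^{k+1} (2p-1)\, dQ(p).
\]
% Substituting the binomial variance scale t = p(1-p), each increment is the
% (k+1)-st moment of a signed measure in t: mass with p > 1/2 counts
% positively and mass with p < 1/2 negatively, i.e., excess latent mass
% above rather than below the majority threshold.
```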
If this is right
- Even simple two-point mixtures of the latent correctness distribution can generate voting curves with infinitely many direction changes (a numerical sketch of a nonmonotone two-point mixture follows this list).
- The curve determines the signature uniquely but leaves the full latent law nonidentifiable, producing branch-symmetric families of distributions that agree on all voting outcomes.
- Direct per-example success-probability observations target the entire signature, while fixed-depth grouped labels reveal only a finite prefix of it.
- Endpoint rates, realizability, and variation in voting performance are all governed by properties of the signature rather than any single competence parameter.
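As flagged in the first bullet, here is a minimal numerical sketch of a nonmonotone voting curve from a two-point latent mixture. The weights and success probabilities are illustrative choices, not the paper's explicit construction.

```python
# A minimal sketch (not the paper's code): the odd-budget voting curve
# V_n = E_Q[ P(Bin(n, p) >= (n+1)/2) ] for a discrete latent mixture Q.
from math import comb

def majority_prob(n, p):
    """P(Bin(n, p) achieves a strict majority), for odd budget n."""
    k = (n + 1) // 2
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def voting_curve(weights, ps, max_budget=41):
    """V_n for odd n under Q = sum_i weights[i] * delta(ps[i])."""
    return {n: sum(w * majority_prob(n, p) for w, p in zip(weights, ps))
            for n in range(1, max_budget + 1, 2)}

# Illustrative parameters: one fast-saturating component above 1/2 mixed with
# one slowly decaying component below 1/2. The curve first rises
# (V_1 = 0.715, V_3 ~ 0.731) and then drifts down toward the limit 0.5,
# the total latent mass above the threshold.
for n, v in voting_curve([0.5, 0.5], [0.95, 0.48]).items():
    print(f"n={n:2d}  V_n={v:.4f}")
```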
Where Pith is reading between the lines
- Practitioners could estimate the signature directly from data to decide in advance whether adding more votes will help or hurt on a given task; see the estimator sketch after this list.
- The moment-based view suggests designing new aggregation rules that target only the recoverable signature instead of attempting to recover the full latent distribution.
- The same signature machinery may apply to other exchangeable binary aggregation settings, such as repeated sensor readings or ensemble predictions with shared latent factors.
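On the estimation regimes above: under exchangeability, a group of m binary labels per example identifies the curve only for budgets n ≤ m, a finite prefix. The sketch below is one standard unbiased estimator of that prefix (a hypergeometric subsample average); the function names and the U-statistic design are illustrative, not taken from the paper.

```python
# Sketch: unbiased estimation of the odd-budget curve prefix from fixed-depth
# grouped labels, assuming exchangeability. With m binary correctness labels
# per example, only budgets n <= m are identified: a finite prefix.
from math import comb

def majority_prob_given_counts(s, m, n):
    """P(a uniform size-n subsample of m exchangeable draws with s successes
    has a strict majority), by the hypergeometric formula; unbiased for V_n.
    Requires odd n <= m."""
    k = (n + 1) // 2
    return sum(comb(s, j) * comb(m - s, n - j)
               for j in range(k, n + 1)) / comb(m, n)

def estimate_curve_prefix(success_counts, m, budgets):
    """Average the per-example unbiased estimates over a dataset."""
    return {n: sum(majority_prob_given_counts(s, m, n)
                   for s in success_counts) / len(success_counts)
            for n in budgets}

# e.g. 10 labels per example; the estimable budgets are n = 1, 3, ..., 9 only
prefix = estimate_curve_prefix(success_counts=[7, 3, 9, 5, 8], m=10,
                               budgets=range(1, 10, 2))
print(prefix)
```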
Load-bearing premise
The de Finetti representation for exchangeable repeated correctness holds, so that voting is governed by a latent distribution of per-example correctness probabilities.
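For reference, this is the standard statement being assumed; the paper may also use finite-exchangeability refinements (cf. Diaconis and Freedman [2]), which are not reproduced here.

```latex
% de Finetti: if the correctness indicators X_1, X_2, ... for repeated calls
% form an infinite exchangeable {0,1}-valued sequence, then there is a unique
% law Q on [0,1] with
\[
\Pr[X_1 = x_1, \dots, X_n = x_n]
  = \int_0^1 p^{\sum_i x_i}\,(1-p)^{\,n - \sum_i x_i}\, dQ(p)
\]
% for all n. Conditionally on the latent p ~ Q, votes are i.i.d. Bernoulli(p),
% which is what reduces the voting analysis to properties of Q.
```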
What would settle it
A concrete latent distribution whose odd-budget voting curve cannot be expressed as the signed Hausdorff moments of any signed measure on [0,1], or two different signatures that produce identical curves.
Original abstract
Majority voting is one of the few black-box interventions that can improve a fixed stochastic predictor: repeated access can be cheaper than changing a high-capability model. Classical fixed-competence theory makes this intervention look monotone -- more votes help above the majority threshold and hurt below it. We show that this picture is fundamentally incomplete. Under the de Finetti representation for exchangeable repeated correctness, voting is governed by a latent distribution of per-example correctness probabilities. Even simple latent mixtures can generate sharply different voting curves, including nonmonotone behavior and, in an explicit construction, infinitely many trend changes. The full latent law determines the curve, but the curve does not determine the law. The exact object recovered by voting is a signed voting signature: at each binomial variance scale, it records excess latent mass above rather than below the majority threshold. Our main theorem proves that the complete odd-budget curve and this signature are equivalent: the curve increments are signed Hausdorff moments, and the full curve recovers the signature uniquely. This viewpoint explains shape phenomena, branch-symmetric nonidentifiability, realizability, variation, and endpoint rates. It also separates estimation regimes: direct per-example success-probability information targets the full signature, whereas fixed-depth grouped labels reveal only a finite prefix.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that majority voting on repeated predictions from a stochastic predictor is governed by a latent distribution of per-example correctness probabilities under the de Finetti representation for exchangeable correctness. Classical monotone predictions are incomplete; even simple mixtures can produce nonmonotone curves with infinitely many trend changes. The exact object recovered is a signed voting signature recording excess latent mass above the majority threshold at each binomial variance scale. The main theorem proves equivalence: the complete odd-budget curve has increments that are signed Hausdorff moments, and the full curve recovers the signature uniquely via polynomial density in C[0,1]. This framework explains shape phenomena, branch-symmetric nonidentifiability, realizability, and separates direct vs. grouped-label estimation regimes.
Significance. If the central equivalence holds, the work supplies a parameter-free, moment-based characterization of test-time aggregation that moves beyond fixed-competence theory and directly predicts when voting helps, hurts, or oscillates. The de Finetti modeling choice and Hausdorff-moment identification are strengths that yield falsifiable predictions about curve variation and endpoint rates; the separation of estimation regimes is practically useful for ML practitioners deciding between per-example labels and fixed-depth groups.
major comments (2)
- Main Theorem (likely §4): the uniqueness claim that the curve recovers the signed measure via its Hausdorff moments relies on the determinate moment problem for signed measures on [0,1]. While Stone-Weierstrass gives density, the paper must explicitly confirm that the moment sequence determines the signed measure uniquely (e.g., via total-variation bounds or support restrictions) rather than assuming it follows from the classical positive-measure case.
- Construction of infinitely many trend changes (abstract and §3): the explicit mixture producing infinitely many sign changes in the voting curve must be shown to remain compatible with the moment-inversion step; if the latent distribution has unbounded variation, the partial-sum recovery of the signature may require additional regularity to avoid divergence in the odd-budget increments.
minor comments (2)
- Notation for the signed voting signature (early sections) should include an explicit integral or sum formula alongside the verbal definition, to avoid ambiguity when comparing it to the latent density.
- Figure captions for the voting curves should state the exact latent mixture parameters used, so readers can reproduce the nonmonotone and infinite-trend-change examples.
Simulated Author's Rebuttal
We are grateful to the referee for the detailed and insightful comments on our manuscript. We address the major comments point by point below, indicating where revisions will be made to strengthen the presentation.
Point-by-point responses
- Referee: Main Theorem (likely §4): the uniqueness claim that the curve recovers the signed measure via its Hausdorff moments relies on the determinate moment problem for signed measures on [0,1]. While Stone-Weierstrass gives density, the paper must explicitly confirm that the moment sequence determines the signed measure uniquely (e.g., via total-variation bounds or support restrictions) rather than assuming it follows from the classical positive-measure case.
Authors: We thank the referee for highlighting this point. The uniqueness follows directly from the density of polynomials in C[0,1] under the uniform norm (by Stone-Weierstrass) and the Riesz representation theorem, which identifies the dual space with signed regular Borel measures on [0,1]. Since the signed voting signature is such a measure (with finite total variation by construction), agreement on all polynomials implies agreement on all continuous functions, hence uniqueness of the measure. This argument holds for signed measures without requiring positivity. We will add an explicit remark in the statement of the main theorem (and a brief justification in the proof) to clarify this, including a reference to the Riesz theorem for completeness; a spelled-out sketch of this step appears after these responses. revision: yes
- Referee: Construction of infinitely many trend changes (abstract and §3): the explicit mixture producing infinitely many sign changes in the voting curve must be shown to remain compatible with the moment-inversion step; if the latent distribution has unbounded variation, the partial-sum recovery of the signature may require additional regularity to avoid divergence in the odd-budget increments.
Authors: We appreciate this observation. The explicit construction in §3 produces a signed measure with bounded total variation by design, as it is a finite signed combination of continuous densities on [0,1]. Consequently, the Hausdorff moment sequence is well-defined, and the partial sums in the inversion formula converge in the appropriate topology without divergence. To address the concern, we will augment the construction with a short verification of the total variation bound and note that the odd-budget increments remain bounded, consistent with the general theory. This ensures compatibility with the moment-inversion step. revision: yes
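Spelling out the uniqueness step invoked in the first response above; this is a sketch using only Stone-Weierstrass, finite total variation, and the Riesz representation theorem, exactly the ingredients the rebuttal names.

```latex
% Claim: a finite signed Borel measure mu on [0,1] with all moments zero is zero.
% Proof sketch: fix f in C[0,1] and, by Stone-Weierstrass, a polynomial q with
% ||f - q||_inf <= eps. Since every monomial integrates to zero against mu,
\[
\Big|\int_0^1 f\, d\mu\Big| = \Big|\int_0^1 (f - q)\, d\mu\Big|
  \le \varepsilon\, \|\mu\|_{\mathrm{TV}}.
\]
% Letting eps -> 0 gives \int f\, d\mu = 0 for every continuous f, so mu = 0
% by Riesz representation. Applied to the difference of two candidate
% signatures, equal Hausdorff moment sequences force equal signatures.
```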
Circularity Check
No significant circularity; the derivation is self-contained, resting on external theorems.
Full rationale
The paper's central result equates the odd-budget voting curve to a signed Hausdorff moment signature under the de Finetti representation of exchangeable correctness indicators. This equivalence is established by identifying curve increments with signed moments and invoking the uniqueness of moment sequences for signed measures on [0,1], which follows from the density of polynomials in C[0,1] (Stone-Weierstrass theorem) and the fact that a signed measure with all moments zero is the zero measure. Both de Finetti's theorem and the moment uniqueness result are standard external mathematical facts with no dependence on the paper's own fitted quantities, definitions, or prior self-citations. No load-bearing step reduces to a self-referential definition, a fitted input renamed as prediction, or an ansatz imported via author overlap. The modeling choice introduces no internal inconsistency with the claimed equivalence.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the de Finetti representation for exchangeable repeated correctness.
invented entities (1)
- Signed voting signature: no independent evidence.
Reference graph
Works this paper leans on
- [1] Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, and Azalia Mirhoseini. Large language monkeys: Scaling inference compute with repeated sampling. arXiv:2407.21787, 2024.
- [2] Persi Diaconis and David Freedman. Finite exchangeable sequences. The Annals of Probability, 8(4):745–764, 1980.
- [3] Franz Dietrich and Kai Spiekermann. Jury theorems. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy, 2021.
- [4] Emanuele Dolera and Stefano Favaro. Rates of convergence in de Finetti's representation theorem, and Hausdorff moment problem. Bernoulli, 26(2):1294–1322, 2020.
- [5] Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, and Yarin Gal. Detecting hallucinations in large language models using semantic entropy. Nature, 630:625–630, 2024.
- [6] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of ICML, 2016.
- [7] Lorenz Kuhn, Yarin Gal, and Sebastian Farquhar. Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation. In Proceedings of ICLR, 2023.
- [8] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, 2017.
- [9] Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let's verify step by step. In Proceedings of ICLR, 2024.
- [10] Zhen Lin, Shubhendu Trivedi, and Jimeng Sun. Generating with confidence: Uncertainty quantification for black-box large language models. Transactions on Machine Learning Research, 2024.
- [11] Yi Liu. Two Calls, Two Moments, and the Vote-Accuracy Curve of Repeated LLM Inference. arXiv:2605.03379, 2026. https://arxiv.org/abs/2605.03379
- [12] Potsawee Manakul, Adian Liusie, and Mark J. F. Gales. SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. In Proceedings of EMNLP, 2023.
- [13] Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, and Tatsunori Hashimoto. s1: Simple test-time scaling. In Proceedings of EMNLP, 2025.
- [14] Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry, Sebastian Farquhar, Qixuan Feng, Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren, Tim G. J. Rudner, Faris Sbahi, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jaspe...
- [15] Alexander Nikitin, Jannik Kossen, Yarin Gal, and Pekka Marttinen. Kernel language entropy: Fine-grained uncertainty quantification for LLMs from semantic similarities. In Advances in Neural Information Processing Systems, 2024.
- [16] Yury Polyanskiy and Yihong Wu. Self-regularizing property of nonparametric maximum likelihood estimator in mixture models. arXiv:2008.08244, 2020.
- [17] Rahul Rahaman and Alexandre H. Thiery. Uncertainty quantification and deep ensembles. In Advances in Neural Information Processing Systems, 2021.
- [18] Amir Taubenfeld, Tom Sheffer, Eran Ofek, Amir Feder, Ariel Goldstein, Zorik Gekhman, and Gal Yona. Confidence improves self-consistency in LLMs. In Findings of ACL, 2025.
- [19] Kevin Tian, Weihao Kong, and Gregory Valiant. Learning populations of parameters. In Advances in Neural Information Processing Systems, 2017.
- [20] Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Lyudmila Rvanova, Daniil Vasilev, Akim Tsvigun, Sergey Petrakov, Rui Xing, Abdelrahman Sadallah, Kirill Grishchenkov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, and Artem Shelmanov. Benchmarking uncertainty quantification methods for large language models with LM-Polygraph. Transactions of the Association for Computational Linguistics, 13:220–248, 2025.
- [21] Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, and Sham M. Kakade. Maximum likelihood estimation for learning populations of parameters. In Proceedings of ICML, 2019.
- [22] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In Proceedings of ICLR, 2023.
- [23] Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Xiaoshuang Shi, Kaidi Xu, Hengtao Shen, and Xiaofeng Zhu. ConU: Conformal uncertainty in large language models with correctness coverage guarantees. In Findings of EMNLP, 2024.
- [24] Quan Xiao, Debarun Bhattacharjya, Balaji Ganesan, Radu Marinescu, Katya Mirylenka, Nhan H. Pham, Michael Glass, and Junkyu Lee. The consistency hypothesis in uncertainty quantification for large language models. In Proceedings of UAI, 2025.
- [25] Yangjing Zhang, Ying Cui, Bodhisattva Sen, and Kim-Chuan Toh. On efficient and scalable computation of the nonparametric maximum likelihood estimator in mixture models. Journal of Machine Learning Research, 25(8):1–46, 2024.