pith. machine review for the scientific record.

arxiv: 2605.05592 · v1 · submitted 2026-05-07 · 💻 cs.LG · cs.IT · math.IT

Recognition: unknown

When Can Voting Help, Hurt, or Change Course? Exact Structure of Binary Test-Time Aggregation

Yi Liu

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 15:44 UTC · model grok-4.3

classification 💻 cs.LG · cs.IT · math.IT
keywords majority voting · test-time aggregation · exchangeable correctness · de Finetti representation · Hausdorff moments · latent distribution · binary prediction · nonmonotone curves
0 comments

The pith

The complete odd-budget voting curve is equivalent to a signed voting signature that records excess latent mass above the majority threshold at each binomial scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Classical theory treats majority voting on repeated predictions as monotone, with more votes always helping above the 50 percent threshold. This paper shows the picture is incomplete once correctness is modeled as exchangeable draws from a latent distribution over per-example success probabilities. Simple mixtures of that latent law can produce nonmonotone curves with arbitrarily many trend reversals. The central theorem establishes that the full curve of success rates for odd numbers of votes is exactly equivalent to a signed signature: its increments are signed Hausdorff moments, and the curve recovers the signature uniquely. This object captures, at each scale, how much extra latent probability mass sits above versus below the majority threshold.
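A minimal sketch makes the nonmonotonicity concrete. This is not the paper's explicit construction; the two-point latent law and its parameters below are invented for illustration, but they already produce a curve that rises before it falls:

```python
from math import comb

def majority(n: int, p: float) -> float:
    """P(strict majority of n i.i.d. Bernoulli(p) votes is correct), n odd."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(n // 2 + 1, n + 1))

def voting_curve(latent, budgets):
    """Odd-budget curve V_n = E_Q[P(majority correct | p)] for a discrete latent law Q."""
    return [sum(w * majority(n, p) for p, w in latent) for n in budgets]

# Hypothetical two-point latent law: half the examples are easy (p = 0.9),
# half sit just below the majority threshold (p = 0.45).
Q = [(0.9, 0.5), (0.45, 0.5)]
V = voting_curve(Q, [1, 3, 5, 101])
# The easy component saturates toward 1 faster than the hard component decays
# toward 0, so the curve first rises above V_1 before falling back toward the
# latent mass above 1/2: V_3 > V_1 yet V_101 < V_3.
```

No single "competence" parameter describes this predictor: whether an extra pair of votes helps depends on which component dominates the change at that budget.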

Core claim

Under the de Finetti representation for exchangeable repeated correctness, the voting success curve for odd budgets is equivalent to the signed voting signature, where increments are signed Hausdorff moments and the full curve recovers the signature uniquely. This object records, at each binomial variance scale, the excess latent mass above rather than below the majority threshold.

What carries the argument

The signed voting signature, which at each binomial variance scale records excess latent mass above rather than below the majority threshold and is recovered from the odd-budget curve via signed Hausdorff moments.
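One concrete way to read "increments are signed Hausdorff moments" is the classical swing-vote identity for a fixed competence p, which extends to any latent law by linearity; the normalization below is a plausible reconstruction, not necessarily the paper's exact convention:

```python
from math import comb

def majority(n, p):
    """P(strict majority of n i.i.d. Bernoulli(p) votes is correct), n odd."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(n // 2 + 1, n + 1))

def swing_increment(k, p):
    # Swing-vote identity for fixed competence p:
    #   V_{2k+1} - V_{2k-1} = C(2k-1, k-1) * [p(1-p)]^k * (2p - 1).
    # Averaging over p ~ Q turns the k-th curve increment into a moment of
    # t = p(1-p) (the binomial variance scale) weighted by the sign of 2p - 1,
    # i.e. a signed moment of a measure on the variance scale.
    return comb(2 * k - 1, k - 1) * (p * (1 - p))**k * (2 * p - 1)

for p in (0.3, 0.62, 0.9):
    for k in range(1, 7):
        direct = majority(2 * k + 1, p) - majority(2 * k - 1, p)
        assert abs(direct - swing_increment(k, p)) < 1e-12
```

The identity follows from noting that two extra votes change the outcome only when the first 2k-1 votes stand at exactly k-1 or k successes.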

If this is right

  • Even simple two-point mixtures of the latent correctness distribution can generate voting curves with infinitely many direction changes.
  • The curve determines the signature uniquely but leaves the full latent law nonidentifiable, producing branch-symmetric families of distributions that agree on all voting outcomes.
  • Direct per-example success-probability observations target the entire signature, while fixed-depth grouped labels reveal only a finite prefix of it.
  • Endpoint rates, realizability, and variation in voting performance are all governed by properties of the signature rather than any single competence parameter.
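The nonidentifiability bullet can be made concrete. Since the odd-budget majority rate satisfies M_n(1-p) = 1 - M_n(p), a balanced pair of atoms at {p, 1-p} contributes exactly 1/2 at every budget, and mass at p = 1/2 contributes 1/2 as well, so latent mass can be shuffled between branches without moving the curve. The two laws below are invented for illustration:

```python
from math import comb

def majority(n, p):
    """P(strict majority of n i.i.d. Bernoulli(p) votes is correct), n odd."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(n // 2 + 1, n + 1))

def curve(latent, budgets):
    return [sum(w * majority(n, p) for p, w in latent) for n in budgets]

# Two different latent laws with the same net (2p-1)-weighted mass at every
# variance scale t = p(1-p): the pair {0.3, 0.7} shares one t, and mass at
# p = 1/2 is invisible to the signature.
Q1 = [(0.7, 0.5), (0.5, 0.5)]
Q2 = [(0.7, 0.7), (0.3, 0.2), (0.5, 0.1)]

budgets = [1, 3, 5, 7, 9, 21]
c1, c2 = curve(Q1, budgets), curve(Q2, budgets)
assert all(abs(a - b) < 1e-12 for a, b in zip(c1, c2))  # identical voting curves
```

Q2 - Q1 is 0.2 δ_{0.7} + 0.2 δ_{0.3} - 0.4 δ_{0.5}, whose curve contribution is 0.2 · 1 - 0.4 · (1/2) = 0 at every odd budget: a branch-symmetric family in miniature.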

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners could estimate the signature directly from data to decide in advance whether adding more votes will help or hurt on a given task.
  • The moment-based view suggests designing new aggregation rules that target only the recoverable signature instead of attempting to recover the full latent distribution.
  • The same signature machinery may apply to other exchangeable binary aggregation settings, such as repeated sensor readings or ensemble predictions with shared latent factors.

Load-bearing premise

The de Finetti representation for exchangeable repeated correctness holds, so that voting is governed by a latent distribution of per-example correctness probabilities.
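A minimal simulation of this premise (latent law and parameters invented for illustration): draw p once per example, then generate conditionally i.i.d. correctness votes; the empirical majority rate matches the mixture formula that the whole analysis rests on.

```python
import random
from math import comb

def majority(n, p):
    """P(strict majority of n i.i.d. Bernoulli(p) votes is correct), n odd."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(n // 2 + 1, n + 1))

def simulate_majority_rate(latent, n, trials, rng):
    """Exchangeable correctness via de Finetti: draw p ~ Q once per example,
    then n conditionally i.i.d. correctness indicators; return the empirical
    rate at which the majority vote is correct."""
    ps, ws = zip(*latent)
    wins = 0
    for _ in range(trials):
        p = rng.choices(ps, weights=ws)[0]
        votes = sum(rng.random() < p for _ in range(n))
        wins += votes > n // 2
    return wins / trials

rng = random.Random(0)
Q = [(0.9, 0.5), (0.45, 0.5)]  # hypothetical two-point latent law
exact = sum(w * majority(7, p) for p, w in Q)
est = simulate_majority_rate(Q, 7, 20_000, rng)
assert abs(est - exact) < 0.02  # Monte Carlo agrees with the mixture formula
```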

What would settle it

A concrete latent distribution whose odd-budget voting curve cannot be expressed as the signed Hausdorff moments of any signed measure on [0,1], or two different signatures that produce identical curves.

Figures

Figures reproduced from arXiv: 2605.05592 by Yi Liu.

Figure 1. Different behaviors of voting curves, plotted against the vote-budget index.
read the original abstract

Majority voting is one of the few black-box interventions that can improve a fixed stochastic predictor: repeated access can be cheaper than changing a high-capability model. Classical fixed-competence theory makes this intervention look monotone -- more votes help above the majority threshold and hurt below it. We show that this picture is fundamentally incomplete. Under the de Finetti representation for exchangeable repeated correctness, voting is governed by a latent distribution of per-example correctness probabilities. Even simple latent mixtures can generate sharply different voting curves, including nonmonotone behavior and, in an explicit construction, infinitely many trend changes. The full latent law determines the curve, but the curve does not determine the law. The exact object recovered by voting is a signed voting signature: at each binomial variance scale, it records excess latent mass above rather than below the majority threshold. Our main theorem proves that the complete odd-budget curve and this signature are equivalent: the curve increments are signed Hausdorff moments, and the full curve recovers the signature uniquely. This viewpoint explains shape phenomena, branch-symmetric nonidentifiability, realizability, variation, and endpoint rates. It also separates estimation regimes: direct per-example success-probability information targets the full signature, whereas fixed-depth grouped labels reveal only a finite prefix.
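The abstract's estimation-regime separation can be sketched concretely: the distribution of correct-label counts in a depth-n group is a linear functional of the first n moments of the latent law alone, so two laws agreeing on that moment prefix are indistinguishable from depth-n groups. The pair below is invented for illustration (exact rational arithmetic); they share their first two moments but not the third:

```python
from fractions import Fraction as F
from math import comb

def count_dist(latent, n):
    """Distribution of the number of correct labels in a depth-n group:
    P(S = j) = C(n, j) * E_Q[p^j (1-p)^(n-j)], which depends on Q only
    through its first n moments."""
    return [sum(w * comb(n, j) * p**j * (1 - p)**(n - j) for p, w in latent)
            for j in range(n + 1)]

# Both laws have mean 1/2 and second moment 29/100, but different third moments
# (37/200 vs 3801/21000), so depth-2 groups cannot tell them apart while
# depth-3 groups can.
Q1 = [(F(3, 10), F(1, 2)), (F(7, 10), F(1, 2))]
Q2 = [(F(1, 5), F(2, 7)), (F(3, 5), F(2, 3)), (F(9, 10), F(1, 21))]

assert count_dist(Q1, 2) == count_dist(Q2, 2)   # depth-2 groups: indistinguishable
assert count_dist(Q1, 3) != count_dist(Q2, 3)   # depth-3 groups separate them
```

Fixed-depth grouped labels therefore reveal only a finite prefix of the moment information, exactly the separation the abstract claims.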

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that majority voting on repeated predictions from a stochastic predictor is governed by a latent distribution of per-example correctness probabilities under the de Finetti representation for exchangeable correctness. Classical monotone predictions are incomplete; even simple mixtures can produce nonmonotone curves with infinitely many trend changes. The exact object recovered is a signed voting signature recording excess latent mass above the majority threshold at each binomial variance scale. The main theorem proves equivalence: the complete odd-budget curve has increments that are signed Hausdorff moments, and the full curve recovers the signature uniquely via polynomial density in C[0,1]. This framework explains shape phenomena, branch-symmetric nonidentifiability, realizability, and separates direct vs. grouped-label estimation regimes.

Significance. If the central equivalence holds, the work supplies a parameter-free, moment-based characterization of test-time aggregation that moves beyond fixed-competence theory and directly predicts when voting helps, hurts, or oscillates. The de Finetti modeling choice and Hausdorff-moment identification are strengths that yield falsifiable predictions about curve variation and endpoint rates; the separation of estimation regimes is practically useful for ML practitioners deciding between per-example labels and fixed-depth groups.

major comments (2)
  1. [Main Theorem] Main Theorem (likely §4): the uniqueness claim that the curve recovers the signed measure via its Hausdorff moments relies on the determinate moment problem for signed measures on [0,1]. While Stone-Weierstrass gives density, the paper must explicitly confirm that the moment sequence determines the signed measure uniquely (e.g., via total-variation bounds or support restrictions) rather than assuming it follows from the classical positive-measure case.
  2. [Construction of infinitely many trend changes] Construction of infinitely many trend changes (abstract and §3): the explicit mixture producing infinitely many sign changes in the voting curve must be shown to remain compatible with the moment-inversion step; if the latent distribution has unbounded variation, the partial-sum recovery of the signature may require additional regularity to avoid divergence in the odd-budget increments.
minor comments (2)
  1. [Notation] Notation for the signed voting signature (early sections) should include an explicit integral or sum formula alongside the verbal definition to avoid ambiguity when comparing to the latent density.
  2. [Figures] Figure captions for the voting curves should state the exact latent mixture parameters used, so readers can reproduce the nonmonotone and infinite-change examples.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the detailed and insightful comments on our manuscript. We address the major comments point by point below, indicating where revisions will be made to strengthen the presentation.

read point-by-point responses
  1. Referee: [Main Theorem] Main Theorem (likely §4): the uniqueness claim that the curve recovers the signed measure via its Hausdorff moments relies on the determinate moment problem for signed measures on [0,1]. While Stone-Weierstrass gives density, the paper must explicitly confirm that the moment sequence determines the signed measure uniquely (e.g., via total-variation bounds or support restrictions) rather than assuming it follows from the classical positive-measure case.

    Authors: We thank the referee for highlighting this point. The uniqueness follows directly from the density of polynomials in C[0,1] under the uniform norm (by Stone-Weierstrass) and the Riesz representation theorem, which identifies the dual space with signed regular Borel measures on [0,1]. Since the signed voting signature is such a measure (with finite total variation by construction), agreement on all polynomials implies agreement on all continuous functions, hence uniqueness of the measure. This argument holds for signed measures without requiring positivity. We will add an explicit remark in the statement of the main theorem (and a brief justification in the proof) to clarify this, including a reference to the Riesz theorem for completeness. revision: yes

  2. Referee: [Construction of infinitely many trend changes] Construction of infinitely many trend changes (abstract and §3): the explicit mixture producing infinitely many sign changes in the voting curve must be shown to remain compatible with the moment-inversion step; if the latent distribution has unbounded variation, the partial-sum recovery of the signature may require additional regularity to avoid divergence in the odd-budget increments.

    Authors: We appreciate this observation. The explicit construction in §3 produces a signed measure with bounded total variation by design, as it is a finite signed combination of continuous densities on [0,1]. Consequently, the Hausdorff moment sequence is well-defined, and the partial sums in the inversion formula converge in the appropriate topology without divergence. To address the concern, we will augment the construction with a short verification of the total variation bound and note that the odd-budget increments remain bounded, consistent with the general theory. This ensures compatibility with the moment-inversion step. revision: yes
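The uniqueness argument in response 1 can be illustrated in the simplest setting, finitely many known atoms (an illustrative sketch, not the paper's proof): integrating the Lagrange basis polynomial ℓ_i against a signed atomic measure returns its i-th weight, and that integral is a finite linear combination of moments, so the moment sequence determines the signed weights exactly, with no positivity needed.

```python
from fractions import Fraction as F

def moments(atoms, weights, m):
    """First m moments of the signed measure sum_i w_i * delta_{x_i} on [0,1]."""
    return [sum(w * x**k for x, w in zip(atoms, weights)) for k in range(m)]

def lagrange_coeffs(atoms, i):
    """Coefficients of the Lagrange basis polynomial ell_i (ell_i(x_j) = [i == j])."""
    coeffs = [F(1)]
    for j, xj in enumerate(atoms):
        if j == i:
            continue
        # multiply the polynomial by (x - xj) / (xi - xj)
        d = atoms[i] - xj
        new = [F(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k + 1] += c / d
            new[k] -= c * xj / d
        coeffs = new
    return coeffs

atoms = [F(1, 10), F(1, 2), F(9, 10)]
weights = [F(3, 10), F(-1, 2), F(4, 10)]       # signed: one negative weight
mom = moments(atoms, weights, len(atoms))

# Integrating ell_i against the measure is a linear combination of moments,
# and it equals w_i exactly -- so the moments pin the signed weights down.
recovered = [sum(c * mk for c, mk in zip(lagrange_coeffs(atoms, i), mom))
             for i in range(len(atoms))]
assert recovered == weights
```

The general case in the rebuttal replaces Lagrange interpolation with polynomial density in C[0,1] plus Riesz representation, but the mechanism is the same: polynomials already see everything a signed measure of finite total variation can do.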

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained via external theorems

full rationale

The paper's central result equates the odd-budget voting curve to a signed Hausdorff moment signature under the de Finetti representation of exchangeable correctness indicators. This equivalence is established by identifying curve increments with signed moments and invoking the uniqueness of moment sequences for signed measures on [0,1], which follows from the density of polynomials in C[0,1] (Stone-Weierstrass theorem) and the fact that a signed measure with all moments zero is the zero measure. Both de Finetti's theorem and the moment uniqueness result are standard external mathematical facts with no dependence on the paper's own fitted quantities, definitions, or prior self-citations. No load-bearing step reduces to a self-referential definition, a fitted input renamed as prediction, or an ansatz imported via author overlap. The modeling choice introduces no internal inconsistency with the claimed equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the de Finetti representation for exchangeable sequences and the definition of the signed voting signature as the object recovered by voting.

axioms (1)
  • domain assumption de Finetti representation for exchangeable repeated correctness
    Used to model the latent distribution of per-example correctness probabilities that governs voting behavior.
invented entities (1)
  • signed voting signature no independent evidence
    purpose: Records excess latent mass above rather than below the majority threshold at each binomial variance scale
    Introduced as the exact object recovered by voting; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.0 · 5523 in / 1363 out tokens · 35914 ms · 2026-05-09T15:44:42.116087+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

25 extracted references · 4 canonical work pages · 2 internal anchors

  1. [1]

    Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

    Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, and Azalia Mirhoseini. Large language monkeys: Scaling inference compute with repeated sampling. arXiv:2407.21787, 2024

  2. [2]

    Finite exchangeable sequences

    Persi Diaconis and David Freedman. Finite exchangeable sequences. The Annals of Probability, 8(4):745–764, 1980

  3. [3]

    Jury theorems

    Franz Dietrich and Kai Spiekermann. Jury theorems. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy, 2021

  4. [4]

    Rates of convergence in de Finetti’s representation theorem, and Hausdorff moment problem

    Emanuele Dolera and Stefano Favaro. Rates of convergence in de Finetti’s representation theorem, and Hausdorff moment problem. Bernoulli, 26(2):1294–1322, 2020

  5. [5]

    Detecting hallucinations in large language models using semantic entropy

    Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, and Yarin Gal. Detecting hallucinations in large language models using semantic entropy. Nature, 630:625–630, 2024

  6. [6]

    Dropout as a Bayesian approximation: Representing model uncertainty in deep learning

    Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of ICML, 2016

  7. [7]

    Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation

    Lorenz Kuhn, Yarin Gal, and Sebastian Farquhar. Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation. In Proceedings of ICLR, 2023

  8. [8]

    Simple and scalable predictive uncertainty estimation using deep ensembles

    Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, 2017

  9. [9]

    Let’s verify step by step

    Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. In Proceedings of ICLR, 2024

  10. [10]

    Generating with confidence: Uncertainty quantification for black-box large language models

    Zhen Lin, Shubhendu Trivedi, and Jimeng Sun. Generating with confidence: Uncertainty quantification for black-box large language models. Transactions on Machine Learning Research, 2024

  11. [11]

    Two Calls, Two Moments, and the Vote-Accuracy Curve of Repeated LLM Inference

    Yi Liu. Two Calls, Two Moments, and the Vote-Accuracy Curve of Repeated LLM Inference. arXiv preprint arXiv:2605.03379, 2026. https://arxiv.org/abs/2605.03379

  12. [12]

    SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models

    Potsawee Manakul, Adian Liusie, and Mark J. F. Gales. SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. In Proceedings of EMNLP, 2023

  13. [13]

    s1: Simple test-time scaling

    Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, and Tatsunori Hashimoto. s1: Simple test-time scaling. In Proceedings of EMNLP, 2025

  14. [14]

    Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry, Sebastian Farquhar, Qixuan Feng, Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren, Tim G. J. Rudner, Faris Sbahi, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jaspe...

  15. [15]

    Kernel language entropy: Fine-grained uncertainty quantification for LLMs from semantic similarities

    Alexander Nikitin, Jannik Kossen, Yarin Gal, and Pekka Marttinen. Kernel language entropy: Fine-grained uncertainty quantification for LLMs from semantic similarities. In Advances in Neural Information Processing Systems, 2024

  16. [16]

    Self-regularizing property of nonparametric maximum likelihood estimator in mixture models

    Yury Polyanskiy and Yihong Wu. Self-regularizing property of nonparametric maximum likelihood estimator in mixture models. arXiv:2008.08244, 2020

  17. [17]

    Uncertainty quantification and deep ensembles

    Rahul Rahaman and Alexandre H. Thiery. Uncertainty quantification and deep ensembles. In Advances in Neural Information Processing Systems, 2021

  18. [18]

    Confidence improves self-consistency in LLMs

    Amir Taubenfeld, Tom Sheffer, Eran Ofek, Amir Feder, Ariel Goldstein, Zorik Gekhman, and Gal Yona. Confidence improves self-consistency in LLMs. In Findings of ACL, 2025

  19. [19]

    Learning populations of parameters

    Kevin Tian, Weihao Kong, and Gregory Valiant. Learning populations of parameters. In Advances in Neural Information Processing Systems, 2017

  20. [20]

    Benchmarking uncertainty quantification methods for large language models with LM-Polygraph

    Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Lyudmila Rvanova, Daniil Vasilev, Akim Tsvigun, Sergey Petrakov, Rui Xing, Abdelrahman Sadallah, Kirill Grishchenkov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, and Artem Shelmanov. Benchmarking uncertainty quantification methods for large language models with LM-Polygraph. Transactions of the Association for Computational Linguistics, 13:220–248, 2025

  21. [21]

    Maximum likelihood estimation for learning populations of parameters

    Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, and Sham M. Kakade. Maximum likelihood estimation for learning populations of parameters. In Proceedings of ICML, 2019

  22. [22]

    Self-consistency improves chain of thought reasoning in language models

    Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In Proceedings of ICLR, 2023

  23. [23]

    ConU: Conformal uncertainty in large language models with correctness coverage guarantees

    Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Xiaoshuang Shi, Kaidi Xu, Hengtao Shen, and Xiaofeng Zhu. ConU: Conformal uncertainty in large language models with correctness coverage guarantees. In Findings of EMNLP, 2024

  24. [24]

    The consistency hypothesis in uncertainty quantification for large language models

    Quan Xiao, Debarun Bhattacharjya, Balaji Ganesan, Radu Marinescu, Katya Mirylenka, Nhan H. Pham, Michael Glass, and Junkyu Lee. The consistency hypothesis in uncertainty quantification for large language models. In Proceedings of UAI, 2025

  25. [25]

    On efficient and scalable computation of the nonparametric maximum likelihood estimator in mixture models

    Yangjing Zhang, Ying Cui, Bodhisattva Sen, and Kim-Chuan Toh. On efficient and scalable computation of the nonparametric maximum likelihood estimator in mixture models. Journal of Machine Learning Research, 25(8):1–46, 2024