When Can Voting Help, Hurt, or Change Course? Exact Structure of Binary Test-Time Aggregation
Pith reviewed 2026-05-09 15:44 UTC · model grok-4.3
The pith
The complete odd-budget voting curve is equivalent to a signed voting signature that records, at each binomial variance scale, the excess latent mass above the majority threshold.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the de Finetti representation for exchangeable repeated correctness, the odd-budget voting success curve is equivalent to the signed voting signature: the curve's increments are signed Hausdorff moments, and the full curve recovers the signature uniquely. This object records, at each binomial variance scale, the excess latent mass above rather than below the majority threshold.
What carries the argument
The signed voting signature: at each binomial variance scale it records the excess latent mass above rather than below the majority threshold, and it is recovered from the odd-budget curve via signed Hausdorff moments.
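To fix notation, here is a minimal reconstruction consistent with the abstract (the paper's exact normalization may differ): the odd-budget curve written as an integral against the latent law Q, together with the classical majority-vote increment identity.

```latex
% Odd-budget voting curve under the latent law Q on [0,1]:
\[
V_{2k+1}(Q) = \int_0^1 \Pr\big[\mathrm{Bin}(2k+1, p) \ge k+1\big]\, dQ(p).
\]
% Classical increment identity (verifiable directly for k = 0:
% V_3 - V_1 = \int p(1-p)(2p-1)\, dQ(p)):
\[
V_{2k+3}(Q) - V_{2k+1}(Q)
  = \binom{2k+1}{k} \int_0^1 \big[p(1-p)\big]^{k+1} (2p-1)\, dQ(p).
\]
% Substituting the binomial variance scale t = p(1-p), each increment is the
% (k+1)-st moment of a signed measure in t: mass with p > 1/2 counts
% positively and mass with p < 1/2 negatively, i.e., excess latent mass
% above rather than below the majority threshold.
```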
If this is right
- Even simple two-point mixtures of the latent correctness distribution can generate voting curves with infinitely many direction changes (a numerical sketch of a nonmonotone two-point mixture follows this list).
- The curve determines the signature uniquely but leaves the full latent law nonidentifiable, producing branch-symmetric families of distributions that agree on all voting outcomes.
- Direct per-example success-probability observations target the entire signature, while fixed-depth grouped labels reveal only a finite prefix of it.
- Endpoint rates, realizability, and variation in voting performance are all governed by properties of the signature rather than any single competence parameter.
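As flagged in the first bullet, here is a minimal numerical sketch of a nonmonotone voting curve from a two-point latent mixture. The weights and success probabilities are illustrative choices, not the paper's explicit construction.

```python
# A minimal sketch (not the paper's code): the odd-budget voting curve
# V_n = E_Q[ P(Bin(n, p) >= (n+1)/2) ] for a discrete latent mixture Q.
from math import comb

def majority_prob(n, p):
    """P(Bin(n, p) achieves a strict majority), for odd budget n."""
    k = (n + 1) // 2
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def voting_curve(weights, ps, max_budget=41):
    """V_n for odd n under Q = sum_i weights[i] * delta(ps[i])."""
    return {n: sum(w * majority_prob(n, p) for w, p in zip(weights, ps))
            for n in range(1, max_budget + 1, 2)}

# Illustrative parameters: one fast-saturating component above 1/2 mixed with
# one slowly decaying component below 1/2. The curve first rises
# (V_1 = 0.715, V_3 ~ 0.731) and then drifts down toward the limit 0.5,
# the total latent mass above the threshold.
for n, v in voting_curve([0.5, 0.5], [0.95, 0.48]).items():
    print(f"n={n:2d}  V_n={v:.4f}")
```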
Where Pith is reading between the lines
- Practitioners could estimate the signature directly from data to decide in advance whether adding more votes will help or hurt on a given task; see the estimator sketch after this list.
- The moment-based view suggests designing new aggregation rules that target only the recoverable signature instead of attempting to recover the full latent distribution.
- The same signature machinery may apply to other exchangeable binary aggregation settings, such as repeated sensor readings or ensemble predictions with shared latent factors.
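On the estimation regimes above: under exchangeability, a group of m binary labels per example identifies the curve only for budgets n ≤ m, a finite prefix. The sketch below is one standard unbiased estimator of that prefix (a hypergeometric subsample average); the function names and the U-statistic design are illustrative, not taken from the paper.

```python
# Sketch: unbiased estimation of the odd-budget curve prefix from fixed-depth
# grouped labels, assuming exchangeability. With m binary correctness labels
# per example, only budgets n <= m are identified: a finite prefix.
from math import comb

def majority_prob_given_counts(s, m, n):
    """P(a uniform size-n subsample of m exchangeable draws with s successes
    has a strict majority), by the hypergeometric formula; unbiased for V_n.
    Requires odd n <= m."""
    k = (n + 1) // 2
    return sum(comb(s, j) * comb(m - s, n - j)
               for j in range(k, n + 1)) / comb(m, n)

def estimate_curve_prefix(success_counts, m, budgets):
    """Average the per-example unbiased estimates over a dataset."""
    return {n: sum(majority_prob_given_counts(s, m, n)
                   for s in success_counts) / len(success_counts)
            for n in budgets}

# e.g. 10 labels per example; the estimable budgets are n = 1, 3, ..., 9 only
prefix = estimate_curve_prefix(success_counts=[7, 3, 9, 5, 8], m=10,
                               budgets=range(1, 10, 2))
print(prefix)
```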
Load-bearing premise
The de Finetti representation for exchangeable repeated correctness holds, so that voting is governed by a latent distribution of per-example correctness probabilities.
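For reference, this is the standard statement being assumed; the paper may also use finite-exchangeability refinements (cf. Diaconis and Freedman [2]), which are not reproduced here.

```latex
% de Finetti: if the correctness indicators X_1, X_2, ... for repeated calls
% form an infinite exchangeable {0,1}-valued sequence, then there is a unique
% law Q on [0,1] with
\[
\Pr[X_1 = x_1, \dots, X_n = x_n]
  = \int_0^1 p^{\sum_i x_i}\,(1-p)^{\,n - \sum_i x_i}\, dQ(p)
\]
% for all n. Conditionally on the latent p ~ Q, votes are i.i.d. Bernoulli(p),
% which is what reduces the voting analysis to properties of Q.
```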
What would settle it
A concrete latent distribution whose odd-budget voting curve cannot be expressed as the signed Hausdorff moments of any signed measure on [0,1], or two different signatures that produce identical curves.
Original abstract
Majority voting is one of the few black-box interventions that can improve a fixed stochastic predictor: repeated access can be cheaper than changing a high-capability model. Classical fixed-competence theory makes this intervention look monotone -- more votes help above the majority threshold and hurt below it. We show that this picture is fundamentally incomplete. Under the de Finetti representation for exchangeable repeated correctness, voting is governed by a latent distribution of per-example correctness probabilities. Even simple latent mixtures can generate sharply different voting curves, including nonmonotone behavior and, in an explicit construction, infinitely many trend changes. The full latent law determines the curve, but the curve does not determine the law. The exact object recovered by voting is a signed voting signature: at each binomial variance scale, it records excess latent mass above rather than below the majority threshold. Our main theorem proves that the complete odd-budget curve and this signature are equivalent: the curve increments are signed Hausdorff moments, and the full curve recovers the signature uniquely. This viewpoint explains shape phenomena, branch-symmetric nonidentifiability, realizability, variation, and endpoint rates. It also separates estimation regimes: direct per-example success-probability information targets the full signature, whereas fixed-depth grouped labels reveal only a finite prefix.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that majority voting on repeated predictions from a stochastic predictor is governed by a latent distribution of per-example correctness probabilities under the de Finetti representation for exchangeable correctness. Classical monotone predictions are incomplete; even simple mixtures can produce nonmonotone curves with infinitely many trend changes. The exact object recovered is a signed voting signature recording excess latent mass above the majority threshold at each binomial variance scale. The main theorem proves equivalence: the complete odd-budget curve has increments that are signed Hausdorff moments, and the full curve recovers the signature uniquely via polynomial density in C[0,1]. This framework explains shape phenomena, branch-symmetric nonidentifiability, realizability, and separates direct vs. grouped-label estimation regimes.
Significance. If the central equivalence holds, the work supplies a parameter-free, moment-based characterization of test-time aggregation that moves beyond fixed-competence theory and directly predicts when voting helps, hurts, or oscillates. The de Finetti modeling choice and Hausdorff-moment identification are strengths that yield falsifiable predictions about curve variation and endpoint rates; the separation of estimation regimes is practically useful for ML practitioners deciding between per-example labels and fixed-depth groups.
major comments (2)
- Main Theorem (likely §4): the uniqueness claim that the curve recovers the signed measure via its Hausdorff moments relies on the determinate moment problem for signed measures on [0,1]. While Stone-Weierstrass gives density, the paper must explicitly confirm that the moment sequence determines the signed measure uniquely (e.g., via total-variation bounds or support restrictions) rather than assuming it follows from the classical positive-measure case.
- Construction of infinitely many trend changes (abstract and §3): the explicit mixture producing infinitely many sign changes in the voting curve must be shown to remain compatible with the moment-inversion step; if the latent distribution has unbounded variation, the partial-sum recovery of the signature may require additional regularity to avoid divergence in the odd-budget increments.
minor comments (2)
- Notation for the signed voting signature (early sections) should include an explicit integral or sum formula alongside the verbal definition, to avoid ambiguity when comparing it to the latent density.
- Figure captions for the voting curves should state the exact latent mixture parameters used, so readers can reproduce the nonmonotone and infinite-trend-change examples.
Simulated Author's Rebuttal
We are grateful to the referee for the detailed and insightful comments on our manuscript. We address the major comments point by point below, indicating where revisions will be made to strengthen the presentation.
Point-by-point responses
- Referee: Main Theorem (likely §4): the uniqueness claim that the curve recovers the signed measure via its Hausdorff moments relies on the determinate moment problem for signed measures on [0,1]. While Stone-Weierstrass gives density, the paper must explicitly confirm that the moment sequence determines the signed measure uniquely (e.g., via total-variation bounds or support restrictions) rather than assuming it follows from the classical positive-measure case.
Authors: We thank the referee for highlighting this point. The uniqueness follows directly from the density of polynomials in C[0,1] under the uniform norm (by Stone-Weierstrass) and the Riesz representation theorem, which identifies the dual space with signed regular Borel measures on [0,1]. Since the signed voting signature is such a measure (with finite total variation by construction), agreement on all polynomials implies agreement on all continuous functions, hence uniqueness of the measure. This argument holds for signed measures without requiring positivity. We will add an explicit remark in the statement of the main theorem (and a brief justification in the proof) to clarify this, including a reference to the Riesz theorem for completeness; a spelled-out sketch of this step appears after these responses. revision: yes
- Referee: Construction of infinitely many trend changes (abstract and §3): the explicit mixture producing infinitely many sign changes in the voting curve must be shown to remain compatible with the moment-inversion step; if the latent distribution has unbounded variation, the partial-sum recovery of the signature may require additional regularity to avoid divergence in the odd-budget increments.
Authors: We appreciate this observation. The explicit construction in §3 produces a signed measure with bounded total variation by design, as it is a finite signed combination of continuous densities on [0,1]. Consequently, the Hausdorff moment sequence is well-defined, and the partial sums in the inversion formula converge in the appropriate topology without divergence. To address the concern, we will augment the construction with a short verification of the total variation bound and note that the odd-budget increments remain bounded, consistent with the general theory. This ensures compatibility with the moment-inversion step. revision: yes
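Spelling out the uniqueness step invoked in the first response above; this is a sketch using only Stone-Weierstrass, finite total variation, and the Riesz representation theorem, exactly the ingredients the rebuttal names.

```latex
% Claim: a finite signed Borel measure mu on [0,1] with all moments zero is zero.
% Proof sketch: fix f in C[0,1] and, by Stone-Weierstrass, a polynomial q with
% ||f - q||_inf <= eps. Since every monomial integrates to zero against mu,
\[
\Big|\int_0^1 f\, d\mu\Big| = \Big|\int_0^1 (f - q)\, d\mu\Big|
  \le \varepsilon\, \|\mu\|_{\mathrm{TV}}.
\]
% Letting eps -> 0 gives \int f\, d\mu = 0 for every continuous f, so mu = 0
% by Riesz representation. Applied to the difference of two candidate
% signatures, equal Hausdorff moment sequences force equal signatures.
```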
Circularity Check
No significant circularity; the derivation is self-contained, resting on external theorems.
Full rationale
The paper's central result equates the odd-budget voting curve to a signed Hausdorff moment signature under the de Finetti representation of exchangeable correctness indicators. This equivalence is established by identifying curve increments with signed moments and invoking the uniqueness of moment sequences for signed measures on [0,1], which follows from the density of polynomials in C[0,1] (Stone-Weierstrass theorem) and the fact that a signed measure with all moments zero is the zero measure. Both de Finetti's theorem and the moment uniqueness result are standard external mathematical facts with no dependence on the paper's own fitted quantities, definitions, or prior self-citations. No load-bearing step reduces to a self-referential definition, a fitted input renamed as prediction, or an ansatz imported via author overlap. The modeling choice introduces no internal inconsistency with the claimed equivalence.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the de Finetti representation for exchangeable repeated correctness.
invented entities (1)
- Signed voting signature: no independent evidence.
Reference graph
Works this paper leans on
- [1] Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, and Azalia Mirhoseini. Large language monkeys: Scaling inference compute with repeated sampling. arXiv:2407.21787, 2024.
- [2] Persi Diaconis and David Freedman. Finite exchangeable sequences. The Annals of Probability, 8(4):745–764, 1980.
- [3] Franz Dietrich and Kai Spiekermann. Jury theorems. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy, 2021.
- [4] Emanuele Dolera and Stefano Favaro. Rates of convergence in de Finetti's representation theorem, and Hausdorff moment problem. Bernoulli, 26(2):1294–1322, 2020.
- [5] Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, and Yarin Gal. Detecting hallucinations in large language models using semantic entropy. Nature, 630:625–630, 2024.
- [6] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of ICML, 2016.
- [7] Lorenz Kuhn, Yarin Gal, and Sebastian Farquhar. Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation. In Proceedings of ICLR, 2023.
- [8] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, 2017.
- [9] Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let's verify step by step. In Proceedings of ICLR, 2024.
- [10] Zhen Lin, Shubhendu Trivedi, and Jimeng Sun. Generating with confidence: Uncertainty quantification for black-box large language models. Transactions on Machine Learning Research, 2024.
- [11] Yi Liu. Two Calls, Two Moments, and the Vote-Accuracy Curve of Repeated LLM Inference. arXiv:2605.03379, 2026. https://arxiv.org/abs/2605.03379
- [12] Potsawee Manakul, Adian Liusie, and Mark J. F. Gales. SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. In Proceedings of EMNLP, 2023.
- [13] Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, and Tatsunori Hashimoto. s1: Simple test-time scaling. In Proceedings of EMNLP, 2025.
- [14] Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry, Sebastian Farquhar, Qixuan Feng, Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren, Tim G. J. Rudner, Faris Sbahi, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jaspe...
- [15] Alexander Nikitin, Jannik Kossen, Yarin Gal, and Pekka Marttinen. Kernel language entropy: Fine-grained uncertainty quantification for LLMs from semantic similarities. In Advances in Neural Information Processing Systems, 2024.
- [16] Yury Polyanskiy and Yihong Wu. Self-regularizing property of nonparametric maximum likelihood estimator in mixture models. arXiv:2008.08244, 2020.
- [17] Rahul Rahaman and Alexandre H. Thiery. Uncertainty quantification and deep ensembles. In Advances in Neural Information Processing Systems, 2021.
- [18] Amir Taubenfeld, Tom Sheffer, Eran Ofek, Amir Feder, Ariel Goldstein, Zorik Gekhman, and Gal Yona. Confidence improves self-consistency in LLMs. In Findings of ACL, 2025.
- [19] Kevin Tian, Weihao Kong, and Gregory Valiant. Learning populations of parameters. In Advances in Neural Information Processing Systems, 2017.
- [20] Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Lyudmila Rvanova, Daniil Vasilev, Akim Tsvigun, Sergey Petrakov, Rui Xing, Abdelrahman Sadallah, Kirill Grishchenkov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, and Artem Shelmanov. Benchmarking uncertainty quantification methods for large language models with LM-Polygraph. Transactions of the Association for Computational Linguistics, 13:220–248, 2025.
- [21] Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, and Sham M. Kakade. Maximum likelihood estimation for learning populations of parameters. In Proceedings of ICML, 2019.
- [22] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In Proceedings of ICLR, 2023.
- [23] Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Xiaoshuang Shi, Kaidi Xu, Hengtao Shen, and Xiaofeng Zhu. ConU: Conformal uncertainty in large language models with correctness coverage guarantees. In Findings of EMNLP, 2024.
- [24] Quan Xiao, Debarun Bhattacharjya, Balaji Ganesan, Radu Marinescu, Katya Mirylenka, Nhan H. Pham, Michael Glass, and Junkyu Lee. The consistency hypothesis in uncertainty quantification for large language models. In Proceedings of UAI, 2025.
- [25] Yangjing Zhang, Ying Cui, Bodhisattva Sen, and Kim-Chuan Toh. On efficient and scalable computation of the nonparametric maximum likelihood estimator in mixture models. Journal of Machine Learning Research, 25(8):1–46, 2024.