From Debate to Decision: Conformal Social Choice for Safe Multi-Agent Deliberation
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-10 18:32 UTC · model grok-4.3
The pith
Applying split conformal prediction to aggregated probabilities from multiple LLM agents in debate creates guaranteed-coverage sets that allow safe autonomous action only on singletons.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Verbalized probability distributions from heterogeneous agents are aggregated via a linear opinion pool and calibrated with split conformal prediction, yielding prediction sets with a marginal coverage guarantee: the correct answer is included with probability at least 1 − α. A hierarchical action policy then maps singleton sets to autonomous action and larger sets to human escalation. On eight MMLU-Pro domains with three agents, coverage stays close to target, 81.9 percent of wrong-consensus cases are intercepted at α = 0.05, and the remaining singleton cases reach 90.0 to 96.8 percent accuracy.
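As a minimal sketch of the pipeline described above (function names, uniform pooling weights, and the 1 − p nonconformity score are assumptions for illustration, not the paper's exact implementation):

```python
import numpy as np

def pool(agent_probs, weights=None):
    """Linear opinion pool over K agents' probability vectors (K x C array)."""
    agent_probs = np.asarray(agent_probs, dtype=float)
    k = agent_probs.shape[0]
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    return w @ agent_probs  # length-C pooled distribution

def calibrate(cal_probs, cal_labels, alpha):
    """Split conformal threshold from nonconformity s = 1 - pooled prob of true label."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def predict_set(pooled, q):
    """All answers whose nonconformity falls at or below the calibrated threshold."""
    return np.flatnonzero(1.0 - pooled <= q)

def decide(pred_set):
    """Hierarchical policy: act autonomously on singletons, escalate otherwise."""
    return ("act", int(pred_set[0])) if len(pred_set) == 1 else ("escalate", None)
```

On a singleton set the system commits to the one surviving answer; any larger set is handed to a human, which is where the wrong-consensus interception comes from.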
What carries the argument
The conformal prediction sets obtained after linear pooling of agent probabilities, which enforce marginal coverage and drive the singleton-versus-larger-set decision rule.
If this is right
- 81.9 percent of wrong-consensus cases are escalated at α = 0.05 rather than committed to erroneous autonomous action.
- The accuracy of cases where the system acts autonomously reaches 90.0 to 96.8 percent, up to 22.1 percentage points above simple consensus-based stopping.
- Empirical coverage of the correct answer stays within 1 to 2 points of the target level across the tested domains.
- The fraction of cases handled autonomously versus escalated can be adjusted directly by changing the alpha parameter.
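The last point, the α-controlled operating point, can be illustrated on synthetic pooled distributions (the Dirichlet data and fixed correct label are assumptions; only the monotone effect of α matters):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic pooled distributions over 5 answers: one usually-dominant label plus noise.
probs = rng.dirichlet([8.0, 1.0, 1.0, 1.0, 1.0], size=2000)
labels = np.zeros(2000, dtype=int)  # treat label 0 as the correct answer throughout
cal_p, test_p = probs[:1000], probs[1000:]
cal_s = 1.0 - cal_p[np.arange(1000), labels[:1000]]  # nonconformity on calibration split

def avg_set_size(alpha):
    """Mean conformal set size on held-out cases at miscoverage level alpha."""
    level = min(1.0, np.ceil(1001 * (1 - alpha)) / 1000)
    q = np.quantile(cal_s, level, method="higher")
    return float((1.0 - test_p <= q).sum(axis=1).mean())

# Raising alpha lowers the calibrated threshold, so sets can only shrink on average,
# shifting cases from escalation toward autonomous action at the cost of coverage.
assert avg_set_size(0.10) <= avg_set_size(0.05)
```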
Where Pith is reading between the lines
- The same post-hoc calibration approach could be tested on multi-agent systems that use different interaction protocols beyond debate.
- High-stakes applications might accept a lower automation rate in exchange for the coverage guarantee.
- Future work could explore whether non-linear pooling methods preserve the conformal guarantees as well as linear ones do.
- Deployment would require maintaining a calibration set of past debates that remains similar to new inputs.
Load-bearing premise
The past calibration examples and the new debate cases must be similar enough in distribution that the coverage guarantee still holds for the combined probabilities.
What would settle it
Collecting a new set of debate instances, computing the conformal sets, and checking whether the fraction of cases where the correct answer is missing from the set exceeds the chosen alpha.
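The audit described above is a few lines once pooled probabilities and a calibrated threshold q are in hand (variable names assumed; the three-row arrays below are toy data):

```python
import numpy as np

def empirical_miscoverage(test_probs, test_labels, q):
    """Fraction of fresh debate cases whose correct answer falls outside the conformal set."""
    in_set = 1.0 - test_probs[np.arange(len(test_labels)), test_labels] <= q
    return 1.0 - in_set.mean()

# The guarantee is in question if this rate persistently exceeds the chosen alpha.
probs = np.array([[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2]])
labels = np.array([0, 1, 2])
rate = empirical_miscoverage(probs, labels, q=0.6)  # third case falls outside the set
```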
Original abstract
Multi-agent debate improves LLM reasoning, yet agreement among agents is not evidence of correctness. When agents converge on a wrong answer through social reinforcement, consensus-based stopping commits that error to an automated action with no recourse. We introduce Conformal Social Choice, a post-hoc decision layer that converts debate outputs into calibrated act-versus-escalate decisions. Verbalized probability distributions from heterogeneous agents are aggregated via a linear opinion pool and calibrated with split conformal prediction, yielding prediction sets with a marginal coverage guarantee: the correct answer is included with probability ${\geq}\,1{-}\alpha$, without assumptions on individual model calibration. A hierarchical action policy maps singleton sets to autonomous action and larger sets to human escalation. On eight MMLU-Pro domains with three agents (Claude Haiku, DeepSeek-R1, Qwen-3 32B), coverage stays within 1--2 points of the target. The key finding is not that debate becomes more accurate, but that the conformal layer makes its failures actionable: 81.9% of wrong-consensus cases are intercepted at $\alpha{=}0.05$. Because the layer refuses to act on cases where debate is confidently wrong, the remaining conformal singletons reach 90.0--96.8% accuracy (up to 22.1pp above consensus stopping) -- a selection effect, not a reasoning improvement. This safety comes at the cost of automation, but the operating point is user-adjustable via $\alpha$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Conformal Social Choice, a post-hoc decision layer for multi-agent LLM debate. Verbalized probability distributions from heterogeneous agents are aggregated via linear opinion pool and calibrated using split conformal prediction to produce prediction sets with a marginal coverage guarantee (correct answer included with probability ≥1-α, without requiring individual model calibration). A hierarchical policy maps singleton sets to autonomous action and larger sets to human escalation. On eight MMLU-Pro domains with three fixed LLMs, empirical coverage stays within 1-2 points of target; at α=0.05, 81.9% of wrong-consensus cases are intercepted, and remaining singletons achieve 90.0-96.8% accuracy (up to 22.1pp gain over consensus stopping) via selection effects rather than improved reasoning.
Significance. If the coverage guarantee is valid under the experimental conditions, the work offers a practical, adjustable mechanism to mitigate risks of erroneous consensus in multi-agent deliberation by making failures actionable through calibrated escalation. It correctly applies established split conformal prediction theory to aggregated scores and demonstrates that safety gains arise from selective abstention. This could be relevant for high-stakes LLM agent deployment, with the α parameter providing a tunable automation-safety tradeoff.
major comments (1)
- [Conformal calibration procedure and experimental setup] The marginal coverage guarantee and the 81.9% wrong-consensus interception rate at α=0.05 rest on exchangeability of calibration and test instances for the linearly pooled scores. The setup uses three heterogeneous LLMs across eight MMLU-Pro domains with debate dynamics; the manuscript does not address whether domain shifts, agent-specific behaviors, or non-random splits violate exchangeability (see abstract claim of guarantee 'without assumptions on individual model calibration' and the conformal calibration description). This is load-bearing for the central safety claims: a violation would leave the reported coverage and interception rates without theoretical backing.
minor comments (2)
- [Methods] Clarify the exact form of the linear opinion pool aggregation with an explicit equation or pseudocode, including how verbalized probabilities are normalized across agents.
- [Experiments] In results tables or figures showing coverage and accuracy, include the target 1-α line and report the number of calibration/test instances per domain to allow assessment of statistical reliability.
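The pooling requested in the first minor comment has a standard form (uniform weights are one common choice; the paper's exact weights are not given here):

```latex
p_{\mathrm{pool}}(y \mid x) \;=\; \sum_{k=1}^{K} w_k \, p_k(y \mid x),
\qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1,
```

where each $p_k(\cdot \mid x)$ is the $k$-th agent's verbalized distribution, renormalized to sum to one before pooling.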
Simulated Author's Rebuttal
We thank the referee for the positive evaluation of the work's significance and for identifying the need to explicitly address the exchangeability assumption underlying the conformal coverage guarantee. We respond to the major comment below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: [Conformal calibration procedure and experimental setup] The marginal coverage guarantee and the 81.9% wrong-consensus interception rate at α=0.05 rest on exchangeability of calibration and test instances for the linearly pooled scores. The setup uses three heterogeneous LLMs across eight MMLU-Pro domains with debate dynamics; the manuscript does not address whether domain shifts, agent-specific behaviors, or non-random splits violate exchangeability (see abstract claim of guarantee 'without assumptions on individual model calibration' and the conformal calibration description). This is load-bearing for the central safety claims: a violation would leave the reported coverage and interception rates without theoretical backing.
Authors: We agree that the marginal coverage guarantee of split conformal prediction requires exchangeability between calibration and test instances with respect to the linearly pooled scores. The abstract phrase 'without assumptions on individual model calibration' specifically means that the base LLMs need not output calibrated probabilities; the linear opinion pool produces an aggregated score that is calibrated post-hoc by the conformal procedure. However, the manuscript does not explicitly discuss the exchangeability assumption for these aggregated scores or potential violations arising from domain shifts across MMLU-Pro domains, heterogeneous agent behaviors, or debate dynamics. In the revised version we will add a dedicated paragraph in the Methods section clarifying that (i) the guarantee is marginal and holds under exchangeability of the pooled scores, (ii) calibration and test splits are formed by random partitioning of instances (we will state this explicitly), and (iii) while domain heterogeneity could in principle affect exchangeability, the observed coverage remains within 1–2 points of the nominal level across all eight domains, providing empirical support for practical validity. We will also report per-domain coverage statistics to allow readers to evaluate robustness. These clarifications address the theoretical foundation without changing the empirical results or core claims.
revision: yes
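The random partitioning the authors commit to in point (ii) is the standard way to preserve exchangeability between the two splits; a minimal sketch (function and index names assumed):

```python
import numpy as np

def random_split(n, n_cal, seed=0):
    """Random partition of n debate instances into calibration and test index sets."""
    idx = np.random.default_rng(seed).permutation(n)
    return idx[:n_cal], idx[n_cal:]

# Every instance lands in exactly one split, chosen uniformly at random.
cal_idx, test_idx = random_split(10, 6)
```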
Circularity Check
No circularity: coverage guarantee from established split conformal prediction theory
full rationale
The paper aggregates verbalized probabilities from heterogeneous agents via a linear opinion pool and applies split conformal prediction to produce prediction sets with a marginal coverage guarantee. This guarantee follows directly from the standard theorem for split conformal prediction (requiring only exchangeability of calibration and test instances, which the paper states holds), without any reduction to quantities defined, fitted, or renamed within the paper's own equations. The 81.9% interception rate and accuracy improvements on conformal singletons are empirical observations on the MMLU-Pro evaluation, not forced by construction. No self-citation is load-bearing for the core claim, no uniqueness theorem is imported from the authors' prior work, and no ansatz is smuggled via citation. The derivation chain is self-contained against external benchmarks of conformal prediction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The calibration and test instances (debate examples) are exchangeable, as required for split conformal prediction to deliver marginal coverage guarantees.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (J uniqueness) · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
Verbalized probability distributions from heterogeneous agents are aggregated via a linear opinion pool and calibrated with split conformal prediction, yielding prediction sets with a marginal coverage guarantee: Pr[y∈C(x)]≥1−α
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
A hierarchical action policy maps singleton sets to autonomous action and larger sets to human escalation
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.