Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning

Baoxiang Wang; Binbin Chen; Dan Qiao; Fengyu Cai; Fuxin Jiang; Hongyuan Zha; Jianlong Chen; Tieying Zhang; Wenhao Li; Zuzhi Chen

arxiv: 2603.01221 · v2 · pith:HSU5WAS3new · submitted 2026-03-01 · 💻 cs.MA

keywords: debate, uncertainty, aleatoric, cost, epistemic, agent, gain, multi-agent

Multi-Agent Debate (MAD) has shown promise in improving reasoning and reducing hallucinations, yet it remains unclear how information exchange shapes individual reasoning behavior. Empirically, MAD exhibits paradoxical phenomena, including rising accuracy with increasing token entropy and marked differences between homogeneous and heterogeneous agent combinations. In this paper, we introduce a Bayesian uncertainty analysis framework for MAD, which decomposes answer-level predictive uncertainty into epistemic uncertainty and aleatoric uncertainty, corresponding to the potential gain and cost of debate. Across multiple agent configurations, we find that effective debate depends on achieving high epistemic gain under controlled aleatoric cost. Building on this insight, we design an uncertainty-guided multi-agent reinforcement learning algorithm that encourages lower aleatoric cost and more effective epistemic information utilization. Experiments show that our approach simultaneously enhances each agent's accuracy and promotes a more productive debate process, providing an operational Bayesian perspective for understanding and improving MAD.
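
The abstract does not state the decomposition formula, so the following is only a minimal sketch of one common Bayesian choice, assumed here for illustration: total predictive entropy over answers splits into the expected entropy of individual answer distributions (aleatoric) plus the remaining mutual-information term (epistemic). The function names and the idea of representing posterior samples as a list of answer distributions are hypothetical and not taken from the paper.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def decompose_uncertainty(member_dists):
    """Split answer-level predictive uncertainty into two parts.

    member_dists: list of answer distributions, one per hypothetical
    posterior sample (for example, one per sampled agent or context).
    This input format is an assumption, not the paper's definition.

    Returns (total, aleatoric, epistemic) where
      total     = entropy of the averaged distribution,
      aleatoric = average entropy of the individual distributions,
      epistemic = total - aleatoric (a mutual-information term).
    """
    n = len(member_dists)
    k = len(member_dists[0])
    mean = [sum(d[i] for d in member_dists) / n for i in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(d) for d in member_dists) / n
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


# Members that are each certain but disagree: uncertainty is all epistemic.
disagree = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])

# Members that agree on a coin flip: uncertainty is all aleatoric.
noisy = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
```

Under this assumed reading, confident disagreement between members is uncertainty that exchanging information could in principle resolve (the potential gain), while shared noise within each member's answers is uncertainty that exchange alone cannot remove (the cost).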

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Multiagent Protocols with Aggregated Confidence Signals
cs.AI 2026-06 unverdicted novelty 6.0

Introduces protocols to aggregate transformed confidence signals from multiagent debates via soft voting or Bayesian fusion, yielding higher AUARC than single agents or standard baselines while keeping F1 stable acros...