pith.

arxiv: 2603.01221 · v2 · pith:HSU5WAS3 · submitted 2026-03-01 · 💻 cs.MA

Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning

classification 💻 cs.MA
keywords debate, uncertainty, aleatoric, cost, epistemic, agent, gain, multi-agent

Multi-Agent Debate (MAD) has shown promise in improving reasoning and reducing hallucinations, yet it remains unclear how information exchange shapes individual reasoning behavior. Empirically, MAD exhibits paradoxical phenomena, including rising accuracy with increasing token entropy and marked differences between homogeneous and heterogeneous agent combinations. In this paper, we introduce a Bayesian uncertainty analysis framework for MAD, which decomposes answer-level predictive uncertainty into epistemic uncertainty and aleatoric uncertainty, corresponding to the potential gain and cost of debate. Across multiple agent configurations, we find that effective debate depends on achieving high epistemic gain under controlled aleatoric cost. Building on this insight, we design an uncertainty-guided multi-agent reinforcement learning algorithm that encourages lower aleatoric cost and more effective epistemic information utilization. Experiments show that our approach simultaneously enhances each agent's accuracy and promotes a more productive debate process, providing an operational Bayesian perspective for understanding and improving MAD.
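The abstract does not give the paper's estimator. A standard Bayesian way to write this kind of split, assumed here and not taken from the paper, is the entropy decomposition H[E_θ p(y|x,θ)] = E_θ H[p(y|x,θ)] + I(y; θ). The left side is the total predictive entropy. The first term on the right is the aleatoric part. The second term, the mutual information, is the epistemic part. One plausible reading of the abstract: the epistemic term is large when agents confidently disagree, so debate has information to exchange (potential gain). The aleatoric term is large when each agent's own answer distribution is diffuse (cost). The minimal Python sketch below follows that reading. It treats each agent's answer-level distribution as one posterior sample. The function names `entropy` and `decompose_uncertainty` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def decompose_uncertainty(dists: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Split total predictive entropy into (total, aleatoric, epistemic).

    Assumption: `dists` holds one answer distribution per posterior sample
    (here, hypothetically, one per agent), all over the same candidate answers.
    """
    n = len(dists)
    k = len(dists[0])
    if any(len(d) != k for d in dists):
        raise ValueError("all distributions must cover the same answers")
    # Bayesian model average: mean answer distribution across samples.
    mean = [sum(d[j] for d in dists) / n for j in range(k)]
    total = entropy(mean)                           # H[E_theta p(y | x, theta)]
    aleatoric = sum(entropy(d) for d in dists) / n  # E_theta H[p(y | x, theta)]
    epistemic = total - aleatoric                   # mutual information I(y; theta)
    return total, aleatoric, epistemic
```

Two confident agents that disagree ([[1, 0], [0, 1]]) yield total = epistemic = ln 2 and aleatoric = 0. Two agents that are each split 50/50 yield total = aleatoric = ln 2 and epistemic = 0. By the concavity of entropy, the epistemic term is never negative.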

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Multiagent Protocols with Aggregated Confidence Signals

    cs.AI · 2026-06 · unverdicted · novelty 6.0

    Introduces protocols to aggregate transformed confidence signals from multiagent debates via soft voting or Bayesian fusion, yielding higher AUARC than single agents or standard baselines while keeping F1 stable acros...