NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

2026 · cs.LG · arXiv 2605.00751

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over a low-dimensional nonlinear representation using an interaction-guided proposal rule, instead of directly exploring the full joint-action space. Our exploration uses an interaction score: single-agent deviations are ranked by predicted gain, while two-agent deviations are scored by a mixed-difference measure that reveals coordination benefits even when no single agent can improve alone. We formalize candidate proposal as a bandit problem over local deviations and derive a proposal rule, NonZero, with a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating the joint-action space. Empirically, NonZero improves sample efficiency and final performance on MatGame, SMAC, and SMACv2 relative to strong model-based and model-free baselines under matched search budgets.
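The interaction score described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so this assumes the standard second-order mixed difference, Q(a_i', a_j') - Q(a_i', a_j) - Q(a_i, a_j') + Q(a_i, a_j), as the two-agent score and a plain value difference as the single-agent gain. The names `toy_q`, `single_agent_gains`, and `mixed_difference` are hypothetical, and the payoff table stands in for whatever learned value estimate the method actually queries.

```python
# Toy two-agent coordination game (an assumption for illustration only):
# both agents must switch together to reach the better payoff.
PAYOFF = [[1.0, 0.0],
          [0.0, 2.0]]


def toy_q(joint):
    """Hypothetical value estimate for a joint action (a_0, a_1)."""
    return PAYOFF[joint[0]][joint[1]]


def swap(joint, agent, action):
    """Return a copy of `joint` with one agent's action replaced."""
    return joint[:agent] + (action,) + joint[agent + 1:]


def single_agent_gains(q, joint, n_actions):
    """Predicted gain of every single-agent deviation from `joint`."""
    base = q(joint)
    gains = {}
    for agent in range(len(joint)):
        for action in range(n_actions):
            if action != joint[agent]:
                gains[(agent, action)] = q(swap(joint, agent, action)) - base
    return gains


def mixed_difference(q, joint, i, a_i, j, a_j):
    """Second-order mixed difference for agents i and j deviating jointly.

    Assumed form: Q(both) - Q(only i) - Q(only j) + Q(neither).
    """
    only_i = swap(joint, i, a_i)
    only_j = swap(joint, j, a_j)
    both = swap(only_i, j, a_j)
    return q(both) - q(only_i) - q(only_j) + q(joint)


start = (0, 0)
gains = single_agent_gains(toy_q, start, n_actions=2)
pair_score = mixed_difference(toy_q, start, 0, 1, 1, 1)

print(gains)       # every unilateral deviation lowers the value
print(pair_score)  # the pairwise score is positive
```

In this toy game no single agent can improve alone from `(0, 0)`, yet the mixed difference for the pair is positive, which is the kind of coordination benefit the abstract says the two-agent score is meant to reveal. Only the scored deviations are evaluated, so the full joint-action space is never enumerated.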

representative citing papers

Matrix-Space Reinforcement Learning for Reusing Local Transition Geometry

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

MSRL represents trajectory segments as PSD matrices to prove additive composition properties and bootstrap value functions for better transfer, reaching 0.73 AUC versus 0.57-0.65 baselines.

Metric-Gradient Projection for Stable Multi-Agent Policy Learning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

HPML projects multi-agent update fields onto the closest metric-gradient potential flow via Hodge decomposition, yielding Lyapunov potentials and equilibrium-gap bounds.

citing papers explorer


Matrix-Space Reinforcement Learning for Reusing Local Transition Geometry cs.LG · 2026-05-14 · unverdicted · none · ref 16 · internal anchor
Metric-Gradient Projection for Stable Multi-Agent Policy Learning cs.LG · 2026-05-12 · unverdicted · none · ref 18 · internal anchor
