Value-Decomposition Networks For Cooperative Multi-Agent Learning

Audrunas Gruslys; Guy Lever; Joel Z. Leibo; Karl Tuyls; Marc Lanctot; Max Jaderberg; Nicolas Sonnerat; Peter Sunehag; Thore Graepel; Vinicius Zambaldi

arxiv: 1706.05296 · v1 · pith:ZI2VCQZ3new · submitted 2017-06-16 · 💻 cs.AI

Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel

classification 💻 cs.AI

keywords learning, multi-agent, problem, value, combined, cooperative, information, problems

We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the "lazy agent" problem, which arises due to partial observability. We address these problems by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions. We perform an experimental evaluation across a range of partially-observable multi-agent domains and show that learning such value-decompositions leads to superior results, in particular when combined with weight sharing, role information and information channels.
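
A minimal sketch of what decomposing a team value into agent-wise values can look like, assuming the simplest additive form, in which the team value is the sum of per-agent values. The abstract does not spell out the form of the decomposition, so the additive sum, the function names `joint_q` and `greedy_joint_action`, and the numeric values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def joint_q(per_agent_q, joint_action):
    """Team value as the sum of each agent's value for its own action.

    Assumption: additive decomposition, Q_team = sum_i Q_i(a_i).
    """
    return float(sum(q[a] for q, a in zip(per_agent_q, joint_action)))

def greedy_joint_action(per_agent_q):
    """Each agent picks the action that maximises its own Q_i.

    Under the additive assumption, these independent choices also
    maximise the summed team value.
    """
    return tuple(int(np.argmax(q)) for q in per_agent_q)

# Hypothetical per-agent action values: two agents, three actions each.
q1 = np.array([0.1, 0.7, 0.2])
q2 = np.array([0.5, 0.3, 0.9])

best = greedy_joint_action([q1, q2])
team_value = joint_q([q1, q2], best)
```

With this form, each agent can act greedily on its own values at execution time, so no search over the combined action space is needed.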

Forward citations

Cited by 26 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
cs.AI 2026-05 unverdicted novelty 7.0

A survey that unifies prior work on multi-agent LLM systems via the LIFE framework, mapping dependencies across collaboration, failure attribution, and autonomous self-evolution while identifying cross-stage challenges.
Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation
cs.LG 2026-05 unverdicted novelty 7.0

CPPO is an on-policy contrastive RL method that derives advantages from contrastive Q-values for PPO optimization, outperforming prior CRL baselines in 14/18 tasks and matching or exceeding reward-based PPO in 12/18 tasks.
Adaptive TD-Lambda for Cooperative Multi-agent Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 7.0

ATD(λ) adapts TD(λ) in MARL via a density ratio estimator on past/current replay buffers to assign λ per state-action pair, yielding competitive or better results than fixed-λ QMIX and MAPPO on SMAC and Gfootball.
Metric-Gradient Projection for Stable Multi-Agent Policy Learning
cs.LG 2026-05 unverdicted novelty 7.0

HPML projects multi-agent update fields onto the closest metric-gradient potential flow via Hodge decomposition, yielding Lyapunov potentials and equilibrium-gap bounds.
Randomness is sometimes necessary for coordination
cs.AI 2026-05 conditional novelty 7.0

Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-sh...
Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning
cs.MA 2026-05 unverdicted novelty 7.0

A new controlled testbed and coordination diagnostics show that multi-agent RL methods achieving similar returns can differ substantially in redundant assignments, diversity, and efficiency.
NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
cs.LG 2026-05 unverdicted novelty 7.0

NonZero introduces an interaction score and bandit-formalized proposal rule for local agent deviations in multi-agent MCTS, delivering a sublinear local-regret guarantee and improved sample efficiency on game benchmar...
Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning
cs.MA 2026-02 unverdicted novelty 7.0

DG-PG augments policy gradients with descent signals from analytical models to reduce estimator variance from O(N) to O(1), preserve game equilibria, and achieve agent-independent sample complexity while converging on...
Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 6.0

The IBAL framework builds information-theoretic attacks that break agent interactions in MARL and trains policies to stay robust under observation and action perturbations.
Quantum Advantage in Multi Agent Reinforcement Learning
cs.LG 2026-05 conditional novelty 6.0

Entangled QMARL agents approach the Tsirelson bound of 0.854 in CHSH while unentangled versions match classical baselines, and hybrid quantum-classical setups outperform both in CoopNav.
SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 6.0

SACHI uses graph transformer convolutions on inter-agent coordination graphs to enrich partial-observation agents with content-dependent teammate information, yielding statistically significant gains over baselines in...
SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning
cs.LG 2026-05 conditional novelty 6.0

SACHI enriches agent representations via graph transformer convolutions over inter-agent graphs to enable holistic information integration, outperforming baselines across five cooperative tasks with statistical significance.
Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents
cs.AI 2026-04 unverdicted novelty 6.0

Large-scale experiments on two million agents reveal that collective intelligence does not emerge from scale alone due to sparse and shallow interactions.
Do LLM-derived graph priors improve multi-agent coordination?
cs.LG 2026-04 unverdicted novelty 6.0

LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective.
Reflective Context Learning: Studying the Optimization Primitives of Context Space
cs.LG 2026-04 unverdicted novelty 6.0

Reflective Context Learning unifies context optimization for agents by recasting prior methods as instances of a shared learning problem and extending them with classical primitives such as batching, failure replay, a...
Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic
cs.AI 2026-01 unverdicted novelty 6.0

Multi-agent actor-critic methods with a centralized critic improve decentralized LLM collaboration over Monte Carlo baselines in long-horizon and sparse-reward settings.
Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning
cs.MA 2025-02 unverdicted novelty 6.0

Optimistic ε-Greedy Exploration adds decoupled optimistic networks that converge in probability to maximum returns and samples from them with probability ε to increase optimal joint-action frequency in CTDE MARL.
Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning
cs.LG 2025-02 unverdicted novelty 6.0

Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.
Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
cs.AI 2026-05 conditional novelty 5.0

The survey proposes the LIFE framework to unify fragmented research on collaboration, failure attribution, and self-evolution in LLM multi-agent systems into a progression toward self-organizing intelligence.
LLM-Enhanced Deep Reinforcement Learning for Task Offloading in Collaborative Edge Computing
cs.DC 2026-05 unverdicted novelty 5.0

LeDRL integrates a lightweight LLM using structured prompts and a reflective evaluator with self-attention DRL to achieve over 17% higher task success rates and better convergence in edge computing task offloading.
Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework
cs.CR 2026-01 unverdicted novelty 5.0

CyberOps-Bots is a hierarchical LLM-empowered multi-agent RL framework that reports 68.5% higher network availability and 34.7% better jumpstart performance in new scenarios without retraining on real cloud datasets.
Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem
cs.LG 2025-09 unverdicted novelty 5.0

DAC models fully decentralized cooperative MARL as a context modeling problem, using latent variables for joint policies to fix non-stationarity in value updates and relative overgeneralization in value estimation.
Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies
cs.LG 2025-08 unverdicted novelty 5.0

CoSER adaptively samples joint actions in CTDE MARL to reduce sampling error relative to the joint on-policy distribution, empirically improving reliability of independent policy gradient convergence.
Reflection of Episodes: Learning to Play Game from Expert and Self Experiences
cs.AI 2025-02 unverdicted novelty 5.0

The ROE framework lets an LLM defeat the Very Hard bot in TextStarCraft II through keyframe selection, decisions informed by expert and self experience, and post-game reflection that generates new self-experience.
Growing Action Spaces
cs.LG 2019-06 unverdicted novelty 5.0

A curriculum of growing action spaces combined with simultaneous off-policy value estimation accelerates learning in large multi-agent action spaces.
GLo-MAPPO: Multi-Agent Deep Reinforcement Learning for Energy-Efficient UAV-Assisted LoRa Networks
cs.NI 2025-09 unverdicted novelty 4.0

GLo-MAPPO applies centralized-training decentralized-execution MAPPO with a gain-based association scheme to jointly optimize LoRa parameters and UAV paths, yielding higher weighted energy efficiency than prior MARL b...