Under review
11 Pith papers cite this work.
abstract
We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the "lazy agent" problem, which arises due to partial observability. We address these problems by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions. We perform an experimental evaluation across a range of partially-observable multi-agent domains and show that learning such value-decompositions leads to superior results, in particular when combined with weight sharing, role information and information channels.
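A minimal sketch of the decomposition described above, assuming PyTorch and the additive form (the team value as a sum of agent-wise values computed from local observations). Because the decomposition is a sum, TD learning on the single joint reward trains every agent network end to end. Module and variable names are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    """Per-agent utility network: maps a local observation to Q-values."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )
    def forward(self, obs):
        return self.net(obs)

# Value decomposition: the team value is the sum of agent-wise values,
# so TD learning on the joint reward implicitly credits each agent and
# avoids one agent free-riding on the team signal ("lazy agent").
agents = [AgentQNet(obs_dim=16, n_actions=5) for _ in range(2)]

def q_tot(observations, actions):
    # observations: list of (batch, obs_dim); actions: list of (batch,) long
    per_agent = [net(o).gather(1, a.unsqueeze(1)).squeeze(1)
                 for net, o, a in zip(agents, observations, actions)]
    return torch.stack(per_agent, dim=0).sum(dim=0)  # (batch,)

# One TD step would regress q_tot(obs, act) toward r + gamma * max q_tot',
# using the single team reward r; no per-agent reward is ever needed.
```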
citing papers explorer
- Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation
  CPPO is an on-policy contrastive RL method that derives advantages from contrastive Q-values for PPO optimization, outperforming prior CRL baselines in 14/18 tasks and matching or exceeding reward-based PPO in 12/18 tasks. (sketch after the list)
- Adaptive TD-Lambda for Cooperative Multi-agent Reinforcement Learning
  ATD(λ) adapts TD(λ) in MARL via a density ratio estimator on past/current replay buffers to assign λ per state-action pair, yielding competitive or better results than fixed-λ QMIX and MAPPO on SMAC and GFootball. (sketch after the list)
- Randomness is sometimes necessary for coordination
  Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-shot transfer across team sizes. (sketch after the list)
- Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning
  A new controlled testbed and coordination diagnostics show that multi-agent RL methods achieving similar returns can differ substantially in redundant assignments, diversity, and efficiency. (sketch after the list)
- NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
  NonZero introduces an interaction score and a bandit-based proposal rule for local agent deviations in multi-agent MCTS, delivering a sublinear local-regret guarantee and improved sample efficiency on game benchmarks without full joint-action enumeration. (sketch after the list)
- Quantum Advantage in Multi Agent Reinforcement Learning
  Entangled QMARL agents approach the Tsirelson bound of 0.854 in CHSH while unentangled versions match classical baselines, and hybrid quantum-classical setups outperform both in CoopNav. (sketch after the list)
- SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning
  SACHI uses graph transformer convolutions on inter-agent coordination graphs to enrich partially observing agents with content-dependent teammate information, yielding statistically significant gains over baselines in five cooperative tasks. (sketch after the list)
- Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents
  Large-scale experiments on two million agents reveal that collective intelligence does not emerge from scale alone due to sparse and shallow interactions.
- Do LLM-derived graph priors improve multi-agent coordination?
  LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective. (sketch after the list)
- Reflective Context Learning: Studying the Optimization Primitives of Context Space
  Reflective Context Learning unifies context optimization for agents by recasting prior methods as instances of a shared learning problem and extending them with classical primitives such as batching, failure replay, and grouped rollouts, yielding improvements on AppWorld, BrowseComp+, and RewardBench. (sketch after the list)
- LLM-Enhanced Deep Reinforcement Learning for Task Offloading in Collaborative Edge Computing
  LeDRL integrates a lightweight LLM using structured prompts and a reflective evaluator with self-attention DRL to achieve over 17% higher task success rates and better convergence in edge computing task offloading.
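The one-line summaries above are terse, so the sketches below unpack several of them in code. Each is a hedged reading of the corresponding summary, with invented names and shapes, not the cited paper's implementation.

For CPPO: one way contrastive Q-values could supply PPO advantages is to treat a contrastive critic's per-action scores as Q estimates, subtract the policy-weighted average as a baseline, and feed the result to the usual clipped surrogate. The critic itself is assumed here, not specified by the summary.

```python
import torch

def contrastive_advantage(critic_scores, policy_probs, actions):
    """critic_scores: (batch, n_actions) contrastive Q estimates, e.g.
    logits of a classifier trained to tell real successors from negatives.
    policy_probs: (batch, n_actions). actions: (batch,) long."""
    # Baseline: expected score under the current policy (a value estimate).
    v = (policy_probs * critic_scores).sum(dim=1)
    q = critic_scores.gather(1, actions.unsqueeze(1)).squeeze(1)
    return q - v  # plug into the standard PPO clipped surrogate

def ppo_loss(new_logp, old_logp, adv, clip=0.2):
    ratio = (new_logp - old_logp).exp()
    return -torch.min(ratio * adv,
                      ratio.clamp(1 - clip, 1 + clip) * adv).mean()
```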
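For ATD(λ): the summary suggests a classifier-based density ratio between the current and past replay distributions, mapped to a per-sample λ. A standard trick is that a logistic classifier's odds estimate p_current/p_past; squashing its logit into [0,1] gives a λ per state-action pair. The feature dimension and the logit-to-λ mapping below are assumptions.

```python
import torch
import torch.nn as nn

# Assumed: 20-dim concatenated state-action features.
ratio_net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))

def train_ratio_step(opt, current_sa, past_sa):
    """Logistic regression: label 1 = drawn from the current buffer,
    0 = from the past buffer. At the optimum, the classifier's odds
    equal the density ratio p_current(s,a) / p_past(s,a)."""
    logits = ratio_net(torch.cat([current_sa, past_sa]))
    labels = torch.cat([torch.ones(len(current_sa), 1),
                        torch.zeros(len(past_sa), 1)])
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()

def lam(sa):
    # High ratio -> on-policy-like sample -> trust longer traces (λ -> 1);
    # low ratio -> stale sample -> fall back to bootstrapping (λ -> 0).
    return torch.sigmoid(ratio_net(sa)).squeeze(1)
```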
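For the randomness paper: one guess at what "structured per-agent randomness via ranked masking" means is that each agent draws a random key, the sort order of keys induces a rank, and ranks gate which options each agent may attend to, so otherwise-identical agents claim distinct targets. The resolution rule below is illustrative.

```python
import torch

def ranked_mask(n_agents, n_slots, scores, generator=None):
    """scores: (n_agents, n_slots), identical across symmetric agents.
    Random ranks break the tie: the agent with rank r is masked away
    from slots already claimed by ranks < r. Assumes n_agents <= n_slots."""
    keys = torch.rand(n_agents, generator=generator)
    ranks = keys.argsort().argsort()          # rank of each agent
    choice = torch.full((n_agents,), -1, dtype=torch.long)
    taken = torch.zeros(n_slots, dtype=torch.bool)
    for r in range(n_agents):                 # resolve in rank order
        i = int((ranks == r).nonzero())
        masked = scores[i].masked_fill(taken, float("-inf"))
        choice[i] = masked.argmax()
        taken[choice[i]] = True
    return choice  # distinct slots even under identical scores
```

A deterministic symmetric policy cannot produce this: identical agents with identical observations take identical actions, so they collide on the same slot forever.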
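For the evaluation paper: two of the named diagnostics are easy to make concrete. Assuming each episode yields an agent-to-target assignment, redundancy can count duplicated picks and diversity can be the normalized entropy of target usage; these definitions are illustrative, not the paper's.

```python
from collections import Counter
import math

def redundancy(assignment):
    """Fraction of agents whose target was already claimed by another."""
    counts = Counter(assignment)
    return sum(c - 1 for c in counts.values()) / len(assignment)

def diversity(assignment, n_targets):
    """Normalized entropy of target usage: 1 = uniform, 0 = all-on-one."""
    counts = Counter(assignment)
    probs = [c / len(assignment) for c in counts.values()]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(n_targets) if n_targets > 1 else 0.0

# Two teams can earn the same return while coordinating very differently:
print(redundancy([0, 0, 1, 2]), diversity([0, 0, 1, 2], 4))  # 0.25 0.75
```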
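For NonZero: the summary says single-agent local deviations are proposed by a bandit rule instead of enumerating the joint action space. A UCB-style sketch over per-agent deviation arms, with the "interaction score" left as a pluggable statistic, since the summary does not define it.

```python
import math

class DeviationBandit:
    """One arm per (agent, action) local deviation from the current joint
    action: n_agents * n_actions arms instead of n_actions ** n_agents
    joint actions. UCB1 picks which deviation to expand next."""
    def __init__(self, n_agents, n_actions):
        self.arms = [(i, a) for i in range(n_agents) for a in range(n_actions)]
        self.n = {arm: 0 for arm in self.arms}
        self.mean = {arm: 0.0 for arm in self.arms}
        self.t = 0

    def propose(self, interaction_score):
        self.t += 1
        def ucb(arm):
            if self.n[arm] == 0:
                return float("inf")        # try every deviation once
            bonus = math.sqrt(2 * math.log(self.t) / self.n[arm])
            return self.mean[arm] + interaction_score(arm) + bonus
        return max(self.arms, key=ucb)

    def update(self, arm, reward):
        self.n[arm] += 1
        self.mean[arm] += (reward - self.mean[arm]) / self.n[arm]
```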
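For the quantum entry: the 0.854 figure is the known quantum optimum for the CHSH game, cos²(π/8) ≈ 0.8536, versus 0.75 for any classical strategy. The check below reproduces it from the standard optimal measurement angles; it verifies the number, not the paper's training setup.

```python
import numpy as np

# Optimal CHSH angles (radians) for a maximally entangled pair:
# Alice measures at 0 or pi/4; Bob at pi/8 or -pi/8.
alice = [0.0, np.pi / 4]
bob = [np.pi / 8, -np.pi / 8]

def win_prob():
    total = 0.0
    for x in (0, 1):
        for y in (0, 1):
            # For the maximally entangled state, P(a == b) = cos^2(angle gap).
            p_same = np.cos(alice[x] - bob[y]) ** 2
            # Win condition: a XOR b == x AND y, i.e. agree unless x = y = 1.
            total += p_same if x * y == 0 else 1 - p_same
    return total / 4  # inputs drawn uniformly

print(win_prob())  # ~0.8536, the Tsirelson bound for the CHSH game
```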
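For SACHI: "graph transformer convolutions on inter-agent coordination graphs" most plausibly means attention-weighted message passing restricted to linked agents, so the message content depends on what each teammate currently observes. A generic single-head layer of that shape, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CoordAttentionLayer(nn.Module):
    """One attention head over a coordination graph: each agent attends
    only to teammates it is linked to in adj."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (n_agents, dim) embeddings of local observations
        # adj: (n_agents, n_agents) 0/1 graph, self-loops included so
        # every row has at least one unmasked entry
        scores = self.q(h) @ self.k(h).T / h.shape[1] ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        return torch.softmax(scores, dim=-1) @ self.v(h)
```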
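For the LLM-priors paper: the natural pipeline is to ask a language model which agent pairs should coordinate, parse the answer into an adjacency matrix, and use it as the graph in a layer like the one above. The prompt, the JSON reply format, and the ask_llm callable are all hypothetical.

```python
import json
import numpy as np

def graph_prior_from_llm(task_description, n_agents, ask_llm):
    """ask_llm: any callable that sends a prompt to a language model and
    returns its text reply (per the paper, even a 1.5B model can work)."""
    prompt = (
        f"Task: {task_description}\n"
        f"There are {n_agents} agents. Reply with a JSON list of "
        f"[i, j] pairs of agents that should coordinate directly."
    )
    pairs = json.loads(ask_llm(prompt))
    adj = np.eye(n_agents, dtype=np.int64)    # keep self-loops
    for i, j in pairs:
        adj[i, j] = adj[j, i] = 1             # undirected prior
    return adj
```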
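For Reflective Context Learning: of the named primitives, failure replay is the most self-contained. A sketch of one optimization step that batches rollouts, replays stored failures, and rewrites the context from the mistakes; every function signature here is an assumed shape, not the paper's API.

```python
def context_step(context, tasks, run_agent, reflect, failure_buffer, k=4):
    """One context-optimization step: grouped rollouts + failure replay.
    run_agent(context, task) -> (success: bool, trace: str)
    reflect(context, traces) -> revised context string."""
    traces, failures = [], []
    for task in tasks:                  # batching: a group of rollouts
        ok, trace = run_agent(context, task)
        traces.append(trace)
        if not ok:
            failures.append(trace)
    failure_buffer.extend(failures)
    # Failure replay: revise the context against old *and* new mistakes,
    # so fixes for past failures are not forgotten on the next rewrite.
    replay = failure_buffer[-k:]
    return reflect(context, traces + replay)
```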