Counterfactual multi-agent policy gradients

Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson · 2017 · arXiv 1705.08926

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background (1)

citation-polarity summary: background (1)

representative citing papers

DecompRL: Solving Harder Problems by Learning Modular Code Generation

cs.LG · 2026-07-02 · unverdicted · novelty 7.0

DecompRL is an RL method that learns modular code decomposition for LLMs, enabling exponential candidate generation via recombination to solve harder coding problems with lower GPU cost.

Plasticity-Enhanced Multi-Agent Mixture of Experts for Dynamic Objective Adaptation in UAVs-Assisted Emergency Communication Networks

cs.MA · 2026-04-10 · unverdicted · novelty 7.0

PE-MAMoE combines sparsely gated mixture-of-experts actors with a non-parametric phase controller in MAPPO to maintain plasticity under dynamic user mobility and traffic, yielding 26.3% higher normalized IQM return in simulations.

Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.

Scalable Neighborhood-Based Multi-Agent Actor-Critic

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

MADDPG-K scales centralized critics in multi-agent RL by limiting each critic to k-nearest neighbors under Euclidean distance, yielding constant input size and competitive performance.

Learning Safe Unlabeled Multi-Robot Planning with Motion Constraints

cs.RO · 2019-07-11 · unverdicted · novelty 5.0

A multi-agent RL framework for unlabeled multi-robot planning that uses velocity obstacle projections to guarantee collision-free trajectories applicable to arbitrary robot models.

citing papers explorer

Showing 5 of 5 citing papers.

DecompRL: Solving Harder Problems by Learning Modular Code Generation cs.LG · 2026-07-02 · unverdicted · none · ref 18
Plasticity-Enhanced Multi-Agent Mixture of Experts for Dynamic Objective Adaptation in UAVs-Assisted Emergency Communication Networks cs.MA · 2026-04-10 · unverdicted · none · ref 4
Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning cs.AI · 2026-05-12 · unverdicted · none · ref 133
Scalable Neighborhood-Based Multi-Agent Actor-Critic cs.LG · 2026-04-20 · unverdicted · none · ref 2
Learning Safe Unlabeled Multi-Robot Planning with Motion Constraints cs.RO · 2019-07-11 · unverdicted · none · ref 11
