CoRR abs/1902.04043 (2019)

Samvelyan, M · 1902 · arXiv 1902.04043

29 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · method 1 · other 1

citation-polarity summary

background 2 · unclear 1 · use method 1

representative citing papers

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning

cs.LG · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

IBAL framework constructs information-theoretic adversarial attacks on agent observations and actions to train MARL agents that remain robust to interaction disruptions and agent-missing scenarios.

Beyond the All-in-One Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows

cs.MA · 2026-05-09 · unverdicted · novelty 7.0

EntCollabBench shows that today's LLM agents still struggle with delegation, context transfer, parameter grounding, workflow closure, and decision commitment when tested in a simulated enterprise with 11 role-specialized agents.

CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making

cs.AI · 2026-05-02 · unverdicted · novelty 7.0 · 3 refs

CoFlow achieves state-of-the-art coordination in offline MARL using single-pass joint velocity fields with Coordinated Velocity Attention and Adaptive Coordination Gating.

Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning

cs.LG · 2026-04-09 · unverdicted · novelty 7.0

CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.

Play Like Champions: Counterfactual Feedback Generation in Latent Space

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

A guided VAE trained on pro StarCraft replays enables four latent-space traversal strategies to produce counterfactual improvement trajectories for amateur players.

ASALT: Adaptive State Alignment for Lateral Transfer in Multi-agent Reinforcement Learning

cs.AI · 2026-06-23 · unverdicted · novelty 6.0

ASALT uses observation-level and state-level adapters to align mismatched dimensionalities into a shared embedding for transferring actors and critics in MARL, showing improved sample efficiency and reduced negative transfer in cooperative benchmarks.

CCKS: Consensus-based Communication and Knowledge Sharing

cs.MA · 2026-06-10 · unverdicted · novelty 6.0

CCKS adds consensus constraints built by contrastive learning on local observations to action-advising in DTDE MARL, yielding faster learning and higher performance on football and StarCraft benchmarks.

Beyond Partner Diversity: An Influence-Based Team Steering Framework for Zero-Shot Human-Machine Teaming

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

IBTS framework uses influence shaping to improve zero-shot human-machine teaming beyond partner diversity alone, with gains shown in Overcooked-AI simulations and a 30-subject human study.

SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning

cs.LG · 2026-05-08 · conditional · novelty 6.0 · 2 refs

SACHI enriches agent representations via graph transformer convolutions over inter-agent graphs to enable holistic information integration, outperforming baselines across five cooperative tasks with statistical significance.

Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

Interactive IRL is cast as bi-level optimization with an inner loop learning expert rewards and an outer loop learning interaction policies, solved by the convergent BISIRL algorithm.

Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

Large-scale experiments on two million agents reveal that collective intelligence does not emerge from scale alone due to sparse and shallow interactions.

Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning

cs.LG · 2026-04-09 · unverdicted · novelty 6.0

VGM²P achieves SOTA-comparable performance in offline MARL via value-guided conditional behavior cloning with MeanFlow, enabling efficient single-step action generation insensitive to regularization coefficients.

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

cs.CL · 2025-11-25 · unverdicted · novelty 6.0

Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.

Overcoming Environmental Meta-Stationarity in MARL via Adaptive Curriculum and Counterfactual Group Advantage

cs.AI · 2025-06-09 · unverdicted · novelty 6.0

CL-MARL uses an adaptive curriculum scheduler called FlexDiff and Counterfactual Group Relative Policy Advantage to break static-difficulty training in MARL and achieve higher win rates on hard StarCraft maps.

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments

cs.AI · 2025-06-03 · unverdicted · novelty 6.0

VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of

Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning

cs.MA · 2025-02-05 · unverdicted · novelty 6.0

Optimistic ε-Greedy Exploration adds decoupled optimistic networks that converge in probability to maximum returns and samples from them with probability ε to increase optimal joint-action frequency in CTDE MARL.

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

cs.LG · 2025-02-05 · unverdicted · novelty 6.0

Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.

Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning

cs.LG · 2024-04-17 · unverdicted · novelty 6.0

GACG infers a coordination graph capturing both pair and group dependencies for information exchange in MARL, adds a group distance loss for consistency, and reports superior performance on StarCraft II micromanagement tasks.

Arena: a toolkit for Multi-Agent Reinforcement Learning

cs.LG · 2019-07-20 · accept · novelty 6.0

Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.

TRIDENT: Breaking the Hybrid-Safety-Physics Coupling for Provably Safe Multi-Agent Reinforcement Learning

cs.LG · 2026-06-16 · unverdicted · novelty 5.0

TRIDENT is a MARL framework using Richardson-Romberg gradient correction, Lyapunov-constrained trust-region updates, and a physics-informed residual critic that claims O(1/sqrt(K)) convergence to constrained Nash equilibrium with O(sqrt(K)) violation bounds and large reductions in training violation

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

cs.MA · 2026-06-09 · unverdicted · novelty 5.0

Phi-Actor-Critic is a new method that steers multi-agent reinforcement learning toward Pareto-efficient correlated equilibria using regret minimization and Lagrangian selection.

A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations

cs.MA · 2026-04-29 · unverdicted · novelty 5.0

A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.

Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem

cs.LG · 2025-09-19 · unverdicted · novelty 5.0

DAC models fully decentralized cooperative MARL as a context modeling problem, using latent variables for joint policies to fix non-stationarity in value updates and relative overgeneralization in value estimation.

citing papers explorer

Arena: a toolkit for Multi-Agent Reinforcement Learning

cs.LG · 2019-07-20 · accept · none · ref 17

Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.
