RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
CoRRabs/1902.04043(2019)
29 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
IBAL framework constructs information-theoretic adversarial attacks on agent observations and actions to train MARL agents that remain robust to interaction disruptions and agent-missing scenarios.
EntCollabBench shows that today's LLM agents still struggle with delegation, context transfer, parameter grounding, workflow closure, and decision commitment when tested in a simulated enterprise with 11 role-specialized agents.
CoFlow achieves state-of-the-art coordination in offline MARL using single-pass joint velocity fields with Coordinated Velocity Attention and Adaptive Coordination Gating.
CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.
A guided VAE trained on pro StarCraft replays enables four latent-space traversal strategies to produce counterfactual improvement trajectories for amateur players.
ASALT uses observation-level and state-level adapters to align mismatched dimensionalities into a shared embedding for transferring actors and critics in MARL, showing improved sample efficiency and reduced negative transfer in cooperative benchmarks.
CCKS adds consensus constraints built by contrastive learning on local observations to action-advising in DTDE MARL, yielding faster learning and higher performance on football and StarCraft benchmarks.
IBTS framework uses influence shaping to improve zero-shot human-machine teaming beyond partner diversity alone, with gains shown in Overcooked-AI simulations and a 30-subject human study.
SACHI enriches agent representations via graph transformer convolutions over inter-agent graphs to enable holistic information integration, outperforming baselines across five cooperative tasks with statistical significance.
Interactive IRL is cast as bi-level optimization with an inner loop learning expert rewards and an outer loop learning interaction policies, solved by the convergent BISIRL algorithm.
Large-scale experiments on two million agents reveal that collective intelligence does not emerge from scale alone due to sparse and shallow interactions.
VGM²P achieves SOTA-comparable performance in offline MARL via value-guided conditional behavior cloning with MeanFlow, enabling efficient single-step action generation insensitive to regularization coefficients.
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
CL-MARL uses an adaptive curriculum scheduler called FlexDiff and Counterfactual Group Relative Policy Advantage to break static-difficulty training in MARL and achieve higher win rates on hard StarCraft maps.
VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of
Optimistic ε-Greedy Exploration adds decoupled optimistic networks that converge in probability to maximum returns and samples from them with probability ε to increase optimal joint-action frequency in CTDE MARL.
Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.
GACG infers a coordination graph capturing both pair and group dependencies for information exchange in MARL, adds a group distance loss for consistency, and reports superior performance on StarCraft II micromanagement tasks.
Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.
TRIDENT is a MARL framework using Richardson-Romberg gradient correction, Lyapunov-constrained trust-region updates, and a physics-informed residual critic that claims O(1/sqrt(K)) convergence to constrained Nash equilibrium with O(sqrt(K)) violation bounds and large reductions in training violation
Phi-Actor-Critic is a new method that steers multi-agent reinforcement learning toward Pareto-efficient correlated equilibria using regret minimization and Lagrangian selection.
A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.
DAC models fully decentralized cooperative MARL as a context modeling problem, using latent variables for joint policies to fix non-stationarity in value updates and relative overgeneralization in value estimation.
citing papers explorer
-
Arena: a toolkit for Multi-Agent Reinforcement Learning
Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.