The StarCraft Multi-Agent Challenge
9 Pith papers cite this work. Polarity classification is still in progress.
Verdicts
UNVERDICTED: 9 representative citing papers (all listed in the explorer below)
Citing Papers Explorer
- Beyond the All-in-One Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows
  EntCollabBench shows that today's LLM agents still struggle with delegation, context transfer, parameter grounding, workflow closure, and decision commitment when tested in a simulated enterprise with 11 role-specialized agents.
- CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making
  CoFlow achieves state-of-the-art coordination in offline MARL using single-pass joint velocity fields with Coordinated Velocity Attention and Adaptive Coordination Gating (see the first sketch after this list).
- Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning
  CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels (sketch after this list).
- Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
  Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with an ExpRAG baseline and a proposed method, ReMem, that integrates reasoning, actions, and memory updates for continual improvement (retrieval sketch after this list).
- SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning
  SACHI uses graph transformer convolutions on inter-agent coordination graphs to enrich partially observing agents with content-dependent teammate information, yielding statistically significant gains over baselines in five cooperative tasks (sketch after this list).
- Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization
  Interactive IRL is cast as bi-level optimization, with an inner loop that learns expert rewards and an outer loop that learns interaction policies, solved by the convergent BISIRL algorithm (loop-structure sketch after this list).
- Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents
  Large-scale experiments with two million agents reveal that collective intelligence does not emerge from scale alone, because interactions between agents remain sparse and shallow.
- Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning
  VGM²P achieves performance comparable to the state of the art in offline MARL via value-guided conditional behavior cloning with MeanFlow, enabling efficient single-step action generation that is insensitive to regularization coefficients (sketch after this list).
- A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations
  A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC (zero-copy sketch after this list).
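For CoFlow's single-pass joint velocity field, here is a minimal sketch under loud assumptions: Coordinated Velocity Attention is taken to be multi-head attention across the agent axis, and Adaptive Coordination Gating a learned per-agent sigmoid gate. The class names, shapes, and wiring are guesses from the one-line summary above, not the paper's architecture.

```python
# Speculative CoFlow-style joint velocity field (names and wiring assumed).
import torch
import torch.nn as nn

class JointVelocityField(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim + act_dim + 1, hidden)  # +1 for time t
        # Assumed "Coordinated Velocity Attention": attention over agents.
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        # Assumed "Adaptive Coordination Gating": per-agent gate in [0, 1].
        self.gate = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, noisy_actions, t):
        # obs: (B, N, obs_dim); noisy_actions: (B, N, act_dim); t: (B, 1)
        B, N, _ = obs.shape
        t_feat = t[:, None, :].expand(B, N, 1)
        h = torch.relu(self.encoder(torch.cat([obs, noisy_actions, t_feat], -1)))
        coord, _ = self.attn(h, h, h)        # teammate-aware features
        g = self.gate(h)                     # how much coordination to inject
        return self.head(h + g * coord)      # per-agent velocities (B, N, act_dim)

# Few-step Euler sampling: one network pass per step over the joint action.
model = JointVelocityField(obs_dim=16, act_dim=4)
obs = torch.randn(2, 3, 16)                  # 2 environments, 3 agents
x = torch.randn(2, 3, 4)                     # start from noise
steps = 4
for i in range(steps):
    t = torch.full((2, 1), i / steps)
    x = x + model(obs, x, t) / steps         # x ends as the joint action
```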
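For CLOVER, a sketch of graph-conditioned monotonic mixing. To stay short it conditions a QMIX-style hypernetwork on the flattened adjacency of the realized communication graph instead of running the GNN message passing the summary describes; the absolute values on the generated weights give the QMIX-style monotonicity guarantee (dQ_tot/dQ_i >= 0). Everything else is assumed.

```python
# Graph-conditioned monotonic mixing, a simplified stand-in for CLOVER's
# GNN mixer: mixing weights depend on the realized communication graph.
import torch
import torch.nn as nn

class GraphConditionedMixer(nn.Module):
    def __init__(self, n_agents: int, hidden: int = 64):
        super().__init__()
        self.n_agents, self.hidden = n_agents, hidden
        self.hyper_w1 = nn.Linear(n_agents * n_agents, n_agents * hidden)
        self.hyper_b1 = nn.Linear(n_agents * n_agents, hidden)
        self.hyper_w2 = nn.Linear(n_agents * n_agents, hidden)

    def forward(self, agent_qs, adj):
        # agent_qs: (B, n_agents) per-agent utilities
        # adj: (B, n_agents, n_agents) realized communication links (0/1)
        a = adj.flatten(1).float()
        w1 = self.hyper_w1(a).abs().view(-1, self.n_agents, self.hidden)  # >= 0
        b1 = self.hyper_b1(a)
        h = torch.elu(torch.bmm(agent_qs.unsqueeze(1), w1).squeeze(1) + b1)
        w2 = self.hyper_w2(a).abs()                                       # >= 0
        return (h * w2).sum(-1, keepdim=True)  # Q_tot: (B, 1), monotone in each Q_i
```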
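For Evo-Memory's ExpRAG baseline, the summary suggests plain experience retrieval. A minimal sketch, assuming an external embed() function and a simple task/outcome record format (both invented here):

```python
# Minimal ExpRAG-style experience retrieval (record format and embed()
# are assumptions, not Evo-Memory's actual interface).
import numpy as np

class ExperienceMemory:
    def __init__(self, embed):
        self.embed = embed            # assumed: callable, str -> np.ndarray
        self.keys, self.records = [], []

    def add(self, task: str, outcome: str):
        self.keys.append(self.embed(task))
        self.records.append(f"Task: {task}\nOutcome: {outcome}")

    def retrieve(self, task: str, k: int = 3):
        if not self.keys:
            return []
        q = self.embed(task)
        K = np.stack(self.keys)
        sims = K @ q / (np.linalg.norm(K, axis=1) * np.linalg.norm(q) + 1e-8)
        return [self.records[i] for i in np.argsort(-sims)[:k]]

def build_prompt(memory: ExperienceMemory, task: str) -> str:
    # Prepend retrieved experience so the agent can reuse past solutions.
    context = "\n\n".join(memory.retrieve(task))
    return f"Relevant past experience:\n{context}\n\nCurrent task: {task}"
```

ReMem additionally interleaves reasoning, actions, and memory updates; that loop is not sketched here.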
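For SACHI, a single-head simplification of a graph transformer convolution over the coordination graph: each agent attends only to its graph neighbors and weights their features by content. The concatenated output is an assumption; the paper's layer will differ in detail.

```python
# Simplified graph-transformer-style attention over a coordination graph.
import torch
import torch.nn as nn

class CoordGraphAttention(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.q = nn.Linear(feat_dim, hidden)
        self.k = nn.Linear(feat_dim, hidden)
        self.v = nn.Linear(feat_dim, hidden)
        self.scale = hidden ** -0.5

    def forward(self, h, adj):
        # h: (B, N, feat_dim) per-agent features from partial observations
        # adj: (B, N, N) coordination graph, self-loops assumed included
        scores = torch.einsum("bid,bjd->bij", self.q(h), self.k(h)) * self.scale
        scores = scores.masked_fill(adj == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)    # content-dependent weights
        msg = torch.einsum("bij,bjd->bid", attn, self.v(h))
        return torch.cat([h, msg], dim=-1)      # enriched features (B, N, feat+hidden)
```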
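For the bi-level IRL paper, the summary only fixes the loop structure, so this sketch shows exactly that: an inner loop fitting a reward model to expert data and an outer loop improving an interaction policy against it. The margin-style inner loss and REINFORCE-style outer surrogate run on random stand-in data and are placeholders, not the BISIRL objective.

```python
# Bi-level structure only: inner loop = expert reward, outer = interaction policy.
import torch

reward_net = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(),
                                 torch.nn.Linear(64, 1))
policy_net = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(),
                                 torch.nn.Linear(64, 2))
opt_r = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
opt_p = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

for outer_step in range(100):
    expert_s = torch.randn(32, 8)   # stand-in for expert demonstration states
    rollout_s = torch.randn(32, 8)  # stand-in for interaction-policy rollouts
    for inner_step in range(5):     # inner loop: learn the expert's reward
        # Placeholder margin loss: expert states should out-score rollouts.
        loss_r = -(reward_net(expert_s).mean() - reward_net(rollout_s).mean())
        opt_r.zero_grad(); loss_r.backward(); opt_r.step()
    # Outer loop: push the interaction policy toward high learned reward.
    logp = torch.log_softmax(policy_net(rollout_s), -1).max(-1).values
    loss_p = -(logp * reward_net(rollout_s).squeeze(-1).detach()).mean()
    opt_p.zero_grad(); loss_p.backward(); opt_p.step()
```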
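For VGM²P, a sketch of the two named ingredients: a MeanFlow-style network that predicts an average velocity so an action is generated in a single step, and a value-weighted behavior-cloning loss. The degenerate r = t training case and the exponential value weighting are simplifications standing in for the paper's actual MeanFlow identity and guidance rule.

```python
# MeanFlow-style one-step action generation with value-weighted BC (simplified).
import torch
import torch.nn as nn

class MeanFlowPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.act_dim = act_dim
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim + 2, hidden),
                                 nn.ReLU(), nn.Linear(hidden, act_dim))

    def forward(self, obs, z, r, t):
        # Average velocity over the interval [r, t], conditioned on obs.
        return self.net(torch.cat([obs, z, r[:, None], t[:, None]], -1))

    @torch.no_grad()
    def act(self, obs):
        # Single-step generation: a = z0 + (1 - 0) * u(z0, r=0, t=1).
        z0 = torch.randn(obs.shape[0], self.act_dim)
        zero, one = torch.zeros(obs.shape[0]), torch.ones(obs.shape[0])
        return z0 + self(obs, z0, zero, one)

def value_weighted_loss(policy, obs, actions, q_values, temperature=1.0):
    # r = t collapses MeanFlow to plain velocity matching (full identity
    # omitted); exp(Q/temp) weighting is a generic value-guidance stand-in.
    z0 = torch.randn_like(actions)
    t = torch.rand(actions.shape[0])
    zt = (1 - t)[:, None] * z0 + t[:, None] * actions
    per_sample = ((policy(obs, zt, t, t) - (actions - z0)) ** 2).mean(-1)
    w = torch.exp(q_values / temperature)
    return (w / w.mean() * per_sample).mean()
```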
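For HASE, the zero-copy claim maps onto a well-known pattern: the engine owns flat structure-of-arrays buffers, and PyTorch tensors are created as views over that memory. torch.from_numpy really does share memory with its source array; the buffer layout below is invented for illustration and is not HASE's actual layout.

```python
# Zero-copy simulator/PyTorch integration via shared buffers (layout invented).
import numpy as np
import torch

N_ENVS, N_AGENTS, OBS_DIM = 1024, 8, 32

# Data-oriented (structure-of-arrays) layout: one contiguous buffer per
# field rather than an array of per-agent structs, keeping hot loops
# cache-friendly.
obs_buf = np.zeros((N_ENVS, N_AGENTS, OBS_DIM), dtype=np.float32)
rew_buf = np.zeros((N_ENVS, N_AGENTS), dtype=np.float32)

# torch.from_numpy shares memory with the numpy array: when the engine
# writes a step result into obs_buf, the tensor sees it without a copy.
obs_t = torch.from_numpy(obs_buf)
rew_t = torch.from_numpy(rew_buf)

obs_buf[0, 0, 0] = 1.0           # simulate the C++ engine writing one value
assert obs_t[0, 0, 0] == 1.0     # visible through the tensor view, zero-copy
```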