3 Pith papers cite this work.
Citing papers by year: 2025 (3)
Citing papers
- Closed-Loop Vision-Language Planning for Multi-Agent Coordination
  COMPASS uses VLMs to generate and refine code-based strategies with structured communication, achieving a 57% win rate on SMACv2 Protoss 5v5 versus 27% for QMIX.
- Learning to Reason at the Frontier of Learnability
  A curriculum that samples questions with high variance in success rate improves reinforcement learning performance on LLM reasoning tasks.
- Evaluating and Learning Robust Bandit Policies Under Uncertain Causal Mechanisms