SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving


arXiv: 2010.09776 · v2 · pith:EUJIQQOS · submitted 2020-10-19 · cs.MA · cs.AI · cs.GT · cs.LG · cs.SY · eess.SY


Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, Mohsen Rohani, Nicolas Perez Nieves, Yihan Ni, Seyedershad Banijamali, Alexander Cowen Rivers, Zheng Tian, Daniel Palenicek, Haitham Bou Ammar, Hongbo Zhang, Wulong Liu, Jianye Hao, Jun Wang



keywords: multi-agent, SMARTS, diverse, driving, autonomous, learning, research, training


Abstract

Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse and competent driving interactions. To meet this need, we develop a dedicated simulation platform called SMARTS (Scalable Multi-Agent RL Training School). SMARTS supports the training, accumulation, and use of diverse behavior models of road users. These are in turn used to create increasingly more realistic and diverse interactions that enable deeper and broader research on multi-agent interaction. In this paper, we describe the design goals of SMARTS, explain its basic architecture and its key features, and illustrate its use through concrete multi-agent experiments on interactive scenarios. We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving. Our code is available at https://github.com/huawei-noah/SMARTS.


Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation
cs.LG 2026-05 unverdicted novelty 8.0

The work gives the first algorithms for general robust Markov games with linear function approximation whose sample complexity breaks the curse of multiagency for large state spaces in both generative and online settings.
ScenarioControl: Vision-Language Controllable Vectorized Latent Scenario Generation
cs.CV 2026-04 unverdicted novelty 7.0

ScenarioControl introduces the first vision-language controllable generator for realistic vectorized 3D driving scenarios with temporal consistency across actor views.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
cs.CL 2025-11 unverdicted novelty 7.0

Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
cs.CL 2025-11 unverdicted novelty 6.0

Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and ...
FAST: A Framework for Aligned Sampling and Training in Parallel Reinforcement Learning for Autonomous Driving
cs.LG 2026-06 unverdicted novelty 5.0

FAST uses Dynamic Parallel Sampling Alignment via virtual continuation and Scaled Mask-Padding Optimization to remove straggler bottlenecks in parallel RL, delivering 1.78x wall-clock speedup while preserving unbiasedness.
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
cs.CV 2026-04 unverdicted novelty 5.0

RAD-2 uses a diffusion generator and RL discriminator to cut collision rates by 56% in closed-loop autonomous driving planning.
CHARMS: A Cognitive Hierarchical Agent for Reasoning and Motion Stylization in Autonomous Driving
cs.RO 2025-04 unverdicted novelty 5.0

CHARMS applies Level-k game theory and Poisson cognitive hierarchy theory to autonomous driving agents via a two-stage RL-then-SFT pipeline for human-like decisions and realistic scenario generation.