Learning to drive in new cities without human demonstrations
3 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
A hierarchical Stackelberg MARL plus continuous-motion architecture with hybrid co-training produces smoother and safer closed-loop traffic behavior than standard self-play methods.
Self-play RL regularized with 30 minutes of human data produces driving policies that coordinate with humans, training in 15 hours on one GPU with 2500x less data than imitation learning.
citing papers explorer
- Scaling Self-Play for End-to-End Driving
  Self-play DAgger training in a batched pixel renderer produces end-to-end driving policies that reach competitive performance on HUGSIM and NAVSIM-v2 after real-world adaptation and improve with more self-play compute.
- Beyond Self-Play: Hierarchical Reasoning for Continuous Motion in Closed-Loop Traffic Simulation
  A hierarchical Stackelberg MARL plus continuous-motion architecture with hybrid co-training produces smoother and safer closed-loop traffic behavior than standard self-play methods.