Learning to drive in new cities without human demonstrations

2026 · arXiv 2602.15891

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Scaling Self-Play for End-to-End Driving

cs.RO · 2026-06-17 · unverdicted · novelty 6.0

Self-play DAgger training in a batched pixel renderer produces end-to-end driving policies that reach competitive performance on HUGSIM and NAVSIM-v2 after real-world adaptation and improve with more self-play compute.

Beyond Self-Play: Hierarchical Reasoning for Continuous Motion in Closed-Loop Traffic Simulation

cs.RO · 2026-05-09 · unverdicted · novelty 6.0

A hierarchical Stackelberg MARL plus continuous-motion architecture with hybrid co-training produces smoother and safer closed-loop traffic behavior than standard self-play methods.

Human-like autonomy emerges from self-play and a pinch of human data

cs.LG · 2026-06-11 · unverdicted · novelty 5.0

Self-play RL regularized with 30 minutes of human data produces driving policies that coordinate with humans, training in 15 hours on one GPU with 2500x less data than imitation learning.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Scaling Self-Play for End-to-End Driving

cs.RO · 2026-06-17 · unverdicted · none · ref 79

Beyond Self-Play: Hierarchical Reasoning for Continuous Motion in Closed-Loop Traffic Simulation

cs.RO · 2026-05-09 · unverdicted · none · ref 4
