BehaviorBench reveals that self-play RL policies for autonomous driving overfit to their training traffic agents and do not generalize to other behaviors, motivating a hybrid rule-based plus learned planner.
Carl: Learning scalable planning policies with simple rewards
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.RO 6years
2026 6roles
background 1polarities
background 1representative citing papers
Fail2Drive is the first paired-route benchmark for closed-loop generalization in CARLA, showing an average 22.8% success-rate drop on shifted scenarios and revealing failure modes such as ignoring visible LiDAR objects.
MAPLE proposes latent multi-agent rollouts with supervised fine-tuning followed by reinforcement learning using safety, progress, interaction, and diversity rewards to enable scalable closed-loop training for end-to-end autonomous driving.
GRIT learns dexterous grasping from sparse taxonomy guidance, achieving 87.9% success and better generalization to novel objects via a two-stage prediction-plus-policy approach.
Closed-loop on-policy training with a reactive goal-oriented scene decoder cuts collision rates by up to 79.5% in dense traffic compared to standard open-loop baselines.
DriveSafer reduces catastrophic failures (PDMS=0) by 48% and drivable-area compliance failures by over 65% versus DiffusionDrive on the NAVSIM benchmark by combining training-time safety constraints with inference-time guidance.
citing papers explorer
-
Beyond Self-Play and Scale: A Behavior Benchmark for Generalization in Autonomous Driving
BehaviorBench reveals that self-play RL policies for autonomous driving overfit to their training traffic agents and do not generalize to other behaviors, motivating a hybrid rule-based plus learned planner.
-
Fail2Drive: Benchmarking Closed-Loop Driving Generalization
Fail2Drive is the first paired-route benchmark for closed-loop generalization in CARLA, showing an average 22.8% success-rate drop on shifted scenarios and revealing failure modes such as ignoring visible LiDAR objects.
-
MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving
MAPLE proposes latent multi-agent rollouts with supervised fine-tuning followed by reinforcement learning using safety, progress, interaction, and diversity rewards to enable scalable closed-loop training for end-to-end autonomous driving.
-
Learning Dexterous Grasping from Sparse Taxonomy Guidance
GRIT learns dexterous grasping from sparse taxonomy guidance, achieving 87.9% success and better generalization to novel objects via a two-stage prediction-plus-policy approach.
-
Goal-Oriented Reactive Simulation for Closed-Loop Trajectory Prediction
Closed-loop on-policy training with a reactive goal-oriented scene decoder cuts collision rates by up to 79.5% in dense traffic compared to standard open-loop baselines.
-
DriveSafer: End-to-End Autonomous Driving with Safety Guidance
DriveSafer reduces catastrophic failures (PDMS=0) by 48% and drivable-area compliance failures by over 65% versus DiffusionDrive on the NAVSIM benchmark by combining training-time safety constraints with inference-time guidance.