CaRL: Learning Scalable Planning Policies with Simple Rewards

Bernhard Jaeger, Daniel Dauner, Jens Beißwenger, Simon Gerstenecker, Kashyap Chitta, Andreas Geiger · 2025 · arXiv 2504.17838

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Beyond Self-Play and Scale: A Behavior Benchmark for Generalization in Autonomous Driving

cs.RO · 2026-05-11 · unverdicted · novelty 7.0

BehaviorBench reveals that self-play RL policies for autonomous driving overfit to their training traffic agents and do not generalize to other behaviors, motivating a hybrid rule-based plus learned planner.

Fail2Drive: Benchmarking Closed-Loop Driving Generalization

cs.RO · 2026-04-09 · conditional · novelty 7.0

Fail2Drive is the first paired-route benchmark for closed-loop generalization in CARLA, showing an average 22.8% success-rate drop on shifted scenarios and revealing failure modes such as ignoring visible LiDAR objects.

MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving

cs.RO · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

MAPLE proposes latent multi-agent rollouts with supervised fine-tuning followed by reinforcement learning using safety, progress, interaction, and diversity rewards to enable scalable closed-loop training for end-to-end autonomous driving.

Learning Dexterous Grasping from Sparse Taxonomy Guidance

cs.RO · 2026-04-05 · unverdicted · novelty 6.0

GRIT learns dexterous grasping from sparse taxonomy guidance, achieving 87.9% success and better generalization to novel objects via a two-stage prediction-plus-policy approach.

Goal-Oriented Reactive Simulation for Closed-Loop Trajectory Prediction

cs.RO · 2026-03-25 · conditional · novelty 6.0

Closed-loop on-policy training with a reactive goal-oriented scene decoder cuts collision rates by up to 79.5% in dense traffic compared to standard open-loop baselines.

DriveSafer: End-to-End Autonomous Driving with Safety Guidance

cs.RO · 2026-05-16 · unverdicted · novelty 5.0

DriveSafer reduces catastrophic failures (PDMS=0) by 48% and drivable-area compliance failures by over 65% versus DiffusionDrive on the NAVSIM benchmark by combining training-time safety constraints with inference-time guidance.

citing papers explorer

Beyond Self-Play and Scale: A Behavior Benchmark for Generalization in Autonomous Driving

cs.RO · 2026-05-11 · unverdicted · none · ref 4

Fail2Drive: Benchmarking Closed-Loop Driving Generalization

cs.RO · 2026-04-09 · conditional · none · ref 22

MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving

cs.RO · 2026-05-13 · unverdicted · none · ref 16 · 2 links

Learning Dexterous Grasping from Sparse Taxonomy Guidance

cs.RO · 2026-04-05 · unverdicted · none · ref 22

Goal-Oriented Reactive Simulation for Closed-Loop Trajectory Prediction

cs.RO · 2026-03-25 · conditional · none · ref 23

DriveSafer: End-to-End Autonomous Driving with Safety Guidance

cs.RO · 2026-05-16 · unverdicted · none · ref 12
