Unidrivevla: Unifying understanding, perception, and action planning for autonomous driving

Yongkang Li, Lijun Zhou, Sixu Yan, Bencheng Liao, Tianyi Yan, Kaixin Xiong, Long Chen, Hongwei Xie, Bing Wang, Guang Chen, et al. · 2026 · arXiv 2604.02190

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

CLAP: Contrastive Latent-space Prompt Optimization for End-to-end Autonomous Driving

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

On NAVSIM, CLAP reduces planning error on challenging driving scenarios by 24% by applying contrastive latent-space prompt optimization to frozen VLA models, with no regression on normal frames.

The DAWN of World-Action Interactive Models

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

DAWN couples a world predictor with a world-conditioned action denoiser in latent space so that each refines the other recursively, yielding strong planning and safety results on autonomous driving benchmarks.

Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving

cs.CV · 2026-05-20 · unverdicted · novelty 5.0 · 2 refs

CoPhy is a new RL framework that distills VLM cognition into BEV encoders, adds an auto-regressive BEV world model for action-conditioned future prediction, and optimizes policies via GRPO with dual physical-cognitive rewards, claiming SOTA on NAVSIM v1/v2.

citing papers explorer

Showing 3 of 3 citing papers.

CLAP: Contrastive Latent-space Prompt Optimization for End-to-end Autonomous Driving

cs.CV · 2026-05-17 · unverdicted · none · ref 24

The DAWN of World-Action Interactive Models

cs.CV · 2026-05-12 · unverdicted · none · ref 30

Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving

cs.CV · 2026-05-20 · unverdicted · none · ref 25 · 2 links
