VLM-Safe-RL adds frozen VLM signals as anticipatory costs to the CMDP Lagrangian update via dual-path CLIP, VLM-Lagrange, and confidence gating, outperforming baselines on Safety-Gymnasium FormulaOne while showing partial generalization.
Vlm-rl: A unified vision language models and reinforcement learning framework for safe autonomous driv- ing
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
SignReasoner decomposes traffic signs into functional structure units and uses a two-stage VLM post-training pipeline to achieve state-of-the-art compositional reasoning on a new benchmark.
LLM interprets user language to set parameters of a risk-aware MPPI controller, with human-in-the-loop validation for adaptive autonomous driving behavior.
XEmbodied is a foundation model that integrates 3D geometric and physical signals into VLMs using a 3D Adapter and Efficient Image-Embodied Adapter, plus progressive curriculum and RL post-training, to improve spatial reasoning and embodied performance on 18 benchmarks.
citing papers explorer
-
Seeing Before Colliding: Anticipatory Safe RL with Frozen Vision-Language Models
VLM-Safe-RL adds frozen VLM signals as anticipatory costs to the CMDP Lagrangian update via dual-path CLIP, VLM-Lagrange, and confidence gating, outperforming baselines on Safety-Gymnasium FormulaOne while showing partial generalization.
-
SignReasoner: Compositional Reasoning for Complex Traffic Sign Understanding via Functional Structure Units
SignReasoner decomposes traffic signs into functional structure units and uses a two-stage VLM post-training pipeline to achieve state-of-the-art compositional reasoning on a new benchmark.
-
Language-Driven Cost Optimization for Autonomous Driving
LLM interprets user language to set parameters of a risk-aware MPPI controller, with human-in-the-loop validation for adaptive autonomous driving behavior.
-
XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments
XEmbodied is a foundation model that integrates 3D geometric and physical signals into VLMs using a 3D Adapter and Efficient Image-Embodied Adapter, plus progressive curriculum and RL post-training, to improve spatial reasoning and embodied performance on 18 benchmarks.