Analyzing reasoning consistency in large multimodal models under cross-modal conflicts
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 2
citation-polarity summary
years
2026 2
verdicts
UNVERDICTED 2
roles
background 1
polarities
background 1
representative citing papers
MLLMs show stochastic collapse with top-1 probabilities up to 97% and low randomness indices when choosing among equivalent options.
citing papers explorer
- CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving
CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.
- Evaluating Stochastic Collapse and Implicit Bias in Multimodal Large Language Models
MLLMs show stochastic collapse with top-1 probabilities up to 97% and low randomness indices when choosing among equivalent options.