4D driving scene generation with stereo forcing. arXiv preprint arXiv:2509.20251, 2025.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- X-Mind: Efficient Visual Chain-of-Thought via Predictive World Model for End-to-End Driving
X-Mind proposes an efficient internal visual chain-of-thought using compressed BEV sketches and recurrent block diffusion to embed predictive world models into end-to-end driving policies.