pith. machine review for the scientific record.

arxiv: 2511.17792 · v2 · submitted 2025-11-21 · 💻 cs.CV · cs.RO

Recognition: unknown

Target-Bench: Can Video World Models Achieve Mapless Path Planning with Semantic Targets?

keywords: planning, semantic, models, video, world, evaluation, reasoning, target-bench

While recent video world models can generate highly realistic videos, their ability to perform semantic reasoning and planning remains unclear and unquantified. We introduce Target-Bench, the first benchmark that enables comprehensive evaluation of video world models' semantic reasoning, spatial estimation, and planning capabilities. Target-Bench provides 450 robot-collected scenarios spanning 47 semantic categories, with SLAM-based trajectories serving as motion tendency references. Our benchmark reconstructs motion from generated videos with a metric scale recovery mechanism, enabling the evaluation of planning performance with five complementary metrics that focus on target-approaching capability and directional consistency. Our evaluation results show that the best off-the-shelf model achieves an overall score of only 0.341, revealing a significant gap between realistic visual generation and semantic reasoning in current video world models. Furthermore, we demonstrate that fine-tuning on a relatively small real-world robot dataset can significantly improve task-level planning performance.

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models

    cs.AI · 2026-04 · unverdicted · novelty 7.0

    WorldMAP bootstraps reliable trajectory prediction in vision-language navigation by converting world-model-generated futures into structured supervision, cutting ADE by 18% and FDE by 42.1% on Target-Bench while makin...
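
The summary above reports ADE and FDE without defining them. Assuming they carry their usual trajectory-prediction meaning (average displacement error over all waypoints, and final displacement error at the last waypoint), a minimal sketch of the computation might look like the following. The function name `displacement_errors` and the waypoints are hypothetical and are not taken from either paper; Target-Bench's own five metrics are not reproduced here.

```python
import math


def displacement_errors(predicted, reference):
    """Return (ADE, FDE) for two equal-length trajectories.

    Assumes the conventional definitions: ADE is the mean Euclidean
    distance between matching waypoints, FDE is the distance between
    the final waypoints.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-waypoint Euclidean distance between prediction and reference.
    dists = [math.dist(p, r) for p, r in zip(predicted, reference)]
    ade = sum(dists) / len(dists)
    fde = dists[-1]
    return ade, fde


# Hypothetical 2D waypoints in metres, for illustration only.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
reference = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, reference)
```

Under these definitions, lower values are better for both quantities, which is why the summary describes them as being cut.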