VLN-MME: Diagnosing MLLMs as language-guided visual navigation agents. arXiv preprint arXiv:2512.24851, 2025

Xunyi Zhao, Gengze Zhou, Qi Wu · 2025 · arXiv 2512.24851

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

AirGroundBench: Probing Spatial Intelligence in Multimodal Large Models under Heterogeneous Multi-View Embodied Collaboration

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

AirGroundBench is a new diagnostic benchmark exposing that MLLMs handle basic spatial perception but struggle with cross-view alignment, transformation reasoning, and embodied navigation under heterogeneous air-ground views.

Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System

cs.RO · 2026-06-16 · unverdicted · novelty 5.0

Qwen-RobotNav provides a parameterized navigation model trained on 15.6M samples with vision-language co-training that achieves SOTA results on benchmarks and zero-shot transfer to real robots.

Ask When It Pays: Cost-Aware Open-Ended Interaction for Instance Goal Navigation

cs.CV · 2026-06-02 · unverdicted · novelty 4.0

Proposes cost-aware question selection for ambiguous object navigation via information-gain analysis on corpora, a cost-penalizing benchmark, and a zero-shot MLLM agent.

citing papers explorer

AirGroundBench: Probing Spatial Intelligence in Multimodal Large Models under Heterogeneous Multi-View Embodied Collaboration

cs.CV · 2026-06-26 · unverdicted · none · ref 39

AirGroundBench is a new diagnostic benchmark exposing that MLLMs handle basic spatial perception but struggle with cross-view alignment, transformation reasoning, and embodied navigation under heterogeneous air-ground views.

Ask When It Pays: Cost-Aware Open-Ended Interaction for Instance Goal Navigation

cs.CV · 2026-06-02 · unverdicted · none · ref 67

Proposes cost-aware question selection for ambiguous object navigation via information-gain analysis on corpora, a cost-penalizing benchmark, and a zero-shot MLLM agent.
