Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments

2018

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

WorldMAP bootstraps reliable trajectory prediction in vision-language navigation by converting world-model-generated futures into structured supervision, cutting ADE by 18% and FDE by 42.1% on Target-Bench while making small VLMs competitive with large ones.

ToLL: Topological Layout Learning with Asymmetric Cross-View Structural Distillation for 3D Scene Graph Generation Pretraining

cs.CV · 2026-03-30 · unverdicted · novelty 7.0

ToLL pretrains 3D scene graph generators via anchor-conditioned topological layout recovery and asymmetric structural distillation to learn predicate constraints rather than geometric interpolation shortcuts.

Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning

cs.CV · 2025-12-09 · unverdicted · novelty 6.0

A monocular RGB-only aerial VLN framework outperforms baselines via prompt-guided multi-task learning, keyframe selection, and label reweighting on AerialVLN and OpenFly benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models

cs.AI · 2026-04-09 · unverdicted · none · ref 4

ToLL: Topological Layout Learning with Asymmetric Cross-View Structural Distillation for 3D Scene Graph Generation Pretraining

cs.CV · 2026-03-30 · unverdicted · none · ref 1

Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning

cs.CV · 2025-12-09 · unverdicted · none · ref 48
