
AirNav: A large-scale real-world UAV vision-and-language navigation dataset with natural and diverse instructions

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Existing UAV vision-and-language navigation (VLN) benchmarks rarely provide realistic aerial scenes, natural process-level instructions, and sufficient scale simultaneously, making it difficult to systematically train and evaluate UAV VLN agents under realistic settings. To address this, we propose AirNav, a large-scale benchmark built on real urban aerial data, comprising 137K navigation samples with natural and diverse instructions generated via a human-LLM collaborative pipeline with 10 user personas. We conduct a systematic evaluation of representative approaches on AirNav, ranging from traditional models to multimodal large language models (MLLMs), under unified metrics with open-source implementations. We further propose AirVLN-R1, trained via supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT), achieving state-of-the-art performance with a 51.82% success rate on the test-unseen split. Real-world experiments on a physical UAV platform provide preliminary evidence of sim-to-real transferability, and our dataset and code are publicly available.

citation-role summary

dataset 1

citation-polarity summary

fields

cs.CV 1 cs.RO 1

years

2026 2

verdicts

UNVERDICTED 2

roles

dataset 1

polarities

background 1
