LongFly: Long-horizon UAV vision-and-language navigation with spatiotemporal context integration
4 Pith papers cite this work.
Fields: cs.RO (4)
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Roles: background (1)
Polarities: background (1)
citing papers explorer
-
Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation
A zero-shot unified agent for VLN-CE, ObjectNav, EQA and Aerial-VLN on wheeled, quadruped, humanoid and UAV platforms that translates language and vision inputs into actions via MLLMs plus TDM and SCB mechanisms, matching trained foundation models on multiple benchmarks.
-
DynFly: Dynamic-Aware Continuous Trajectory Generation for UAV Vision-Language Navigation in Urban Environments
DynFly adds a B-spline and flow-matching trajectory layer with UAV-specific dynamic losses to existing UAV-VLN systems, yielding improvements of 4.69 in NDTW and 4.51 m in NE on the OpenUAV unseen split.
-
Towards Precise Intent-Aligned VLA Aerial Navigation via Expert-Guided GRPO
EG-GRPO augments VLA aerial navigation with expert-guided group relative policy optimization and a faster simulation pipeline, claiming 2.13x the success rate and 60.9% better intent alignment compared with an SFT baseline.
-
Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models
This survey organizes aerial vision-language navigation methods into five architectural categories, critically reviews evaluation infrastructure, and synthesizes seven open problems for LLM/VLM integration.