Aero-World adapts a pretrained latent diffusion transformer for action-conditioned aerial video generation by injecting inertial action tokens and using a frozen latent-space Physics Probe for inertial consistency supervision during LoRA finetuning, with a new AeroBench benchmark showing improved AA
RaceVLA: VLA-based racing drone navigation with human-like behaviour
5 Pith papers cite this work.
representative citing papers
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
An open-sourced Unified Autonomy Stack fuses LiDAR, radar, vision and inertial data with sampling-based planning and control barrier functions to deliver resilient autonomy on aerial and ground robots in challenging real-world settings.
A survey of UAV vision-and-language navigation that establishes a methodological taxonomy, reviews resources and challenges, and proposes a forward-looking research roadmap.
GRaD-Nav++ combines 3D Gaussian Splatting simulation and differentiable RL to train an onboard VLA policy that achieves 50-83% success on language-guided drone navigation tasks in simulation and real hardware.
citing papers explorer
- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
- GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics
GRaD-Nav++ combines 3D Gaussian Splatting simulation and differentiable RL to train an onboard VLA policy that achieves 50-83% success on language-guided drone navigation tasks in simulation and real hardware.