AirDreamer: Generalist Drone Navigation with World Models

Andong Yang; Chao Gao; Chunkai Yang; Guyue Zhou; Ruidong An; Zian Liu

arxiv: 2606.03252 · v1 · pith:Y6XBNEZYnew · submitted 2026-06-02 · 💻 cs.RO · cs.AI

Zian Liu, Andong Yang, Chunkai Yang, Ruidong An, Chao Gao, Guyue Zhou

Pith reviewed 2026-06-28 09:59 UTC · model grok-4.3

classification 💻 cs.RO cs.AI

keywords drone navigation · world models · reinforcement learning · sparse rewards · sim-to-real transfer · generalization · unseen environments

The pith

A world model paired with an RL policy and sparse rewards lets drones navigate unseen cluttered spaces without hand-crafted perception or tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a navigation framework that places a reinforcement-learning policy on top of world-model predictions of environment structure. It uses a sparse reward without shaping terms to promote escape from local minima and useful yaw behaviors. This replaces environment-dependent, human-designed pipelines that generalize poorly. The reported result is a higher success rate on challenging maps and direct sim-to-real deployment on physical drones.

Core claim

The framework navigates with a reinforcement-learning-based policy on top of a world-model-based environment understanding. A sparse reward function without hand-crafted shaping terms is designed to avoid local minima traps and encourage yaw control behaviors. In simulation and on real drones, the method exhibits emergent capabilities for navigating complex, unseen environments and escaping local optima where other methods fail, achieving a 5.3% higher navigation success rate than the best baseline along with effective sim-to-real transfer without any tuning during deployment.

What carries the argument

World-model-based environment understanding feeding an RL policy, driven by a sparse reward function that avoids local minima.
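To make the sparse-versus-shaped distinction concrete, here is a minimal Python sketch. It is not the paper's reward: the abstract does not give the reward terms, so the `StepOutcome` fields, the `GOAL_RADIUS` threshold, the +1/-1 values, and the progress bonus are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical per-step summary; field names are illustrative only."""
    distance_to_goal: float  # metres, after the step
    prev_distance: float     # metres, before the step
    collided: bool


GOAL_RADIUS = 0.5  # assumed success threshold in metres


def sparse_reward(o: StepOutcome) -> float:
    """Reward only at terminal events: +1 at the goal, -1 on collision."""
    if o.collided:
        return -1.0
    if o.distance_to_goal <= GOAL_RADIUS:
        return 1.0
    return 0.0


def shaped_reward(o: StepOutcome, progress_weight: float = 0.1) -> float:
    """Same terminal terms plus a hand-crafted progress bonus, for contrast."""
    progress = o.prev_distance - o.distance_to_goal
    return sparse_reward(o) + progress_weight * progress


# Backing out of a dead end temporarily increases the distance to the goal.
detour = StepOutcome(distance_to_goal=4.2, prev_distance=4.0, collided=False)
print(sparse_reward(detour))  # 0.0: the detour itself is not penalised
print(shaped_reward(detour))  # negative: the progress term discourages it
```

Under these assumptions, the shaped variant penalises every step that moves away from the goal, which is one way a policy can be held in a local minimum, while the sparse variant leaves such steps neutral.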

If this is right

Drones can succeed in complex unseen environments that trap other methods in local optima.
Navigation succeeds without environment-specific perception pipelines or hand-crafted rules.
On challenging maps, the method achieves a 5.3% higher navigation success rate than the best baseline.
Sim-to-real transfer occurs directly without deployment adjustments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same world-model-plus-RL structure could be tested on other mobile platforms that currently need custom perception stacks.
Emergent yaw behaviors imply the world model captures dynamics useful beyond position goals.
Extending the sparse reward to multi-drone coordination would test whether the generalization benefit scales.

Load-bearing premise

The world model gives sufficiently accurate predictions of environmental structure relative to the drone's motion capabilities in completely new scene layouts.

What would settle it

A controlled test in a new cluttered map where the method records a lower success rate than the strongest baseline or requires any real-world tuning to match simulation performance.

Figures

Figures reproduced from arXiv: 2606.03252 by Andong Yang, Chao Gao, Chunkai Yang, Guyue Zhou, Ruidong An, Zian Liu.

**Figure 2.** Reference frames used for AirDreamer. The body frame is attached…

**Figure 3.** Example training environment. … goal while maintaining altitude and avoiding obstacles. The observation consists of local information and location. We adopt a quadrotor as the experimental platform. Its 6-DoF dynamics and real-time control requirements provide a testbed for AirDreamer. We target settings where both traditional and previous learning-based methods fall into local optima or produce unsafe tra…

**Figure 4.** Architecture of AirDreamer. Modules used for real-world deployment are shown on a grey background. Policy training process in the simulator…

**Figure 5.** Latent imaginations of AirDreamer in the simulation. The images are decoded from the latent states…

**Figure 6.** Success rate versus environment steps during training. The label…

**Figure 8.** (a) Trajectory visualization by stacking pictures from the run in C…

Original abstract

Navigating a drone in unseen and cluttered environments requires reliable generalization to unseen scene layouts and understanding of environmental structure relative to the robot's capabilities. Previous methods, which assume the same environment configuration, often rely heavily on human-designed perception pipelines and predefined rules to guide the robot toward the target. This process is environment-dependent and generalizes poorly across environments. Inspired by animal navigation behavior, we design a navigation framework that navigates with a reinforcement-learning-based policy on top of a world-model-based environment understanding to overcome these issues. In addition, a sparse reward function without hand-crafted shaping terms is designed to avoid local minima traps and encourage yaw control behaviors. In simulation and on real drones, our method exhibits emergent capabilities for navigating complex, unseen environments and escaping local optima where other methods fail. In challenging maps, it achieves a 5.3% higher navigation success rate than the best baseline. Furthermore, the proposed framework achieves effective sim-to-real transfer without any tuning during deployment. The code will be publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper combines world models with RL and a sparse yaw-focused reward for drone navigation in unseen spaces, but the abstract gives no evidence on world model accuracy or ablations, leaving the main claim under-supported.

The core contribution is an RL policy that sits on top of a learned world model for drone navigation, paired with a sparse reward that avoids hand-crafted shaping and pushes yaw control. They claim this lets the system handle complex unseen layouts better than baselines, with a 5.3% success-rate edge in hard maps and direct sim-to-real transfer on real hardware.

What stands out is the attempt to move away from environment-specific perception pipelines toward something more general. The sparse reward choice is a clean way to encourage useful behaviors without manual tuning, and the real-drone results add some weight for a robotics paper.

The weak part is the lack of any numbers on the world model itself. No prediction error on held-out scenes, no reconstruction metrics, no ablation that isolates the world model from the RL policy or the reward. The 5.3% gain is modest, and without details on baselines, map definitions, or statistical tests it is hard to judge whether the improvement is reliable or just noise. The central assumption—that the world model supplies accurate enough forecasts of structure relative to the drone’s dynamics in novel layouts—remains untested in the text we have.

This is aimed at people working on learned navigation for aerial robots. It is not a foundational shift, but the real-robot experiments and the sparse-reward angle make it worth a referee’s time if the full paper supplies the missing validation on the world model. I would send it to review with a request for those ablations and error metrics.
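The held-out prediction-error check asked for in the letter could take roughly the following shape. This is a sketch under stated assumptions, not the paper's evaluation: the `open_loop_mse` helper, the one-step `Model` signature, and the 1-D toy dynamics are all hypothetical, and a real world model would predict latent states or images rather than a scalar position.

```python
from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]
Model = Callable[[State, Action], State]  # hypothetical one-step predictor


def open_loop_mse(model: Model,
                  states: Sequence[State],
                  actions: Sequence[Action],
                  horizon: int) -> list[float]:
    """Mean squared error at horizons 1..horizon, averaged over start indices.

    `states` has one more entry than `actions`. From each true state the model
    is rolled forward using only the recorded actions, with no correction from
    later observations, so compounding error shows up at longer horizons.
    """
    totals = [0.0] * horizon
    counts = [0] * horizon
    for start in range(len(actions)):
        predicted = states[start]
        for k in range(horizon):
            t = start + k
            if t >= len(actions):
                break
            predicted = model(predicted, actions[t])
            truth = states[t + 1]
            err = sum((p - q) ** 2 for p, q in zip(predicted, truth)) / len(truth)
            totals[k] += err
            counts[k] += 1
    return [tot / n if n else float("nan") for tot, n in zip(totals, counts)]


# Toy held-out trajectory: true dynamics are x' = x + a.
held_out_states = [[0.0], [1.0], [2.0], [3.0], [4.0]]
held_out_actions = [[1.0], [1.0], [1.0], [1.0]]


def biased_model(s: State, a: Action) -> State:
    """Stand-in for a learned model that underestimates each step by 10%."""
    return [s[0] + 0.9 * a[0]]


errors = open_loop_mse(biased_model, held_out_states, held_out_actions, horizon=3)
print(errors)  # error grows with horizon as the per-step bias compounds
```

Reporting such a curve on maps excluded from training, alongside navigation success, would separate the question of whether the world model generalizes from the question of whether the policy does.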

Referee Report

3 major / 1 minor

Summary. The paper proposes AirDreamer, a drone navigation framework combining a world-model-based environment understanding module with a reinforcement-learning policy trained under a sparse reward function (no hand-crafted shaping). It claims emergent navigation capabilities in complex unseen environments, escape from local optima, a 5.3% higher success rate than the best baseline on challenging maps, and zero-shot sim-to-real transfer without deployment tuning.

Significance. If the performance claims and the role of the world model are substantiated with quantitative validation, the approach could contribute to more generalist drone navigation that reduces reliance on environment-specific perception pipelines and rules.

major comments (3)

[Abstract] The central performance claim, a 5.3% success-rate improvement, is stated without identifying the baselines, map definitions, number of trials, or any statistical significance test, preventing assessment of whether the gain is load-bearing or reproducible.
[Abstract, §3 (method description)] No prediction-error metrics, reconstruction loss on held-out maps, or ablation isolating the world-model contribution versus the sparse-reward RL policy are reported, leaving the key assumption that the world model supplies sufficiently accurate and generalizable forecasts unquantified.
[Abstract] The sim-to-real transfer claim is presented without any quantitative comparison of world-model prediction accuracy or policy behavior between simulation and real-robot deployment, so the zero-tuning assertion cannot be evaluated.
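The first comment turns on trial counts, and a short sketch shows why. The code below is a standard two-proportion z-test using only the Python standard library; the function name and the counts are made up for illustration and are not taken from the paper, which (in the text reviewed here) does not report trial numbers.

```python
import math


def two_proportion_z_test(successes_a: int, trials_a: int,
                          successes_b: int, trials_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both success rates are equal."""
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / trials_a + 1.0 / trials_b))
    if se == 0.0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up counts (NOT from the paper): the same 5-point gap at two sample sizes.
z_small, p_small = two_proportion_z_test(85, 100, 80, 100)
z_large, p_large = two_proportion_z_test(1700, 2000, 1600, 2000)
print(f"100 trials per method:  z={z_small:.2f}, p={p_small:.3f}")
print(f"2000 trials per method: z={z_large:.2f}, p={p_large:.2g}")
```

With these illustrative numbers, a gap of about five points is not distinguishable from noise at 100 trials per method but is at 2000, which is why the number of trials needs to accompany the headline figure.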

minor comments (1)

The abstract states that code will be publicly available, but no repository link or availability statement appears in the provided text.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions to improve clarity and substantiation of the claims.

Referee: [Abstract] The central performance claim, a 5.3% success-rate improvement, is stated without identifying the baselines, map definitions, number of trials, or any statistical significance test, preventing assessment of whether the gain is load-bearing or reproducible.

Authors: We agree the abstract would benefit from additional context. In the revision we will expand the abstract to name the baselines, define the challenging maps, state the number of trials, and note that the reported improvement is statistically significant. Full experimental details already appear in the results section. revision: yes

Referee: [Abstract, §3 (method description)] No prediction-error metrics, reconstruction loss on held-out maps, or ablation isolating the world-model contribution versus the sparse-reward RL policy are reported, leaving the key assumption that the world model supplies sufficiently accurate and generalizable forecasts unquantified.

Authors: The manuscript emphasizes end-to-end navigation success. We acknowledge that explicit quantification of the world-model contribution would strengthen the paper and will add, in the revision, an ablation isolating the world-model component together with prediction-error and reconstruction metrics on held-out maps. revision: yes

Referee: [Abstract] The sim-to-real transfer claim is presented without any quantitative comparison of world-model prediction accuracy or policy behavior between simulation and real-robot deployment, so the zero-tuning assertion cannot be evaluated.

Authors: We will revise the abstract to reference the quantitative sim-to-real results already present in the experiments (success-rate parity between sim and real without tuning). Any additional prediction-accuracy comparisons between domains will be highlighted in the revised text. revision: yes
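Success-rate parity between simulation and hardware is easier to judge with interval estimates, since real flights are usually far fewer than simulated episodes. The sketch below computes a Wilson score interval for a binomial proportion; the function name and the counts are hypothetical and do not come from the paper.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Made-up counts (NOT from the paper): identical 90% point estimates.
sim_lo, sim_hi = wilson_interval(900, 1000)   # many simulated episodes
real_lo, real_hi = wilson_interval(9, 10)     # a handful of real flights
print(f"sim:  [{sim_lo:.3f}, {sim_hi:.3f}]")
print(f"real: [{real_lo:.3f}, {real_hi:.3f}]")
```

In this illustration the two point estimates match, yet the real-flight interval is several times wider, so equal success rates alone would not establish that nothing is lost in transfer.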

Circularity Check

0 steps flagged

No circularity; empirical results rest on external evaluation

The paper describes an RL policy trained atop a learned world model for drone navigation, with claims supported by simulation success rates and zero-shot real-robot transfer. No equations, derivations, or parameter-fitting steps are presented that reduce the reported 5.3% success-rate improvement or emergent behaviors to quantities defined by the model's own fitted values. The central performance claims are evaluated against external baselines in held-out environments and physical hardware, rendering them falsifiable outside any internal definitions. No self-citation chains or ansatzes are invoked as load-bearing uniqueness theorems. This is the expected non-finding for an applied robotics paper whose value lies in experimental outcomes rather than closed-form derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the framework implicitly assumes standard RL and world-model training procedures from prior literature.

pith-pipeline@v0.9.1-grok · 5713 in / 1116 out tokens · 19231 ms · 2026-06-28T09:59:30.030849+00:00 · methodology
