pith. machine review for the scientific record.

arxiv: 2604.05828 · v2 · submitted 2026-04-07 · 💻 cs.RO

Recognition: no theorem link

Precise Aggressive Aerial Maneuvers with Sensorimotor Policies

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:57 UTC · model grok-4.3

classification 💻 cs.RO
keywords quadrotor · reinforcement learning · sensorimotor policy · aggressive maneuver · narrow gap traversal · sim-to-real · aerial robotics

The pith

Sensorimotor policies trained via reinforcement learning enable quadrotors to traverse narrow gaps tilted up to 90 degrees with 5 cm clearance using only onboard sensors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that direct mapping from vision and proprioception to control via RL policies can solve aggressive gap traversal under SE(3) constraints. Policies are trained in simulation with initialization from model-based trajectories to aid exploration, then transferred to hardware. This allows navigation without knowledge of the gap position or orientation, and even reactive response to moving gaps. A sympathetic reader would care because it demonstrates a path to autonomous drone flight in cluttered environments where precise planning is difficult due to uncertainty.

Core claim

The authors claim that sensorimotor policies, trained end-to-end with reinforcement learning and policy distillation in simulation after initialization with model-based planner trajectories, allow a quadrotor to perform precise aggressive maneuvers through narrow rectangular gaps. These include passages with only 5 cm clearance at up to 90-degree tilt, without any prior information on the gap's location or orientation, and with the ability to handle dynamic gaps reactively. The method extends to sequences of gaps and varied geometries.

What carries the argument

End-to-end sensorimotor policies that map onboard vision and proprioception directly to low-level control commands, trained with RL and initialized using trajectories from a model-based planner.
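As an editorial sketch (not from the paper), the direct sensing-to-command mapping can be pictured as a small network over a concatenated observation; the 64-D visual embedding, the 9-D proprioceptive vector, and the thrust-plus-body-rates command are illustrative assumptions, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(sizes, rng):
    """Randomly initialized weights for a small MLP (illustrative only)."""
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def policy(obs, params):
    """Map an observation vector to a 4-D low-level command
    (e.g. collective thrust + 3 body rates, tanh-squashed)."""
    h = obs
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return np.tanh(h @ W + b)

# Hypothetical observation: 64-D visual embedding + 9-D proprioception
vision_feat = rng.normal(size=64)   # e.g. output of a CNN encoder
proprio = rng.normal(size=9)        # attitude, linear/angular velocity
obs = np.concatenate([vision_feat, proprio])

params = mlp_params([73, 128, 128, 4], rng)
cmd = policy(obs, params)
print(cmd.shape)  # (4,)
```

The point of the sketch is only the interface: no gap pose estimate appears anywhere in the observation, so precision must come from what the policy has internalized during training.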

If this is right

  • The policy achieves high repeatability in real-world gap traversals with low clearance and high tilt.
  • It enables reactive servo control for moving gaps without training on dynamic scenarios.
  • Policies can be developed for challenging tracks consisting of multiple narrow gaps placed closely together.
  • The approach works for geometrically diverse gaps without requiring manually defined traversal poses or visual features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the sim-to-real transfer holds, this method could be applied to other aggressive aerial tasks such as rapid obstacle avoidance in unknown environments.
  • The initialization strategy using model-based plans may help overcome exploration challenges in other robotic RL applications with constrained action spaces.
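The initialization idea can be sketched as supervised regression onto planner demonstrations before RL fine-tuning; everything below (a linear policy, synthetic state/action data) is an invented toy, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical demonstrations from a model-based planner:
# states (N x d_s) paired with the planner's actions (N x d_a).
N, d_s, d_a = 500, 12, 4
states = rng.normal(size=(N, d_s))
K_true = rng.normal(size=(d_s, d_a))
actions = states @ K_true + 0.01 * rng.normal(size=(N, d_a))

# Warm start: fit a linear policy to the planner's trajectories by
# least squares. RL then fine-tunes from this initialization instead
# of exploring the narrow feasible set from scratch.
K0, *_ = np.linalg.lstsq(states, actions, rcond=None)

bc_error = np.mean((states @ K0 - actions) ** 2)
print(bc_error < 0.01)  # True: the warm start reproduces the demos
```

The design choice being illustrated: when feasible solutions occupy a thin slice of action space, starting from a policy that already imitates a feasible trajectory converts a hard exploration problem into a local refinement problem.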

Load-bearing premise

The simulation represents real aerodynamics, sensor noise, and vehicle dynamics accurately enough that the learned policy transfers directly to the physical quadrotor without major domain randomization or fine-tuning.

What would settle it

Testing the policy on a real quadrotor flying through a rectangular gap with 5 cm clearance at a 90-degree tilt; success with high repeatability would support the claim, while frequent collisions or failures would indicate a simulation-to-reality gap.
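"High repeatability" becomes testable once trial counts are reported: a success proportion with a confidence interval would settle how strong the evidence is. A minimal sketch with the standard Wilson score interval; the 28-of-30 figure is invented for illustration, not a reported result:

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score confidence interval for a success proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials
                                   + z**2 / (4 * trials**2))
    return center - half, center + half

# Hypothetical: 28 successful traversals in 30 hardware attempts
lo, hi = wilson_interval(28, 30)
print(round(lo, 3), round(hi, 3))  # roughly (0.787, 0.982)
```

Even a strong-looking 28/30 leaves the lower bound near 79%, which is why the referee's request for explicit trial counts matters.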

read the original abstract

Precise aggressive maneuvers with lightweight onboard sensors remains a key bottleneck in fully exploiting the maneuverability of drones. Such maneuvers are critical for expanding the systems' accessible area by navigating through narrow openings in the environment. Among the most relevant problems, a representative one is aggressive traversal through narrow gaps with quadrotors under SE(3) constraints, which require the quadrotors to leverage a momentary tilted attitude and the asymmetry of the airframe to navigate through gaps. In this paper, we achieve such maneuvers by developing sensorimotor policies directly mapping onboard vision and proprioception into low-level control commands. The policies are trained using reinforcement learning (RL) with end-to-end policy distillation in simulation. We mitigate the fundamental hardness of model-free RL's exploration on the restricted solution space with an initialization strategy leveraging trajectories generated by a model-based planner. Careful sim-to-real design allows the policy to control a quadrotor through narrow gaps with low clearances and high repeatability. For instance, the proposed method enables a quadrotor to navigate a rectangular gap at a 5 cm clearance, tilted at up to 90-degree orientation, without knowledge of the gap's position or orientation. Without training on dynamic gaps, the policy can reactively servo the quadrotor to traverse through a moving gap. The proposed method is also validated by training and deploying policies on challenging tracks of narrow gaps placed closely. The flexibility of the policy learning method is demonstrated by developing policies for geometrically diverse gaps, without relying on manually defined traversal poses and visual features.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to develop sensorimotor policies for quadrotors that directly map onboard vision and proprioception to low-level control commands, enabling precise aggressive traversals through narrow rectangular gaps at 5 cm clearance and up to 90° tilt without explicit knowledge of gap position or orientation. Policies are trained via RL with end-to-end distillation from model-based planner trajectories in simulation, transferred to hardware through careful sim-to-real design, and demonstrated to reactively handle moving gaps as well as sequences of diverse gaps.

Significance. If the empirical claims hold under rigorous validation, the work would represent a meaningful advance in autonomous aerial robotics by showing that learned policies can achieve repeatable, high-precision SE(3) maneuvers on physical hardware without pose estimation or hand-crafted features. The model-based initialization strategy to address RL exploration hardness and the generalization to unseen dynamic gaps are notable strengths that could inform future sim-to-real efforts for underactuated systems.

major comments (2)
  1. [Abstract and §4] The abstract and §4 (Experimental Results) claim 'high repeatability' for 5 cm clearance and 90° tilt traversals on hardware, yet report no quantitative success rates, trial counts, failure modes, error bars, or ablation studies on sim-to-real factors. This is load-bearing for the central claim of direct policy transfer and real-world viability.
  2. [§3] The 'careful sim-to-real design' invoked in §3 (Method) to justify zero-shot hardware deployment does not quantify or mitigate mismatches in aerodynamics (blade flapping, asymmetric downwash, gap-induced ground effect) during 90°-tilt aggressive flight, which directly risks the reported repeatability even with perfect vision and proprioception.
minor comments (2)
  1. [Abstract] The phrasing 'Among the most relevant problems, a representative one is aggressive traversal...' is awkward and could be tightened for readability.
  2. [Notation and §2] The notation for SE(3) constraints and the policy inputs, used throughout, could be clarified with an explicit diagram or a table of sensor modalities.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential significance of our sensorimotor policy approach for aggressive aerial maneuvers. We address each major comment below with clarifications and planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract and §4] The abstract and §4 (Experimental Results) claim 'high repeatability' for 5 cm clearance and 90° tilt traversals on hardware, yet report no quantitative success rates, trial counts, failure modes, error bars, or ablation studies on sim-to-real factors. This is load-bearing for the central claim of direct policy transfer and real-world viability.

    Authors: We agree that explicit quantitative metrics would better support the repeatability claims. The full manuscript reports multiple hardware trials across different gap configurations and orientations, with consistent successful traversals shown in the accompanying videos and described in §4. To address this directly, we will revise the abstract and §4 to include specific success rates (e.g., successful traversals out of total attempts), trial counts, observed failure modes, and basic statistics such as position error distributions. We will also incorporate a concise ablation on key sim-to-real factors like domain randomization ranges. These additions will provide the rigorous validation requested without altering the core empirical findings. revision: yes

  2. Referee: [§3] The 'careful sim-to-real design' invoked in §3 (Method) to justify zero-shot hardware deployment does not quantify or mitigate mismatches in aerodynamics (blade flapping, asymmetric downwash, gap-induced ground effect) during 90°-tilt aggressive flight, which directly risks the reported repeatability even with perfect vision and proprioception.

    Authors: This is a valid observation regarding the level of detail in our sim-to-real transfer discussion. Section 3 describes our use of domain randomization over dynamics parameters (including thrust curves and delays) and sensor noise to enable zero-shot deployment, which proved sufficient for the reported hardware results. However, we did not provide explicit quantification or targeted mitigation for effects such as blade flapping or gap-induced ground effects at extreme tilts. In the revision, we will expand §3 to include a discussion of these aerodynamic considerations, referencing our simulation parameters and post-hoc hardware observations that informed the randomization strategy. This will better justify why the policy maintained repeatability despite potential mismatches. revision: yes
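The per-episode dynamics randomization the response describes can be sketched minimally; the parameter set and ranges below are invented for illustration and are not the paper's values:

```python
import random

def sample_dynamics(rng):
    """Per-episode randomization of simulator parameters.

    Ranges are hypothetical placeholders, not the paper's; the idea is
    that the policy must succeed across the whole sampled family, which
    covers unmodeled real-world mismatch.
    """
    return {
        "mass_kg":       rng.uniform(0.70, 0.90),
        "thrust_gain":   rng.uniform(0.90, 1.10),  # scales commanded thrust
        "motor_delay_s": rng.uniform(0.01, 0.04),
        "imu_noise_std": rng.uniform(0.00, 0.05),
    }

rng = random.Random(0)
episodes = [sample_dynamics(rng) for _ in range(1000)]
masses = [e["mass_kg"] for e in episodes]
print(min(masses) >= 0.70 and max(masses) <= 0.90)  # True
```

The referee's point maps directly onto this sketch: effects such as blade flapping or gap-induced ground effect only help transfer if they fall inside (or are dominated by) the randomized family.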

Circularity Check

0 steps flagged

No circularity: empirical RL training and sim-to-real deployment

full rationale

The paper's claims rest on training sensorimotor policies via RL in simulation (with model-based initialization and end-to-end distillation) followed by hardware deployment under 'careful sim-to-real design.' These are standard empirical steps whose success is measured by repeatable real-world traversal results, not by any equation or parameter that reduces to its own inputs by construction. No self-citations, uniqueness theorems, or fitted quantities are presented as independent predictions. The derivation chain is self-contained through conventional RL practices and experimental validation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on two domain assumptions: that simulation-to-real transfer succeeds for aggressive maneuvers, and that model-based planner trajectories provide a useful initialization for RL exploration in a restricted solution space.

axioms (2)
  • domain assumption Simulation dynamics and sensor models are sufficiently accurate for zero-shot policy transfer to real hardware
    The method relies on careful sim-to-real design but does not detail domain randomization or adaptation steps in the abstract.
  • domain assumption Model-based planner trajectories lie close enough to feasible RL solutions to bootstrap exploration
    Initialization strategy is invoked to mitigate exploration hardness without quantifying how close the trajectories are.

pith-pipeline@v0.9.0 · 5598 in / 1310 out tokens · 51486 ms · 2026-05-10T18:57:53.191766+00:00 · methodology

discussion (0)



  70. [70]

    Vid-fusion: Robust visual- inertial-dynamics odometry for accurate external force estimation

    Ziming Ding, Tiankai Yang, Kunyi Zhang, Chao Xu, and Fei Gao. Vid-fusion: Robust visual- inertial-dynamics odometry for accurate external force estimation. In2021 IEEE International Conference on Robotics and Automation (ICRA), pages 14469–14475. IEEE, 2021

  71. [71]

    Flightmare: A flexible quadrotor simulator

    Yunlong Song, Selim Naji, Elia Kaufmann, Antonio Loquercio, and Davide Scaramuzza. Flightmare: A flexible quadrotor simulator. InConference on Robot Learning, pages 1147–

  72. [72]

    Geometrically constrained trajectory opti- mization for multicopters.https://arxiv.org/pdf/2103.00190v1, 2021

    Zhepei Wang, Xin Zhou, Chao Xu, and Fei Gao. Geometrically constrained trajectory opti- mization for multicopters.https://arxiv.org/pdf/2103.00190v1, 2021

  73. [73]

    A robust and modular multi-sensor fusion approach applied to mav navigation

    Simon Lynen, Markus W Achtelik, Stephan Weiss, Margarita Chli, and Roland Siegwart. A robust and modular multi-sensor fusion approach applied to mav navigation. In2013 IEEE/RSJ international conference on intelligent robots and systems, pages 3923–3929. IEEE, 2013

  74. [74]

    Unidexgrasp++: Improving dexterous grasping policy learning via geometry-aware curriculum and iterative generalist-specialist learning

    Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, and He Wang. Unidexgrasp++: Improving dexterous grasping policy learning via geometry-aware curriculum and iterative generalist-specialist learning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 3891–3902, 2023

  75. [75]

    C”, “K”, “S

    Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, and Aleksander Madry. Implementation matters in deep policy gradients: A case study on ppo and trpo.arXiv preprint arXiv:2005.12729, 2020. 45 Acknowledgments We express our gratitude to Weijie Kong, Jiarui Zhang, Rui Jin, and Yuman Gao for their invaluable ph...