pith. machine review for the scientific record.

arxiv: 2604.05828 · v2 · submitted 2026-04-07 · 💻 cs.RO

Recognition: no theorem link

Precise Aggressive Aerial Maneuvers with Sensorimotor Policies

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:57 UTC · model grok-4.3

classification 💻 cs.RO
keywords quadrotor · reinforcement learning · sensorimotor policy · aggressive maneuver · narrow gap traversal · sim-to-real · aerial robotics

The pith

Sensorimotor policies trained via reinforcement learning enable quadrotors to traverse narrow gaps tilted up to 90 degrees with 5 cm clearance using only onboard sensors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that direct mapping from vision and proprioception to control via RL policies can solve aggressive gap traversal under SE(3) constraints. Policies are trained in simulation with initialization from model-based trajectories to aid exploration, then transferred to hardware. This allows navigation without knowledge of the gap position or orientation, and even reactive response to moving gaps. A sympathetic reader would care because it demonstrates a path to autonomous drone flight in cluttered environments where precise planning is difficult due to uncertainty.

Core claim

The authors claim that sensorimotor policies, trained end-to-end with reinforcement learning and policy distillation in simulation after initialization with model-based planner trajectories, allow a quadrotor to perform precise aggressive maneuvers through narrow rectangular gaps. These include passages with only 5 cm clearance at up to 90-degree tilt, without any prior information on the gap's location or orientation, and with the ability to handle dynamic gaps reactively. The method extends to sequences of gaps and varied geometries.

What carries the argument

End-to-end sensorimotor policies that map onboard vision and proprioception directly to low-level control commands, trained with RL and initialized using trajectories from a model-based planner.
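As an editorial sketch (not from the paper), the direct sensing-to-command mapping can be pictured as a small network over a concatenated observation; the 64-D visual embedding, the 9-D proprioceptive vector, and the thrust-plus-body-rates command are illustrative assumptions, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(sizes, rng):
    """Randomly initialized weights for a small MLP (illustrative only)."""
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def policy(obs, params):
    """Map an observation vector to a 4-D low-level command
    (e.g. collective thrust + 3 body rates, tanh-squashed)."""
    h = obs
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return np.tanh(h @ W + b)

# Hypothetical observation: 64-D visual embedding + 9-D proprioception
vision_feat = rng.normal(size=64)   # e.g. output of a CNN encoder
proprio = rng.normal(size=9)        # attitude, linear/angular velocity
obs = np.concatenate([vision_feat, proprio])

params = mlp_params([73, 128, 128, 4], rng)
cmd = policy(obs, params)
print(cmd.shape)  # (4,)
```

The point of the sketch is only the interface: no gap pose estimate appears anywhere in the observation, so precision must come from what the policy has internalized during training.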

If this is right

  • The policy achieves high repeatability in real-world gap traversals with low clearance and high tilt.
  • It enables reactive servo control for moving gaps without training on dynamic scenarios.
  • Policies can be developed for challenging tracks consisting of multiple narrow gaps placed closely together.
  • The approach works for geometrically diverse gaps without requiring manually defined traversal poses or visual features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the sim-to-real transfer holds, this method could be applied to other aggressive aerial tasks such as rapid obstacle avoidance in unknown environments.
  • The initialization strategy using model-based plans may help overcome exploration challenges in other robotic RL applications with constrained action spaces.
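The initialization idea can be sketched as supervised regression onto planner demonstrations before RL fine-tuning; everything below (a linear policy, synthetic state/action data) is an invented toy, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical demonstrations from a model-based planner:
# states (N x d_s) paired with the planner's actions (N x d_a).
N, d_s, d_a = 500, 12, 4
states = rng.normal(size=(N, d_s))
K_true = rng.normal(size=(d_s, d_a))
actions = states @ K_true + 0.01 * rng.normal(size=(N, d_a))

# Warm start: fit a linear policy to the planner's trajectories by
# least squares. RL then fine-tunes from this initialization instead
# of exploring the narrow feasible set from scratch.
K0, *_ = np.linalg.lstsq(states, actions, rcond=None)

bc_error = np.mean((states @ K0 - actions) ** 2)
print(bc_error < 0.01)  # True: the warm start reproduces the demos
```

The design choice being illustrated: when feasible solutions occupy a thin slice of action space, starting from a policy that already imitates a feasible trajectory converts a hard exploration problem into a local refinement problem.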

Load-bearing premise

The simulation represents real aerodynamics, sensor noise, and vehicle dynamics accurately enough that the learned policy transfers directly to the physical quadrotor without major domain randomization or fine-tuning.

What would settle it

Testing the policy on a real quadrotor flying through a rectangular gap with 5 cm clearance at a 90-degree tilt; success with high repeatability would support the claim, while frequent collisions or failures would indicate a simulation-to-reality gap.
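"High repeatability" becomes testable once trial counts are reported: a success proportion with a confidence interval would settle how strong the evidence is. A minimal sketch with the standard Wilson score interval; the 28-of-30 figure is invented for illustration, not a reported result:

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score confidence interval for a success proportion."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials
                                   + z**2 / (4 * trials**2))
    return center - half, center + half

# Hypothetical: 28 successful traversals in 30 hardware attempts
lo, hi = wilson_interval(28, 30)
print(round(lo, 3), round(hi, 3))  # roughly (0.787, 0.982)
```

Even a strong-looking 28/30 leaves the lower bound near 79%, which is why the referee's request for explicit trial counts matters.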

read the original abstract

Precise aggressive maneuvers with lightweight onboard sensors remains a key bottleneck in fully exploiting the maneuverability of drones. Such maneuvers are critical for expanding the systems' accessible area by navigating through narrow openings in the environment. Among the most relevant problems, a representative one is aggressive traversal through narrow gaps with quadrotors under SE(3) constraints, which require the quadrotors to leverage a momentary tilted attitude and the asymmetry of the airframe to navigate through gaps. In this paper, we achieve such maneuvers by developing sensorimotor policies directly mapping onboard vision and proprioception into low-level control commands. The policies are trained using reinforcement learning (RL) with end-to-end policy distillation in simulation. We mitigate the fundamental hardness of model-free RL's exploration on the restricted solution space with an initialization strategy leveraging trajectories generated by a model-based planner. Careful sim-to-real design allows the policy to control a quadrotor through narrow gaps with low clearances and high repeatability. For instance, the proposed method enables a quadrotor to navigate a rectangular gap at a 5 cm clearance, tilted at up to 90-degree orientation, without knowledge of the gap's position or orientation. Without training on dynamic gaps, the policy can reactively servo the quadrotor to traverse through a moving gap. The proposed method is also validated by training and deploying policies on challenging tracks of narrow gaps placed closely. The flexibility of the policy learning method is demonstrated by developing policies for geometrically diverse gaps, without relying on manually defined traversal poses and visual features.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to develop sensorimotor policies for quadrotors that directly map onboard vision and proprioception to low-level control commands, enabling precise aggressive traversals through narrow rectangular gaps at 5 cm clearance and up to 90° tilt without explicit knowledge of gap position or orientation. Policies are trained via RL with end-to-end distillation from model-based planner trajectories in simulation, transferred to hardware through careful sim-to-real design, and demonstrated to reactively handle moving gaps as well as sequences of diverse gaps.

Significance. If the empirical claims hold under rigorous validation, the work would represent a meaningful advance in autonomous aerial robotics by showing that learned policies can achieve repeatable, high-precision SE(3) maneuvers on physical hardware without pose estimation or hand-crafted features. The model-based initialization strategy to address RL exploration hardness and the generalization to unseen dynamic gaps are notable strengths that could inform future sim-to-real efforts for underactuated systems.

major comments (2)
  1. [Abstract and §4] The abstract and §4 (Experimental Results) claim 'high repeatability' for 5 cm clearance and 90° tilt traversals on hardware, yet report no quantitative success rates, trial counts, failure modes, error bars, or ablation studies on sim-to-real factors. This is load-bearing for the central claim of direct policy transfer and real-world viability.
  2. [§3] The 'careful sim-to-real design' invoked in §3 (Method) to justify zero-shot hardware deployment does not quantify or mitigate mismatches in aerodynamics (blade flapping, asymmetric downwash, gap-induced ground effect) during 90°-tilt aggressive flight, which directly risks the reported repeatability even with perfect vision and proprioception.
minor comments (2)
  1. [Abstract] The phrasing 'Among the most relevant problems, a representative one is aggressive traversal...' is awkward and could be tightened for readability.
  2. [Notation and §2] The notation for SE(3) constraints and the policy inputs, used throughout, could be clarified with an explicit diagram or a table of sensor modalities.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential significance of our sensorimotor policy approach for aggressive aerial maneuvers. We address each major comment below with clarifications and planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract and §4] The abstract and §4 (Experimental Results) claim 'high repeatability' for 5 cm clearance and 90° tilt traversals on hardware, yet report no quantitative success rates, trial counts, failure modes, error bars, or ablation studies on sim-to-real factors. This is load-bearing for the central claim of direct policy transfer and real-world viability.

    Authors: We agree that explicit quantitative metrics would better support the repeatability claims. The full manuscript reports multiple hardware trials across different gap configurations and orientations, with consistent successful traversals shown in the accompanying videos and described in §4. To address this directly, we will revise the abstract and §4 to include specific success rates (e.g., successful traversals out of total attempts), trial counts, observed failure modes, and basic statistics such as position error distributions. We will also incorporate a concise ablation on key sim-to-real factors like domain randomization ranges. These additions will provide the rigorous validation requested without altering the core empirical findings. revision: yes

  2. Referee: [§3] The 'careful sim-to-real design' invoked in §3 (Method) to justify zero-shot hardware deployment does not quantify or mitigate mismatches in aerodynamics (blade flapping, asymmetric downwash, gap-induced ground effect) during 90°-tilt aggressive flight, which directly risks the reported repeatability even with perfect vision and proprioception.

    Authors: This is a valid observation regarding the level of detail in our sim-to-real transfer discussion. Section 3 describes our use of domain randomization over dynamics parameters (including thrust curves and delays) and sensor noise to enable zero-shot deployment, which proved sufficient for the reported hardware results. However, we did not provide explicit quantification or targeted mitigation for effects such as blade flapping or gap-induced ground effects at extreme tilts. In the revision, we will expand §3 to include a discussion of these aerodynamic considerations, referencing our simulation parameters and post-hoc hardware observations that informed the randomization strategy. This will better justify why the policy maintained repeatability despite potential mismatches. revision: yes
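The per-episode dynamics randomization the response describes can be sketched minimally; the parameter set and ranges below are invented for illustration and are not the paper's values:

```python
import random

def sample_dynamics(rng):
    """Per-episode randomization of simulator parameters.

    Ranges are hypothetical placeholders, not the paper's; the idea is
    that the policy must succeed across the whole sampled family, which
    covers unmodeled real-world mismatch.
    """
    return {
        "mass_kg":       rng.uniform(0.70, 0.90),
        "thrust_gain":   rng.uniform(0.90, 1.10),  # scales commanded thrust
        "motor_delay_s": rng.uniform(0.01, 0.04),
        "imu_noise_std": rng.uniform(0.00, 0.05),
    }

rng = random.Random(0)
episodes = [sample_dynamics(rng) for _ in range(1000)]
masses = [e["mass_kg"] for e in episodes]
print(min(masses) >= 0.70 and max(masses) <= 0.90)  # True
```

The referee's point maps directly onto this sketch: effects such as blade flapping or gap-induced ground effect only help transfer if they fall inside (or are dominated by) the randomized family.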

Circularity Check

0 steps flagged

No circularity: empirical RL training and sim-to-real deployment

full rationale

The paper's claims rest on training sensorimotor policies via RL in simulation (with model-based initialization and end-to-end distillation) followed by hardware deployment under 'careful sim-to-real design.' These are standard empirical steps whose success is measured by repeatable real-world traversal results, not by any equation or parameter that reduces to its own inputs by construction. No self-citations, uniqueness theorems, or fitted quantities are presented as independent predictions. The derivation chain is self-contained through conventional RL practices and experimental validation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on two domain assumptions: that simulation-to-real transfer succeeds for aggressive maneuvers, and that model-based planner trajectories provide a useful initialization for RL exploration in a restricted solution space.

axioms (2)
  • domain assumption Simulation dynamics and sensor models are sufficiently accurate for zero-shot policy transfer to real hardware
    The method relies on careful sim-to-real design but does not detail domain randomization or adaptation steps in the abstract.
  • domain assumption Model-based planner trajectories lie close enough to feasible RL solutions to bootstrap exploration
    Initialization strategy is invoked to mitigate exploration hardness without quantifying how close the trajectories are.

pith-pipeline@v0.9.0 · 5598 in / 1310 out tokens · 51486 ms · 2026-05-10T18:57:53.191766+00:00 · methodology

discussion (0)



  70. [70]

    Vid-fusion: Robust visual- inertial-dynamics odometry for accurate external force estimation

    Ziming Ding, Tiankai Yang, Kunyi Zhang, Chao Xu, and Fei Gao. Vid-fusion: Robust visual- inertial-dynamics odometry for accurate external force estimation. In2021 IEEE International Conference on Robotics and Automation (ICRA), pages 14469–14475. IEEE, 2021

  71. [71]

    Flightmare: A flexible quadrotor simulator

    Yunlong Song, Selim Naji, Elia Kaufmann, Antonio Loquercio, and Davide Scaramuzza. Flightmare: A flexible quadrotor simulator. InConference on Robot Learning, pages 1147–

  72. [72]

    Geometrically constrained trajectory opti- mization for multicopters.https://arxiv.org/pdf/2103.00190v1, 2021

    Zhepei Wang, Xin Zhou, Chao Xu, and Fei Gao. Geometrically constrained trajectory opti- mization for multicopters.https://arxiv.org/pdf/2103.00190v1, 2021

  73. [73]

    A robust and modular multi-sensor fusion approach applied to mav navigation

    Simon Lynen, Markus W Achtelik, Stephan Weiss, Margarita Chli, and Roland Siegwart. A robust and modular multi-sensor fusion approach applied to mav navigation. In2013 IEEE/RSJ international conference on intelligent robots and systems, pages 3923–3929. IEEE, 2013

  74. [74]

    Unidexgrasp++: Improving dexterous grasping policy learning via geometry-aware curriculum and iterative generalist-specialist learning

    Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, and He Wang. Unidexgrasp++: Improving dexterous grasping policy learning via geometry-aware curriculum and iterative generalist-specialist learning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 3891–3902, 2023

  75. [75]

    C”, “K”, “S

    Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, and Aleksander Madry. Implementation matters in deep policy gradients: A case study on ppo and trpo.arXiv preprint arXiv:2005.12729, 2020. 45 Acknowledgments We express our gratitude to Weijie Kong, Jiarui Zhang, Rui Jin, and Yuman Gao for their invaluable ph...