arxiv: 2604.13505 · v2 · submitted 2026-04-15 · 📡 eess.SY · cs.SY

Cascaded TD3-PID Hybrid Controller for Quadrotor Trajectory Tracking in Wind Disturbance Environments

Yukang Zhang, Shuqi Chai, Yuhang Zhang, Danlan Huang, Quanbo Ge

Pith reviewed 2026-05-14 21:01 UTC · model grok-4.3

keywords: quadrotor, trajectory tracking, TD3, PID control, wind disturbance, hybrid controller, disturbance observer, reinforcement learning

The pith

A cascaded TD3-PID controller with disturbance observer improves quadrotor trajectory tracking under wind disturbances.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a hybrid control system for quadrotors that uses PID controllers for altitude and attitude while employing an enhanced TD3 reinforcement learning agent for horizontal position control. This combination addresses the different dynamics of the channels, with PID handling fast structured responses and TD3 managing coupling and disturbances. A hybrid disturbance observer is added to reject wind effects. Simulations and real flight tests show better accuracy and robustness than standard methods.
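The division of labor described above can be sketched as a single control tick. This is a minimal illustration, not the authors' implementation: the `PID` class is a textbook discrete PID, `placeholder_policy` is a hypothetical stand-in for the trained TD3 actor, and all gains, limits, and sign conventions are assumptions chosen only to make the example run.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID. Gains are illustrative, not the paper's values."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def _clip(value: float, limit: float) -> float:
    return max(-limit, min(limit, value))


def placeholder_policy(ex: float, ey: float, max_tilt: float = 0.3):
    """Hypothetical stand-in for the TD3 actor: maps horizontal position
    errors to bounded (roll_ref, pitch_ref) in radians. The real actor would
    be a trained network; the signs here depend on the chosen body frame."""
    return _clip(-0.5 * ey, max_tilt), _clip(0.5 * ex, max_tilt)


def cascade_step(pos, target, attitude, pids, dt):
    """One tick: learned outer horizontal loop feeds attitude setpoints to
    the inner PID loops, while altitude is regulated by its own PID."""
    roll_ref, pitch_ref = placeholder_policy(target[0] - pos[0], target[1] - pos[1])
    thrust_correction = pids["z"].step(target[2] - pos[2], dt)
    roll_cmd = pids["roll"].step(roll_ref - attitude[0], dt)
    pitch_cmd = pids["pitch"].step(pitch_ref - attitude[1], dt)
    return thrust_correction, roll_cmd, pitch_cmd
```

In a real stack the thrust correction would be added to a hover feedforward term and the inner loops would run at a higher rate than the outer loop; both details are omitted here.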

Core claim

The cascaded hybrid framework augments PID stabilization for altitude and attitude with an enhanced TD3 agent for horizontal-position control, incorporating a multi-Q-network structure and a hybrid disturbance observer using low-pass and exponential moving average filtering, leading to more accurate and robust trajectory tracking in wind disturbances as verified by simulations and real-world tests.
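To make the "multi-Q-network" idea concrete, here is a hedged sketch of how a TD3-style Bellman target generalizes from two critics to N. Standard TD3 takes the minimum of two target-critic estimates; taking the minimum over more than two is an assumption made for illustration, since the summary does not say how the paper aggregates its critics. The function names and default hyperparameters are hypothetical.

```python
import random


def smoothed_target_action(action, noise_std=0.2, noise_clip=0.5,
                           max_action=1.0, rng=random):
    """TD3 target policy smoothing: add clipped Gaussian noise to the target
    actor's action, then clip the result to the action bounds."""
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    return max(-max_action, min(max_action, action + noise))


def multi_q_target(reward, done, next_q_values, gamma=0.99):
    """Bellman target using the minimum over N target-critic estimates.

    With N = 2 this is the clipped double-Q target of standard TD3.
    Using the minimum for N > 2 is an assumption; the paper may
    aggregate its critics differently.
    """
    if not next_q_values:
        raise ValueError("need at least one critic estimate")
    return reward + gamma * (1.0 - float(done)) * min(next_q_values)
```

Taking the minimum biases the target downward, which is the mechanism TD3 uses to counter Q-value overestimation; adding critics would strengthen that pessimism.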

What carries the argument

Cascaded TD3-PID hybrid controller with multi-Q-network TD3 and hybrid disturbance observer (HDOB).

If this is right

The enhanced TD3 improves horizontal control under disturbances.
PID with HDOB strengthens altitude and attitude regulation.
Ablation studies confirm the TD3 enhancements.
Real-world tests validate sim-to-real transfer for the hybrid system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar hybrid approaches could apply to other UAVs or robotic systems with mixed fast and uncertain dynamics.
Further tuning of the TD3 reward function might reduce energy consumption during tracking.
Extending to multi-agent quadrotor formations could test scalability.

Load-bearing premise

The enhanced TD3 agent trained in simulation transfers reliably to real quadrotor hardware without causing instability when wind disturbances occur.

What would settle it

A real-world flight test where the hybrid controller shows larger tracking errors or instability compared to a baseline PID controller under the same wind conditions would falsify the claim.

Figures

Figures reproduced from arXiv: 2604.13505 by Danlan Huang, Quanbo Ge, Shuqi Chai, Yuhang Zhang, Yukang Zhang.

**Figure 2.** The proposed hybrid cascade control architecture uses the same type of controller represented by the same color regions. A quadrotor is an underactuated ...

**Figure 3.** Algorithm design of the horizontal position TD3 controller. On the basis of the original TD3 algorithm, improvements were made to action selection, ...

**Figure 4.** The proposed HDOB structure combining a median filter, EMA, and IIR low-pass filter.

**Figure 5.** The training results of the horizontal position TD3 controller are evaluated using the reward and the Root Mean Square Normalized Error (RMSNE) ...

**Figure 6.** Comparison of final errors over 200 episodes for the PID controller ...

**Figure 7.** 3D flight trajectories of the quadrotor toward 10 random target points ...

**Figure 8.** From left to right and top to bottom: position errors, velocities, attitudes, and angular velocities during the point-to-point trajectory test.

**Figure 9.** The results of the controller performing elliptical trajectory tracking under different environmental conditions. From left to right, the scenarios ...

**Figure 10.** The results of the controller performing rectangular trajectory tracking under different environmental conditions. From left to right, the scenarios ...

**Figure 11.** The generalization capability of the proposed strategy was evaluated. (a) shows the trajectory tracking performance of the PID, HDOPID, and CTPH ...

**Figure 12.** Introduction to the experimental platform and deployment of ...

**Figure 13.** Trajectory tracking comparison between the PID controller and the ...

**Figure 14.** Trajectory tracking comparison between the PID controller and ...

Original abstract

This work presents a cascaded hybrid control framework for quadrotor trajectory tracking under nonlinear dynamics and external disturbances. In quadrotor systems, the altitude and attitude channels exhibit fast, structured dynamics that are well suited to reliable regulation, whereas horizontal-position control is more strongly affected by coupling effects, uncertainty, and disturbances, so that neither pure feedback control nor purely learning-based control alone is equally well suited to all channels. Accordingly, the proposed framework augments conventional proportional-integral-derivative (PID) stabilization for altitude and attitude control with an enhanced Twin Delayed Deep Deterministic Policy Gradient (TD3) agent incorporating a multi-Q-network structure, thereby improving horizontal-position control under severe disturbances. To further strengthen disturbance rejection in altitude and attitude control, a hybrid disturbance observer (HDOB) using low-pass and exponential moving average filtering is embedded in the control loops. The proposed TD3 enhancements are verified through ablation studies, and both numerical simulations and real-world flight tests on the quadrotor platform demonstrate that the proposed method achieves more accurate and robust trajectory tracking under wind disturbances than baseline approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a cascaded hybrid control architecture for quadrotor trajectory tracking under wind disturbances. It uses PID controllers, augmented by a hybrid disturbance observer (HDOB) with low-pass and exponential moving average filters, for the fast altitude and attitude channels, while an enhanced TD3 agent with a multi-Q-network structure handles the slower horizontal-position control. The central claim is that this hybrid approach yields more accurate and robust tracking than baseline methods, as verified by ablation studies on the TD3 enhancements plus numerical simulations and real-world flight tests.

Significance. If the quantitative performance gains and sim-to-real transfer can be rigorously demonstrated, the work would provide a practical example of combining classical control reliability with RL adaptability for UAVs in disturbed environments. The cascaded separation of dynamics and the HDOB augmentation represent reasonable engineering choices that could inform hybrid controller design, provided the robustness claims are supported by explicit metrics and transfer details.

major comments (2)

[Abstract] The claim that 'both numerical simulations and real-world flight tests ... demonstrate that the proposed method achieves more accurate and robust trajectory tracking under wind disturbances than baseline approaches' is not supported by any reported quantitative error metrics (e.g., RMSE or MAE values), wind speed profiles, gust spectra, or statistical tests, making the central performance superiority assertion unverifiable from the supplied information.
[Methodology and Experiments] The sim-to-real transfer of the enhanced TD3 policy for position control under real wind is asserted but lacks any description of domain randomization schedule, matching between simulated and measured wind spectra, or quantitative before/after retuning comparison; this leaves the real-flight robustness result dependent on an untested transfer assumption rather than demonstrated invariance.
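For reference, the error metrics named in the first major comment can be computed from logged trajectories as follows. This is a generic sketch using the standard definitions of RMSE and MAE over per-sample Euclidean position error; the function names and the choice of Euclidean error are assumptions, and the paper may report per-axis errors instead.

```python
import math


def tracking_errors(reference, measured):
    """Per-sample Euclidean distance between reference and measured positions.
    Both inputs are equal-length sequences of (x, y, z) tuples."""
    if len(reference) != len(measured) or not reference:
        raise ValueError("trajectories must be non-empty and of equal length")
    return [math.dist(r, m) for r, m in zip(reference, measured)]


def rmse(errors):
    """Root mean square error: penalizes large deviations more heavily."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean absolute error: the average deviation magnitude."""
    return sum(abs(e) for e in errors) / len(errors)
```

Reporting both is informative because a controller with occasional large excursions under gusts can have a similar MAE to a steadier one but a noticeably higher RMSE.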

minor comments (1)

[Methodology] The free parameters (TD3 hyperparameters and HDOB cut-off frequencies) are listed, but neither their specific values nor the tuning procedure is tabulated; adding such a table would aid reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions that will be incorporated to strengthen the manuscript.

Referee: [Abstract] The claim that 'both numerical simulations and real-world flight tests ... demonstrate that the proposed method achieves more accurate and robust trajectory tracking under wind disturbances than baseline approaches' is not supported by any reported quantitative error metrics (e.g., RMSE or MAE values), wind speed profiles, gust spectra, or statistical tests, making the central performance superiority assertion unverifiable from the supplied information.

Authors: We agree that the abstract would benefit from explicit quantitative support. In the revised manuscript we will add RMSE and MAE values for horizontal position, altitude, and attitude tracking errors under the tested wind conditions, together with the corresponding wind speed profiles, gust spectra, and results of statistical significance tests comparing the proposed controller against the baselines. These metrics are already available from the simulation and flight-test data sets and will be reported both in the abstract and in a new summary table in the results section. revision: yes
Referee: [Methodology and Experiments] The sim-to-real transfer of the enhanced TD3 policy for position control under real wind is asserted but lacks any description of domain randomization schedule, matching between simulated and measured wind spectra, or quantitative before/after retuning comparison; this leaves the real-flight robustness result dependent on an untested transfer assumption rather than demonstrated invariance.

Authors: We acknowledge that the current description of the sim-to-real transfer is incomplete. We will expand the methodology section to include (i) the full domain-randomization schedule applied during TD3 training, (ii) the procedure used to match the power spectral density of simulated wind to the measured real-world wind spectra, and (iii) quantitative performance metrics (RMSE before and after any policy retuning) that demonstrate the invariance achieved. These additions will make the transfer process explicit and verifiable. revision: yes
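As an illustration of what a "domain-randomization schedule" could look like in practice, the sketch below samples per-episode wind and mass parameters, widening the wind range as training progresses. Everything here is hypothetical: the parameter names, the ranges, and the linear curriculum are assumptions for the example and are not taken from the paper.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    wind_speed: float    # mean wind speed, m/s
    wind_heading: float  # wind direction in the horizontal plane, rad
    gust_std: float      # standard deviation of gust fluctuation, m/s
    mass_scale: float    # multiplier on the nominal vehicle mass


def sample_episode(rng, progress):
    """Sample one training episode's physical parameters.

    `progress` in [0, 1] is the fraction of training completed; the maximum
    wind speed grows linearly with it (a simple curriculum). All numeric
    ranges are hypothetical placeholders.
    """
    progress = max(0.0, min(1.0, progress))
    max_wind = 1.0 + 4.0 * progress
    return EpisodeParams(
        wind_speed=rng.uniform(0.0, max_wind),
        wind_heading=rng.uniform(-math.pi, math.pi),
        gust_std=rng.uniform(0.0, 0.2 * max_wind),
        mass_scale=rng.uniform(0.9, 1.1),
    )
```

Passing a seeded `random.Random` instance makes the schedule reproducible, which is what would let a reader verify a reported transfer result.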

Circularity Check

0 steps flagged

No circularity in the hybrid controller derivation or claims

The paper presents an engineering synthesis: a cascaded architecture with PID+HDOB for fast attitude/altitude loops and an enhanced TD3 agent for horizontal position. Enhancements to TD3 are checked via ablation studies, and overall performance is asserted via numerical simulations plus real-flight tests against external baselines. No equations reduce to fitted parameters by construction, no uniqueness theorems are imported via self-citation, and no ansatz or renaming is smuggled in. The central claims rest on empirical comparison rather than self-referential definitions, so the derivation chain is self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The framework rests on the domain assumption that altitude/attitude dynamics are sufficiently decoupled from horizontal motion to allow independent PID regulation, plus standard RL training assumptions that simulation-to-real transfer is feasible after modest tuning.

free parameters (2)

TD3 network and training hyperparameters
Chosen to stabilize the horizontal-position policy under wind; exact values not supplied in abstract.
HDOB filter cut-off frequencies
Tuned to balance disturbance estimation speed against noise amplification.

axioms (2)

Domain assumption: Altitude and attitude channels exhibit fast, structured dynamics amenable to reliable PID regulation
Explicitly stated as the justification for keeping PID on those loops.
Domain assumption: Horizontal-position control is dominated by coupling, uncertainty, and disturbances
Used to motivate replacement of PID by TD3 on that channel.

invented entities (1)

Hybrid disturbance observer (HDOB) with low-pass and exponential moving average filters (no independent evidence)
purpose: To estimate and cancel wind effects inside the altitude and attitude loops
Newly introduced component whose independent evidence is the claimed improvement in real-flight tests.
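The filtering stage of such an observer can be sketched as a chain of a median filter, an EMA, and a first-order IIR low-pass, following the components named in the Figure 4 caption. This is a sketch under stated assumptions rather than the paper's HDOB: the class and function names, window size, smoothing factor, cut-off frequency, and the one-dimensional force-residual model in `raw_disturbance` are all hypothetical.

```python
import math
from collections import deque
from statistics import median


class FilterChain:
    """Median -> EMA -> first-order IIR low-pass, applied sample by sample.
    Default parameters are illustrative only."""

    def __init__(self, window=5, ema_alpha=0.3, cutoff_hz=5.0, dt=0.01):
        self.buf = deque(maxlen=window)
        self.ema_alpha = ema_alpha
        # Discrete first-order low-pass coefficient from the cut-off frequency.
        rc = 1.0 / (2.0 * math.pi * cutoff_hz)
        self.lp_alpha = dt / (rc + dt)
        self.ema = None
        self.lp = None

    def update(self, raw):
        self.buf.append(raw)
        x = median(self.buf)  # rejects isolated spikes
        if self.ema is None:
            self.ema = x
        else:
            self.ema = self.ema_alpha * x + (1.0 - self.ema_alpha) * self.ema
        if self.lp is None:
            self.lp = self.ema
        else:
            self.lp += self.lp_alpha * (self.ema - self.lp)
        return self.lp


def raw_disturbance(mass, measured_accel, commanded_force):
    """Assumed 1-D residual model: d = m * a - u, with gravity folded into u."""
    return mass * measured_accel - commanded_force
```

A lower cut-off frequency or smaller `ema_alpha` gives a smoother but slower estimate, which is the speed-versus-noise trade-off noted for the HDOB free parameter. In a loop, the filtered estimate would typically be subtracted from the PID output before it is sent to the actuators.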

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel (relation to the paper passage: unclear)

cascaded TD3-PID hybrid framework... enhanced Twin Delayed Deep Deterministic Policy Gradient (TD3) agent incorporating a multi-Q-network structure... hybrid disturbance observer (HDOB) using low-pass and exponential moving average filtering

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction (relation to the paper passage: unclear)

both numerical simulations and real-world flight tests... demonstrate that the proposed method achieves more accurate and robust trajectory tracking under wind disturbances

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
