Transformer-Guided Deep Reinforcement Learning for Optimal Takeoff Trajectory Design of an eVTOL Drone
Pith reviewed 2026-05-17 20:17 UTC · model grok-4.3
The pith
Transformer-guided DRL learns an eVTOL takeoff trajectory in about a quarter of the training steps needed by standard deep reinforcement learning (4.57×10^6 versus 19.79×10^6, roughly 23 percent).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The transformer-guided DRL agent learned to take off in 4.57×10^6 time steps, which the paper reports as 25 percent of the 19.79×10^6 time steps needed by a vanilla DRL agent (the ratio works out closer to 23 percent). It achieved 97.2 percent accuracy on the optimal energy consumption relative to the simulation-based optimal reference, versus 96.1 percent for vanilla DRL. The transformer guides training by narrowing exploration to a realistic state space at each time step; the control variables are power and wing angle to the vertical.
What carries the argument
The transformer module that, at each time step, identifies and prioritizes realistic regions of the state space to guide the reinforcement learning agent's exploration and policy updates.
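One way to picture this guidance is sketched below. The passage quoted later on this page says the transformer outputs a mean and variance for each action component; everything else here is an assumption made for illustration, not the paper's implementation: the function names `guided_action_bounds` and `map_agent_action`, the width factor `k`, the normalized power scale, the degree units, and the idea of mapping a normalized agent action into the proposed interval.

```python
import numpy as np


def guided_action_bounds(mean, std, low, high, k=2.0):
    """Per-component interval mean ± k*std, clipped to the physical limits.

    `mean` and `std` stand in for the transformer's proposal (hypothetical
    values are used below); `low` and `high` are the hard actuator limits.
    """
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    lo = np.clip(mean - k * std, low, high)
    hi = np.clip(mean + k * std, low, high)
    return lo, hi


def map_agent_action(a_norm, lo, hi):
    """Map a normalized agent action in [-1, 1] into the interval [lo, hi]."""
    a_norm = np.clip(np.asarray(a_norm, dtype=float), -1.0, 1.0)
    return lo + 0.5 * (a_norm + 1.0) * (hi - lo)


# Illustrative numbers only: [normalized power, wing angle in degrees].
lo, hi = guided_action_bounds(
    mean=[0.6, 30.0], std=[0.05, 5.0], low=[0.0, 0.0], high=[1.0, 90.0]
)
action = map_agent_action([0.0, 1.0], lo, hi)
print(lo, hi, action)
```

With this kind of mapping the agent only ever explores inside the proposed interval, which is also why the load-bearing premise matters: if the interval misses the optimal control, the agent cannot reach it.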
If this is right
- Training converges with roughly one-quarter the number of environment interactions required by unguided DRL.
- The final policy satisfies the takeoff constraints on vertical displacement and horizontal velocity.
- Energy use lies within three percent of the value obtained from a separate simulation-based optimizer.
- The same guidance structure can be reused for other eVTOL trajectory problems that share the same state and action structure.
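The "within three percent" bullet follows from the reported accuracies if accuracy is read as one minus the relative energy gap to the reference. This page does not define the metric, so that reading is an assumption, and the helper name `energy_gap_percent` is hypothetical.

```python
def energy_gap_percent(accuracy_percent):
    """Relative energy gap to the reference, assuming
    accuracy = 100 * (1 - |E_policy - E_ref| / E_ref)."""
    return 100.0 - accuracy_percent


# Accuracies reported in the abstract.
guided_gap = energy_gap_percent(97.2)
vanilla_gap = energy_gap_percent(96.1)

print(f"guided gap:  {guided_gap:.1f}%")
print(f"vanilla gap: {vanilla_gap:.1f}%")
```

Under this reading the guided policy sits just inside three percent of the reference and the vanilla policy just outside it, so the two agents differ by about one percentage point in energy.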
Where Pith is reading between the lines
- The same transformer guidance pattern could shorten training for landing or transition-to-cruise phases without new algorithm development.
- If the state-space pruning remains accurate under sensor noise, the method may transfer to onboard hardware with modest additional tuning.
- Extending the transformer to output uncertainty estimates could allow the agent to request more samples only in ambiguous regions.
Load-bearing premise
The transformer can reliably select realistic state-space regions without systematically excluding high-reward trajectories or biasing the learned policy away from true optimality.
What would settle it
A new high-fidelity simulation run in which the final energy consumption of the transformer-guided policy exceeds the known simulation-based optimum by more than a few percent, or in which a vanilla DRL agent reaches comparable performance with similar total steps.
Original abstract
The rapid advancement of electric vertical takeoff and landing (eVTOL) aircraft offers a promising opportunity to alleviate urban traffic congestion but is still limited by excessive power demands, especially during the takeoff phase. Thus, developing optimal takeoff trajectories for minimum energy consumption becomes essential for broader eVTOL aircraft applications. Conventional optimal control methods (such as dynamic programming and linear quadratic regulator) provide highly efficient and well-established solutions but are prohibited by problem dimensionality and complexity. Deep reinforcement learning (DRL) emerges as a special type of artificial intelligence tackling complex, nonlinear systems; however, the training difficulty is a key bottleneck that hinders DRL applications. To address these challenges, we propose the transformer-guided DRL to alleviate the training difficulty by exploring a realistic state space at each time step using a transformer. The proposed transformer-guided DRL was demonstrated on an optimal takeoff trajectory design of an eVTOL drone for minimal energy consumption while meeting takeoff conditions (i.e., minimum vertical displacement and minimum horizontal velocity) by varying control variables (i.e., power and wing angle to the vertical). Results presented that the transformer-guided DRL agent learned to take off with $4.57\times10^6$ time steps, representing $25\%$ of the $19.79\times10^6$ time steps needed by a vanilla DRL agent. In addition, the transformer-guided DRL achieved $97.2\%$ accuracy on the optimal energy consumption compared against the simulation-based optimal reference, while the vanilla DRL achieved $96.1\%$ accuracy. Therefore, the proposed transformer-guided DRL outperformed vanilla DRL in terms of both training efficiency and optimal design verification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a transformer-guided deep reinforcement learning (DRL) method to optimize takeoff trajectories for an eVTOL drone, minimizing energy consumption subject to minimum vertical displacement and horizontal velocity constraints by controlling power and wing angle. It reports that the guided agent requires 4.57×10^6 training steps (25% of the 19.79×10^6 steps for vanilla DRL) and reaches 97.2% of the energy optimality achieved by a simulation-based reference, versus 96.1% for the baseline.
Significance. If substantiated with full methodological details, the approach could provide a practical means to accelerate DRL training for high-dimensional aerospace trajectory optimization by restricting exploration to realistic state regions, addressing a known bottleneck in applying model-free RL to nonlinear optimal control problems.
major comments (2)
- [Abstract] Abstract and Methods: the central performance claims (4.57×10^6 vs. 19.79×10^6 steps and 97.2% vs. 96.1% optimality) are presented without any description of the reward function, state representation, action space discretization, hyperparameter search procedure, or number of independent runs with statistical significance testing; these omissions make it impossible to evaluate whether the reported gains are robust or sensitive to modeling assumptions.
- [Methods] The manuscript provides no ablation isolating the transformer module's contribution nor any analysis showing that the guided policy class still contains the simulation-based global optimum; without this, the efficiency gain could result from unintended restriction of the search space rather than improved guidance.
minor comments (1)
- [Abstract] The abstract states '25% of the 19.79×10^6 time steps' but 4.57/19.79 ≈ 0.231; a precise ratio or clarification would improve accuracy.
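The arithmetic behind this minor comment can be checked directly from the two step counts in the abstract. The variable names are illustrative.

```python
# Training step counts reported in the abstract.
guided_steps = 4.57e6
vanilla_steps = 19.79e6

ratio = guided_steps / vanilla_steps    # fraction of vanilla steps used
reduction = 1.0 - ratio                 # fraction of steps saved
speedup = vanilla_steps / guided_steps  # how many times fewer steps

print(f"ratio:     {ratio:.1%}")
print(f"reduction: {reduction:.1%}")
print(f"speedup:   {speedup:.2f}x")
```

The ratio is about 23 percent rather than 25 percent, so the abstract slightly understates the reported saving.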
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment in turn below, indicating where we agree that revisions are warranted and outlining the changes we will make.
Point-by-point responses
- Referee: [Abstract] Abstract and Methods: the central performance claims (4.57×10^6 vs. 19.79×10^6 steps and 97.2% vs. 96.1% optimality) are presented without any description of the reward function, state representation, action space discretization, hyperparameter search procedure, or number of independent runs with statistical significance testing; these omissions make it impossible to evaluate whether the reported gains are robust or sensitive to modeling assumptions.
  Authors: We agree that these methodological details are essential for reproducibility and for assessing robustness. The current manuscript focuses on the high-level results in the abstract and provides only a concise methods overview. In the revised version we will expand the Methods section with explicit descriptions of the reward function (including all weighting terms and constraints), the full state representation, the discretization scheme for the action space (power and wing angle), the hyperparameter search procedure employed, and results aggregated over multiple independent runs together with statistical significance testing.
  Revision: yes
- Referee: [Methods] The manuscript provides no ablation isolating the transformer module's contribution nor any analysis showing that the guided policy class still contains the simulation-based global optimum; without this, the efficiency gain could result from unintended restriction of the search space rather than improved guidance.
  Authors: We acknowledge that a dedicated ablation would more cleanly isolate the transformer's contribution. The existing comparison to vanilla DRL already holds the underlying DRL algorithm, environment, and hyperparameters fixed while varying only the presence of transformer guidance; nevertheless, we will add an explicit ablation study in the revision. On the question of whether the guided policy class contains the simulation-based global optimum, we note that the transformer is trained to propose realistic next states consistent with the physics of the eVTOL takeoff problem rather than to exclude feasible regions. The fact that the guided agent reaches 97.2% of the simulation-based reference (versus 96.1% for vanilla DRL) provides empirical evidence that the guidance does not exclude the optimum. We will augment the revision with a short theoretical argument and, if space permits, additional verification runs confirming that the simulation-based optimum remains reachable under the guided policy.
  Revision: yes
Circularity Check
No circularity: results rest on external simulation reference and vanilla DRL baseline
Full rationale
The paper's central claims concern empirical training efficiency (4.57e6 vs 19.79e6 steps) and optimality accuracy (97.2% vs 96.1%) for the transformer-guided DRL agent. These quantities are obtained by direct comparison against an independent simulation-based optimal reference trajectory and a standard vanilla DRL run; neither metric is obtained by algebraic rearrangement of the method's own fitted parameters, state-space definitions, or transformer outputs. No equations or sections in the abstract or described methods reduce the reported performance figures to self-definition, fitted-input renaming, or load-bearing self-citation. The derivation therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- Transformer and RL hyperparameters
axioms (1)
- domain assumption: The simulation dynamics accurately capture real eVTOL aerodynamics and power consumption during takeoff.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "The transformer produces an action proposal distribution characterized by a mean and variance for each action component, conditioned on the previous action history. The action proposal distribution sets up the DRL state space at each time step..."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "The proposed transformer-guided DRL was demonstrated on an optimal takeoff trajectory design of an eVTOL drone for minimal energy consumption..."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.