arXiv: 2605.11509 · v1 · submitted 2026-05-12 · cs.AI · cs.LG · cs.MA · cs.SY · eess.SY

Hierarchical LLM-Driven Control for HAPS-Assisted UAV Networks: Joint Optimization of Flight and Connectivity

Pith reviewed 2026-05-13 01:18 UTC · model grok-4.3

keywords: UAV networks · HAPS-assisted networks · LLM-driven control · hierarchical optimization · joint flight and connectivity · multi-UAV systems · POMDP · aerial highways

The pith

An LLM-powered hierarchical controller jointly optimizes UAV flight paths and wireless connectivity in networks with high-altitude platforms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a control system for multiple UAVs that must simultaneously avoid collisions, maintain efficient traffic flow, and sustain reliable communications while operating in a mixed ground and aerial network. It formulates the task as a hierarchical multi-objective partially observable Markov decision process and solves it using large language models at different time scales: one global LLM on the high-altitude platform for overall planning and local LLMs combined with reinforcement learning on each UAV for immediate actions. Experiments in a realistic three-dimensional simulator show gains of 14 percent in transportation efficiency, 25 percent in data throughput, and 23 percent fewer collisions compared to prior methods, along with stable handovers and the ability to handle new situations without retraining. Readers would care because future drone fleets for logistics or monitoring need both safe movement and dependable links, yet most existing controllers treat motion and communication separately.

Core claim

The authors claim that their LLM-driven hierarchical multi-rate control framework, derived from the H-MO-POMDP model, successfully couples long-term global planning on HAPS with fast local control on UAVs to achieve joint optimization of motion and connectivity objectives under partial observability.

What carries the argument

A hierarchical multi-rate control framework: an LLM-based controller on the HAPS makes global load-balancing and handover decisions, while a hybrid controller on each UAV pairs a slow-timescale LLM for spatial reasoning with a reinforcement learning agent for fast U2I communication and motion control.
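
The multi-rate idea can be made concrete with a toy scheduling loop. The sketch below is not the authors' implementation: the function names, the periods of 50 and 10 ticks, and the placeholder decisions are all assumptions chosen for illustration. It only shows how a slow global planner, a medium-rate local planner, and a per-tick fast controller could be interleaved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiRateSchedule:
    """Hypothetical update periods, measured in simulation ticks."""
    global_period: int = 50  # assumed: HAPS-level planner replans every 50 ticks
    local_period: int = 10   # assumed: per-UAV planner refreshes every 10 ticks


def haps_global_plan(tick, period):
    """Stand-in for the HAPS-level LLM; alternates a placeholder cell preference."""
    preferred = "HAPS" if (tick // period) % 2 else "TBS"
    return {"issued_at": tick, "preferred_cell": preferred}


def uav_local_guidance(tick, plan):
    """Stand-in for the on-UAV LLM; turns the global plan into local guidance."""
    return {"issued_at": tick, "lane_hint": "keep", "cell": plan["preferred_cell"]}


def rl_fast_action(tick, guidance):
    """Stand-in for the RL agent; emits one action per tick under the guidance."""
    return {"tick": tick, "accel": 0.0, "cell": guidance["cell"]}


def run(num_ticks, schedule=MultiRateSchedule()):
    """Interleave the three controllers and count how often each is invoked."""
    calls = {"global": 0, "local": 0, "fast": 0}
    plan = None
    guidance = None
    actions = []
    for tick in range(num_ticks):
        if tick % schedule.global_period == 0:   # slowest loop
            plan = haps_global_plan(tick, schedule.global_period)
            calls["global"] += 1
        if tick % schedule.local_period == 0:    # intermediate loop
            guidance = uav_local_guidance(tick, plan)
            calls["local"] += 1
        actions.append(rl_fast_action(tick, guidance))  # fastest loop
        calls["fast"] += 1
    return calls, actions
```

With these assumed periods, `run(100)` invokes the global planner 2 times, the local planner 10 times, and the fast controller 100 times. A call-rate gap of this kind is what would leave room for slower LLM inference at the upper layers.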

If this is right

  • The proposed framework increases transportation efficiency by 14 percent over state-of-the-art baselines.
  • It improves telecommunication throughput by 25 percent.
  • It reduces physical collision rates by 23 percent.
  • It maintains strong handover stability and exhibits zero-shot generalization in changing environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar hierarchical LLM structures might apply to other multi-agent systems that require both high-level strategy and low-level execution, such as autonomous vehicle fleets or robotic swarms.
  • The separation of slow LLM reasoning from fast RL control may reduce the computational burden on resource-limited drones.
  • Extending the global planning layer to coordinate with terrestrial base stations could further improve overall network performance.

Load-bearing premise

The high-fidelity 3D simulation platform accurately represents real-world UAV flight dynamics, communication conditions, and the reasoning abilities of large language models.

What would settle it

Deploying the controllers on physical UAVs in an outdoor setting with live radio measurements and verifying whether the efficiency, throughput, and collision-reduction benefits persist.

Figures

Figures reproduced from arXiv: 2605.11509 by Halim Yanikomeroglu, Hao Zhou, Hina Tabassum, Jianhua Pei, Ping Wang, Wael Jaafar, Zijiang Yan.

Figure 1: 3D HAPS-assisted UAV network model.
Figure 2: The generative AI-driven hierarchical control paradigm for multi-UAV networks in an ITNTN environment.
Figure 3: LLM-based HAPS Meta-controller Decision Process, illustrating the synthesis of the static role, dynamic observation …
Figure 4: The architectural framework of the proposed UAV edge-agent.
Figure 5: Comprehensive 20-second simulation profiles of two UAVs navigating an integrated terrestrial and non-terrestrial network …
Figure 6: 3D visualizations of the ITNTN network topology under varying traffic densities. (a) …
Figure 7: Training phase convergence comparison of the proposed hierarchical generative AI framework.
Figure 8: Scalability evaluation comparing operational penalties and safety metrics under varying 3D UAV traffic densities.

Original abstract

Uncrewed aerial vehicles (UAVs) are increasingly deployed in complex networked environments, yet the joint optimization of multi-UAV motion control and connectivity remains a fundamental challenge. In this paper, we study a multi-UAV system operating in an integrated terrestrial and non-terrestrial network (ITNTN) comprising terrestrial base stations and high-altitude platform stations (HAPS). We consider a three-dimensional (3D) aerial highway scenario where UAVs must adapt their motion to ensure collision avoidance, efficient traffic flow, and reliable communication under dynamic and partially observable conditions. We first model the problem as a hierarchical multi-objective partially observable Markov decision process (H-MO-POMDP), capturing the coupling between control and communication objectives. Based on this formulation, we propose a large language model (LLM)-driven hierarchical multi-rate control framework. At the global level, an LLM-based controller on the HAPS performs long-term planning for load balancing and handover decisions. At the local level, each UAV employs a hybrid controller that integrates a slow-timescale LLM for high-level spatial reasoning with a reinforcement learning agent for faster UAV-to-infrastructure (U2I) communication and motion control. We further develop a high-fidelity 3D simulation platform by integrating the gym-pybullet-drones environment with 3GPP-compliant RF/THz channel models. Numerical results demonstrate that the proposed framework significantly outperforms state-of-the-art baselines, achieving a 14% increase in transportation efficiency and a 25% improvement in telecommunication throughput. Additionally, it achieves a 23% reduction in physical collision rates, demonstrating strong handover stability and zero-shot generalization in dynamic scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper models joint UAV motion control and connectivity optimization in an ITNTN with HAPS as a hierarchical multi-objective POMDP (H-MO-POMDP). It proposes an LLM-driven hierarchical multi-rate framework with a global LLM planner on the HAPS for load balancing/handover and local hybrid LLM+RL controllers on each UAV for spatial reasoning and U2I control. A custom simulation platform integrates gym-pybullet-drones with 3GPP RF/THz models; numerical results claim 14% higher transportation efficiency, 25% higher throughput, and 23% lower collision rates versus baselines, plus handover stability and zero-shot generalization.

Significance. If the simulation faithfully reproduces partial observability, communication delays, and LLM reasoning timescales, the work offers a concrete architecture for coupling long-horizon LLM planning with fast RL control in aerial networks. The multi-rate decomposition and H-MO-POMDP formulation are reasonable modeling choices that could inform future integrated control-communication designs, though the absence of code release or hardware validation limits immediate reproducibility and impact.

major comments (3)
  1. [Numerical Results] Numerical Results section: the headline gains (14% transportation efficiency, 25% throughput, 23% collision reduction) are presented without naming the state-of-the-art baselines, their hyper-parameters, number of random seeds, error bars, or statistical significance tests. This information is load-bearing for the central claim that the hierarchical LLM controller is responsible for the reported deltas.
  2. [Simulation Platform] Simulation Platform and Evaluation sections: no ablation disables the LLM components while retaining the RL motion controller, no comparison against perfect-information oracles, and no sensitivity analysis to LLM prompt wording or inference latency. Because all quantitative claims rest on the gym-pybullet-drones + 3GPP platform faithfully implementing the partial observability assumed by the H-MO-POMDP, these omissions prevent attribution of gains to the proposed framework rather than simulator artifacts.
  3. [Problem Formulation] Problem Formulation and Framework sections: the H-MO-POMDP is introduced as the modeling foundation, yet the manuscript provides no derivation showing how the hierarchical multi-rate structure reduces the POMDP to tractable sub-problems or how the LLM policies are formally mapped onto the action spaces; without this, it is unclear whether the numerical improvements follow from the model or from ad-hoc engineering choices.
minor comments (2)
  1. [Abstract] The abstract and introduction refer to 'state-of-the-art baselines' without citation or brief description; adding one sentence naming the closest prior RL or optimization methods would improve readability.
  2. [Figures] Figure captions for the simulation environment and architecture diagrams should explicitly state the timescales (global vs. local) and the exact interface between LLM outputs and RL actions to avoid ambiguity.
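
Major comment 1 asks for seeds, error bars, and significance tests. A minimal sketch of that kind of per-seed paired comparison follows, using only the Python standard library. The function name and any numbers passed to it are placeholders invented for illustration, not results from the paper. A p-value would additionally require the t-distribution with n-1 degrees of freedom (for example via `scipy.stats.ttest_rel`).

```python
import math
import statistics


def paired_summary(proposed, baseline):
    """Summarize per-seed paired differences between two methods.

    Both inputs hold one metric value per random seed, in the same seed order.
    Returns the mean difference, its standard error, and the paired t statistic.
    """
    if len(proposed) != len(baseline):
        raise ValueError("both methods must be evaluated on the same seeds")
    if len(proposed) < 2:
        raise ValueError("at least two seeds are needed for a standard error")

    diffs = [p - b for p, b in zip(proposed, baseline)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(diffs)
    sem = sd / math.sqrt(n)
    # The t statistic is undefined when every paired difference is identical.
    t_stat = mean_diff / sem if sem > 0 else float("nan")
    return {"n": n, "mean_diff": mean_diff, "sem": sem, "t": t_stat, "dof": n - 1}
```

Pairing by seed removes seed-to-seed variance that both methods share, which is why a paired test is usually preferred over an unpaired one when the baselines can be rerun on identical seeds.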

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments identify key areas where additional details and analyses will strengthen the presentation and attribution of results. We address each major comment below and commit to revisions that enhance clarity and rigor without altering the core contributions.

Point-by-point responses
  1. Referee: [Numerical Results] Numerical Results section: the headline gains (14% transportation efficiency, 25% throughput, 23% collision reduction) are presented without naming the state-of-the-art baselines, their hyper-parameters, number of random seeds, error bars, or statistical significance tests. This information is load-bearing for the central claim that the hierarchical LLM controller is responsible for the reported deltas.

    Authors: We agree that these details are necessary to substantiate the performance claims. In the revised manuscript, we will explicitly name and reference the state-of-the-art baselines (including pure RL, heuristic, and non-hierarchical LLM variants), provide complete hyper-parameter tables, report all metrics averaged over 10 independent random seeds with standard error bars, and include statistical significance tests (e.g., paired t-tests with p-values) to confirm the deltas are significant. This will directly support attribution to the proposed hierarchical framework. revision: yes

  2. Referee: [Simulation Platform] Simulation Platform and Evaluation sections: no ablation disables the LLM components while retaining the RL motion controller, no comparison against perfect-information oracles, and no sensitivity analysis to LLM prompt wording or inference latency. Because all quantitative claims rest on the gym-pybullet-drones + 3GPP platform faithfully implementing the partial observability assumed by the H-MO-POMDP, these omissions prevent attribution of gains to the proposed framework rather than simulator artifacts.

    Authors: We acknowledge the need for these controls to isolate contributions. We will add an ablation that disables LLM components while retaining the RL motion controller, include comparisons against perfect-information oracles (by relaxing partial observability in controlled simulation runs), and perform sensitivity analyses on prompt variations and inference latency (by testing multiple prompt templates and emulating latency ranges). These will be presented in an expanded Evaluation section to demonstrate that gains arise from the hierarchical LLM-RL integration rather than platform specifics. revision: yes

  3. Referee: [Problem Formulation] Problem Formulation and Framework sections: the H-MO-POMDP is introduced as the modeling foundation, yet the manuscript provides no derivation showing how the hierarchical multi-rate structure reduces the POMDP to tractable sub-problems or how the LLM policies are formally mapped onto the action spaces; without this, it is unclear whether the numerical improvements follow from the model or from ad-hoc engineering choices.

    Authors: We agree that a formal derivation would improve rigor and clarify the link between model and results. In the revision, we will insert a new subsection deriving the decomposition of the H-MO-POMDP into multi-rate sub-problems (global long-horizon vs. local fast-timescale) and explicitly mapping LLM-generated high-level decisions to the corresponding action spaces, showing how this structure yields tractability and the observed gains. This will address concerns about ad-hoc choices. revision: yes
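
One way to emulate inference latency in the sensitivity analysis promised in response 2 is to delay when each requested plan takes effect and track how stale the active plan becomes. The sketch below is an assumption about how this could be set up, not the authors' procedure; `plan_staleness`, the tick-based timing, and the parameter values are hypothetical.

```python
from collections import deque


def plan_staleness(num_ticks, replan_period, latency_ticks):
    """Return, for each tick, the age of the plan currently in use.

    A plan is requested every `replan_period` ticks, based on observations at
    the request tick, and becomes usable `latency_ticks` later. The age is the
    number of ticks since the active plan's observations were taken, or None
    while no plan has arrived yet.
    """
    in_flight = deque()   # (arrival_tick, issued_tick) in request order
    active_issued = None  # request tick of the plan currently in use
    ages = []
    for tick in range(num_ticks):
        if tick % replan_period == 0:
            in_flight.append((tick + latency_ticks, tick))
        # Adopt every plan whose emulated inference has finished by now.
        while in_flight and in_flight[0][0] <= tick:
            _, active_issued = in_flight.popleft()
        ages.append(None if active_issued is None else tick - active_issued)
    return ages
```

Sweeping `latency_ticks` while holding `replan_period` fixed gives a simple staleness curve; in this toy model the worst-case age in steady state is `replan_period - 1 + latency_ticks`.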

Circularity Check

0 steps flagged

No circularity: new H-MO-POMDP formulation and LLM controller yield simulation results without self-referential reduction

The paper presents a novel hierarchical multi-objective POMDP (H-MO-POMDP) model for joint UAV motion and connectivity optimization, followed by a proposed LLM-driven multi-rate controller architecture and a custom 3D simulation platform. Performance deltas (14% efficiency, 25% throughput, 23% collision reduction) are reported as outputs of this new construction evaluated in the simulator. No equations, fitted parameters, or claims are shown to reduce by construction to the inputs or to prior self-citations; the derivation chain from problem formulation to controller design to numerical evaluation remains independent and self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard domain assumptions in POMDP modeling and communication channels plus the unverified fidelity of the simulation; no explicit free parameters or new entities are named in the abstract.

axioms (1)
  • domain assumption: The UAV system dynamics and communication environment can be accurately captured by a hierarchical multi-objective partially observable Markov decision process (H-MO-POMDP).
    This modeling choice underpins the entire control framework and is stated as the starting point in the abstract.
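
For orientation, a generic multi-objective POMDP with a two-timescale policy can be written as below. This is textbook-style notation assumed for illustration; the symbols, the number of objectives K, the weight vector w, and the hold period N are not taken from the paper, whose exact H-MO-POMDP definition is not reproduced on this page.

```latex
% Generic notation (assumed for illustration, not the paper's definition).
% POMDP tuple with a vector-valued reward over K objectives:
\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, T, \Omega, O, \mathbf{r}, \gamma \rangle,
\qquad \mathbf{r} : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^{K}

% Linear scalarization of the objectives with preference weights w:
J_{\mathbf{w}}(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, \mathbf{w}^{\top} \mathbf{r}(s_t, a_t) \right],
\qquad \mathbf{w} \ge 0,\ \ \mathbf{1}^{\top}\mathbf{w} = 1

% Two-timescale policy: a high-level decision g_k is held for N low-level steps.
g_k \sim \pi^{\mathrm{hi}}\!\left(\cdot \mid o^{\mathrm{hi}}_{kN}\right),
\qquad
a_t \sim \pi^{\mathrm{lo}}\!\left(\cdot \mid o_t,\ g_{\lfloor t/N \rfloor}\right)
```

Under this reading, the axiom amounts to assuming that motion and connectivity objectives can be expressed as components of the reward vector and that the slow and fast decisions can be separated by a fixed hold period.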

