pith. machine review for the scientific record.

arxiv: 2605.04436 · v1 · submitted 2026-05-06 · 💻 cs.NI · cs.AI

Recognition: unknown

Joint Optimization of Trajectory Control, Resource Allocation, and Task Offloading for Multi-UAV-Assisted IoV

Cui Zhang, Khaled B. Letaief, Maoxin Ji, Nan Cheng, Pingyi Fan, Qiong Wu, Wen Chen

Pith reviewed 2026-05-08 17:23 UTC · model grok-4.3

classification 💻 cs.NI cs.AI
keywords multi-UAV IoV · task offloading · trajectory optimization · resource allocation · DRL-LLM hybrid · delay minimization · energy efficiency

The pith

Decoupling a joint optimization into trajectory planning, DRL-LLM resource scheduling, and linear offloading improves task success and cuts delay plus energy in multi-UAV IoV.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to minimize delay and energy in vehicle task offloading assisted by multiple UAVs in dense cities. It splits the hard coupled problem into a sequence of solvable pieces: second-order cone programming sets each UAV's 3D path for coverage, a hybrid of deep reinforcement learning and a large language model allocates resources and fixes imbalances, and linear programming sets exact offloading shares. The language model corrects long-tail problems the learning agent misses, and a reward split keeps the learning stable despite the corrections. Simulations show the full loop beats standard multi-agent reinforcement learning on completed tasks and overall efficiency. If correct, the approach makes aerial-assisted vehicle computing more workable under tight urban constraints.
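The three-stage loop can be sketched in code. Everything below is a toy: the stage functions, the cost model, and the convergence test are placeholders standing in for the paper's SOCP, DRL-LLM, and LP solvers, which are not public.

```python
# Toy skeleton of the hierarchical alternating loop. Each stage function is a
# stand-in that shrinks its own cost component; the real stages would solve an
# SOCP, run the DRL-LLM scheduler, and solve an LP, respectively.

def optimize_trajectories_socp(state):
    state["traj_cost"] *= 0.8      # stand-in for the SOCP trajectory stage
    return state

def allocate_resources_drl_llm(state):
    state["alloc_cost"] *= 0.8     # stand-in for DRL allocation + LLM correction
    return state

def solve_offloading_lp(state):
    state["offload_cost"] *= 0.8   # stand-in for the LP offloading-ratio stage
    return state

def system_cost(state):
    return sum(state.values())     # delay-energy proxy

def alternating_optimization(state, max_iters=50, tol=1e-3):
    """Alternate over the three decoupled subproblems until the
    system-cost proxy stops improving by more than tol."""
    prev = float("inf")
    for _ in range(max_iters):
        state = optimize_trajectories_socp(state)
        state = allocate_resources_drl_llm(state)
        state = solve_offloading_lp(state)
        cost = system_cost(state)
        if prev - cost < tol:      # outer-loop convergence check
            break
        prev = cost
    return state, cost

final_state, final_cost = alternating_optimization(
    {"traj_cost": 10.0, "alloc_cost": 10.0, "offload_cost": 10.0})
```

The open question flagged later in the referee report is exactly what this sketch glosses over: nothing here guarantees the fixed point of the alternation is near the joint optimum.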

Core claim

The central claim is that the non-convex joint problem of UAV trajectories, resource allocation, and task offloading can be solved to near-optimality by a hierarchical alternating framework: SOCP optimizes 3D UAV paths for adaptive coverage, a DRL agent sets initial resource blocks while an LLM semantic scheduler corrects failed and surplus allocations with a reward decoupling mechanism to preserve convergence, and LP determines precise offloading ratios, yielding higher task success rates and lower system delay and energy than multi-agent RL baselines in simulation.

What carries the argument

The hierarchical execution framework that decouples SOCP-based 3D trajectory control, DRL-LLM hybrid resource scheduling with reward decoupling for imbalance correction, and LP-based offloading ratio optimization in an alternating loop.

Load-bearing premise

The non-convex joint optimization can be split into these three subproblems while keeping near-optimal performance under the original coupling constraints, and the simulation model matches real dense-urban vehicle and UAV dynamics.

What would settle it

A simulation run with realistic vehicle mobility traces and UAV energy limits in which the proposed method shows no improvement in task success rate or total delay-energy over multi-agent reinforcement learning baselines under identical densities and constraints.

read the original abstract

This paper investigates a multi-Unmanned Aerial Vehicle (UAV) joint base station-assisted Internet of Vehicles (IoV) task offloading system in dense urban environments. To minimize system delay and energy consumption under strict coupling constraints, the complex non-convex optimization problem is decoupled into a hierarchical execution framework. First, a sequential distributed optimization algorithm based on Second-Order Cone Programming (SOCP) is proposed to optimize the 3D flight trajectory of each UAV, ensuring adaptive network coverage. Second, a novel hybrid resource scheduling paradigm synergizing Deep Reinforcement Learning (DRL) and Large Language Models (LLMs) is developed. Within this framework, the DRL agent dictates the initial resource allocation, while the LLM acts as a semantic macro-scheduler to rectify long-tail allocation imbalances for failed and surplus tasks. Crucially, a reward decoupling mechanism is introduced to isolate DRL training from external LLM interventions, thereby ensuring policy convergence. Finally, the task offloading ratios are precisely determined via Linear Programming (LP) within an alternating optimization loop. Simulation results demonstrate that the proposed method significantly outperforms traditional multi-agent reinforcement learning baselines in terms of task success rate and system efficiency.
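The abstract's final LP stage sets offloading ratios per task. In the simplest case (one task split between a local CPU and a single UAV, with the two parts running in parallel) the delay-optimal split has a closed form: balance the two completion times. The function below illustrates only that special case, not the paper's full LP; all parameter values are made up.

```python
def optimal_offload_ratio(cycles, f_local, data_bits, rate, f_uav):
    """Delay-optimal share x of a task to offload when the local part and the
    offloaded part run in parallel. Completion time is
    max((1-x)*t_loc, x*t_off), which two linear terms cross at the optimum."""
    t_loc = cycles / f_local                    # full task run locally
    t_off = data_bits / rate + cycles / f_uav   # full task: uplink + UAV compute
    x = t_loc / (t_loc + t_off)                 # equalize the two branches
    delay = max((1 - x) * t_loc, x * t_off)
    return x, delay

# example: 1e9 cycles, 1 GHz local CPU, 8 Mbit of data, 20 Mbit/s link, 5 GHz UAV CPU
x, d = optimal_offload_ratio(1e9, 1e9, 8e6, 20e6, 5e9)  # x ≈ 0.625, delay ≈ 0.375 s
```

With multiple UAVs, shared bandwidth, and deadline constraints the balance point no longer has a closed form, which is presumably why the paper resorts to an LP solver for this stage.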

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. This paper investigates joint optimization of 3D UAV trajectory, resource allocation, and task offloading in a multi-UAV-assisted IoV system for dense urban environments. It decouples the non-convex problem into a hierarchical alternating-optimization framework: a distributed SOCP algorithm for sequential trajectory control, a hybrid DRL-LLM resource scheduler (with DRL setting initial allocations and LLM acting as a semantic macro-scheduler for long-tail imbalances, isolated via reward decoupling), and LP for offloading ratios. Simulations claim significant gains over traditional multi-agent RL baselines in task success rate and system efficiency.

Significance. If the decoupling preserves near-optimality under the stated couplings and the reported simulation gains prove robust, the work could meaningfully advance practical UAV-assisted IoV deployments by offering a scalable way to manage complex joint optimizations. The hybrid DRL-LLM approach with explicit reward decoupling is a timely contribution that could influence hybrid AI methods in wireless resource management. The absence of theoretical support for the alternation, however, caps the immediate significance until addressed.

major comments (3)
  1. [Abstract and hierarchical execution framework] The abstract and hierarchical framework description state that the non-convex joint problem is safely decoupled into SOCP trajectory, DRL-LLM resource, and LP offloading subproblems solved in an alternating loop, yet no suboptimality bound, convergence rate for the outer loop, or comparison to a joint solver (even on a relaxed instance) is provided despite the explicit couplings (UAV 3D position affecting instantaneous channel gains and feasible offloading sets; resource decisions feeding back into trajectory utility). This is load-bearing for the central performance claim.
  2. [DRL-LLM hybrid resource scheduling paradigm] The reward decoupling mechanism is introduced to isolate DRL training from external LLM interventions and thereby ensure policy convergence, but no analysis examines whether this isolation prevents bias in the DRL policy gradient updates or preserves the claimed convergence properties under the macro-scheduler corrections.
  3. [Simulation results] Simulation results are reported to demonstrate outperformance in task success rate and system efficiency, but the manuscript supplies no quantitative details on convergence behavior of the alternating loop, sensitivity to the free parameters (DRL reward weights and LLM intervention thresholds), or explicit isolation of LLM effects, leaving the empirical support for the central claim only weakly grounded.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by briefly stating the key simulation parameters (e.g., number of UAVs, vehicle density, channel model) and the precise baseline algorithms used for comparison.
  2. [System model] Notation for the coupling constraints between trajectory variables and instantaneous resource/offloading feasibility could be clarified with an explicit cross-reference table or diagram.
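Major comment 2 turns on what "reward decoupling" actually isolates. A minimal sketch of one plausible reading, assuming the agent is trained on the reward of its own uncorrected allocation while the system executes the LLM-corrected one (the correction heuristic here is invented for illustration and is not the paper's semantic scheduler):

```python
def unmet(alloc, demand):
    """Total resource-block shortfall across users."""
    return sum(max(d - a, 0) for a, d in zip(alloc, demand))

def llm_correct(alloc, demand):
    """Stand-in macro-scheduler: move surplus blocks to under-served users."""
    alloc = list(alloc)
    surplus = [i for i, (a, d) in enumerate(zip(alloc, demand)) if a > d]
    deficit = [i for i, (a, d) in enumerate(zip(alloc, demand)) if a < d]
    for i in surplus:
        for j in deficit:
            move = min(alloc[i] - demand[i], demand[j] - alloc[j])
            alloc[i] -= move
            alloc[j] += move
    return alloc

def decoupled_step(raw_alloc, demand):
    """Reward decoupling: the DRL agent is scored on its OWN raw allocation,
    while the network executes the LLM-corrected one."""
    executed = llm_correct(raw_alloc, demand)
    agent_reward = -unmet(raw_alloc, demand)   # what the policy update sees
    system_reward = -unmet(executed, demand)   # what the network experiences
    return executed, agent_reward, system_reward

demand = [4, 2, 6]
raw = [6, 2, 2]   # agent over-serves user 0, under-serves user 2
executed, r_agent, r_sys = decoupled_step(raw, demand)
```

Under this reading the agent's gradient signal is untouched by the correction, which supports the convergence claim, but the training distribution of visited states still shifts because the corrected allocation is what drives the environment forward. That gap is precisely the unanalyzed bias the comment asks about.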

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for the thorough and constructive review of our manuscript. We address each major comment point by point below, providing clarifications and indicating the revisions we will incorporate to strengthen the empirical support and discussion of limitations.

read point-by-point responses
  1. Referee: [Abstract and hierarchical execution framework] The abstract and hierarchical framework description state that the non-convex joint problem is safely decoupled into SOCP trajectory, DRL-LLM resource, and LP offloading subproblems solved in an alternating loop, yet no suboptimality bound, convergence rate for the outer loop, or comparison to a joint solver (even on a relaxed instance) is provided despite the explicit couplings (UAV 3D position affecting instantaneous channel gains and feasible offloading sets; resource decisions feeding back into trajectory utility). This is load-bearing for the central performance claim.

    Authors: We agree that theoretical suboptimality bounds or convergence rates for the alternating optimization would strengthen the central claim. However, deriving such guarantees is analytically intractable due to the non-convex couplings, the stochastic elements in DRL, and the semantic interventions from the LLM. In the revised manuscript, we will add empirical evidence including convergence plots for the outer loop across multiple scenarios and a comparison to a centralized joint solver on a small-scale relaxed instance to demonstrate practical near-optimality. A dedicated limitations subsection will explicitly note the absence of theoretical bounds. revision: partial

  2. Referee: [DRL-LLM hybrid resource scheduling paradigm] The reward decoupling mechanism is introduced to isolate DRL training from external LLM interventions and thereby ensure policy convergence, but no analysis examines whether this isolation prevents bias in the DRL policy gradient updates or preserves the claimed convergence properties under the macro-scheduler corrections.

    Authors: The reward decoupling separates the LLM's macro-level corrections from the DRL agent's immediate reward signal to avoid direct interference during training. We acknowledge the lack of formal analysis on gradient bias. The revised version will include ablation experiments comparing policy convergence and performance with and without decoupling, along with a qualitative discussion of potential bias sources. A rigorous theoretical characterization of bias under LLM corrections is beyond the current scope and will be noted as future work. revision: partial

  3. Referee: [Simulation results] Simulation results are reported to demonstrate outperformance in task success rate and system efficiency, but the manuscript supplies no quantitative details on convergence behavior of the alternating loop, sensitivity to the free parameters (DRL reward weights and LLM intervention thresholds), or explicit isolation of LLM effects, leaving the empirical support for the central claim only weakly grounded.

    Authors: We will substantially expand the simulation section in the revision. This includes adding quantitative convergence metrics (e.g., objective value stabilization over alternating iterations), sensitivity analysis tables and figures for DRL reward weights and LLM thresholds, and explicit ablation studies isolating LLM effects through direct comparisons of variants with and without the macro-scheduler. These additions will provide stronger empirical grounding for the performance claims. revision: yes
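The small-scale joint-solver comparison promised in response 1 can be sketched on a toy problem: run alternating coordinate minimization against exhaustive search on a small non-convex coupled objective and report the suboptimality gap. The objective below is invented for illustration and is unrelated to the paper's actual model.

```python
import itertools

def f(x, y):
    # toy non-convex coupled objective standing in for the joint problem
    return (x * y - 1.0) ** 2 + 0.1 * (x ** 2 + y ** 2)

grid = [i / 100 for i in range(-200, 201)]  # search range [-2, 2], step 0.01

# joint "solver": exhaustive grid search over both variables at once
joint = min(f(x, y) for x, y in itertools.product(grid, repeat=2))

# alternating optimization: fix one variable, grid-minimize over the other
x, y = 2.0, 2.0
for _ in range(20):
    x = min(grid, key=lambda v: f(v, y))
    y = min(grid, key=lambda v: f(x, v))
alt = f(x, y)

gap = alt - joint  # suboptimality of the decoupled scheme on this instance
```

On this instance the gap is tiny, but that is a property of the instance, not of alternating optimization in general; the point of the promised experiment is to measure the gap on relaxed instances of the paper's own problem.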

standing simulated objections not resolved
  • Theoretical suboptimality bounds or convergence rates for the alternating optimization framework
  • Formal analysis of bias in DRL policy gradient updates under LLM macro-scheduler corrections

Circularity Check

0 steps flagged

No significant circularity: hierarchical decoupling and simulation evaluation are self-contained against external baselines.

full rationale

The paper decouples the non-convex joint optimization into SOCP trajectory optimization, DRL-LLM resource allocation with an introduced reward-decoupling mechanism, and LP offloading, then evaluates the resulting method via simulation against independent multi-agent RL baselines. No equation or claim reduces a prediction or result to a fitted parameter by construction, nor does any load-bearing step rely on a self-citation chain that itself assumes the target outcome. The reward decoupling is presented as an explicit design choice to isolate LLM interventions from DRL policy gradients, with convergence asserted as a consequence rather than a tautology. Performance claims rest on empirical outperformance in simulation, which is externally falsifiable and does not collapse to the inputs by definition.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on the domain assumption that hierarchical decoupling is valid and on standard simulation validation; no new physical entities or ad-hoc constants are introduced beyond typical DRL hyperparameters.

free parameters (1)
  • DRL reward weights and LLM intervention thresholds
    These must be chosen or tuned to achieve the reported convergence and performance; their specific values are not supplied in the abstract.
axioms (1)
  • domain assumption The original non-convex problem can be decoupled into independent subproblems (trajectory via SOCP, resource via DRL-LLM, offloading via LP) without materially affecting the global optimum.
    Invoked in the abstract as the justification for the hierarchical execution framework.

pith-pipeline@v0.9.0 · 5528 in / 1317 out tokens · 18922 ms · 2026-05-08T17:23:12.611640+00:00 · methodology


Reference graph

Works this paper leans on

59 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1]

    Internet of Things for smart cities,

    A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi, “Internet of Things for smart cities,”IEEE Internet Things J., vol. 1, no. 1, pp. 22–32, 2014

  2. [2]

    V2X-assisted distributed computing and control framework for connected and automated CA Vs under ramp merging scenario,

    J. Chu, Q. Wu, P. Fan, W. Chen, K. Wang, N. Cheng, and K. B. Letaief, “V2X-assisted distributed computing and control framework for connected and automated CA Vs under ramp merging scenario,”IEEE Trans. Mobile Comput., Early Access, 2026, doi: https://doi.org/10.1109/ TMC.2026.3650774

  3. [3]

    Trajectory pro- tection schemes based on a gravity mobility model in IoT,

    Q. Wu, H. Liu, C. Zhang, Q. Fan, Z. Li, and K. Wang, “Trajectory pro- tection schemes based on a gravity mobility model in IoT,”Electronics, vol. 8, no. 2, p. 148, 2019

  4. [4]

    Performance modeling and analysis of the AD- HOC MAC protocol for V ANETs,

    Q. Wu and J. Zheng, “Performance modeling and analysis of the AD- HOC MAC protocol for V ANETs,” inProc. IEEE Int. Conf. Commun. (ICC), London, UK, 2015, pp. 3646–3652

  5. [5]

    Performance analysis of IEEE 802.11 p for continuous backoff freezing in IoV ,

    Q. Wu, S. Xia, Q. Fan, and Z. Li, “Performance analysis of IEEE 802.11 p for continuous backoff freezing in IoV ,”Electronics, vol. 8, no. 12, p. 1404, 2019

  6. [6]

    A swarming approach to optimize the one-hop delay in smart driving inter-platoon communications,

    Q. Wu, S. Nie, P. Fan, H. Liu, F. Qiang, and Z. Li, “A swarming approach to optimize the one-hop delay in smart driving inter-platoon communications,”Sensors, vol. 18, no. 10, p. 3307, 2018

  7. [7]

    Performance modeling and analysis of the ADHOC MAC protocol for vehicular networks,

    Q. Wu and J. Zheng, “Performance modeling and analysis of the ADHOC MAC protocol for vehicular networks,”Wireless Netw., vol. 22, no. 3, pp. 799–812, 2016

  8. [8]

    Performance modeling and analysis of IEEE 802.11 DCF based fair channel access for vehicle-to-roadside commu- nication in a non-saturated state,

    Q. Wu and J. Zheng, “Performance modeling and analysis of IEEE 802.11 DCF based fair channel access for vehicle-to-roadside commu- nication in a non-saturated state,”Wireless Netw., vol. 21, no. 1, pp. 1–11, 2015

  9. [9]

    A tutorial on 5G NR V2X communications,

    M. H. C. Garcia, A. Molina-Galan, M. Boban, J. Gozalvez, B. Coll- Perales, T. S ¸ahin, and A. Kousaridas, “A tutorial on 5G NR V2X communications,”IEEE Commun. Surveys Tuts., vol. 23, no. 3, pp. 1972–2026, 2021

  10. [10]

    Interworking of DSRC and cellular network technologies for V2X communications: A survey,

    K. Abboud, H. A. Omar, and W. Zhuang, “Interworking of DSRC and cellular network technologies for V2X communications: A survey,” IEEE Trans. Veh. Technol., vol. 65, no. 12, pp. 9457–9470, 2016

  11. [11]

    Performance modeling of the IEEE 802.11 p EDCA mechanism for V ANET,

    Q. Wu and J. Zheng, “Performance modeling of the IEEE 802.11 p EDCA mechanism for V ANET,” inProc. IEEE Global Commun. Conf. (GLOBECOM), Austin, TX, USA, 2014, pp. 57–63

  12. [12]

    DRL- based optimization for AoI and energy consumption in C-V2X enabled IoV ,

    Z. Zhang, Q. Wu, P. Fan, N. Cheng, W. Chen, and K. B. Letaief, “DRL- based optimization for AoI and energy consumption in C-V2X enabled IoV ,”IEEE Trans. Green Commun. Netw., early access, 2025

  13. [13]

    Study on refined deployment of wireless mesh sensor network,

    J. Fan, S. Yin, Q. Wu, and F. Gao, “Study on refined deployment of wireless mesh sensor network,” inProc. 6th Int. Conf. Wireless Commun. Netw. Mobile Comput. (WiCOM), Chengdu, China, 2010, pp. 1–5

  14. [14]

    Optimal cooperative beamforming design for MIMO decode-and-forward relay channels,

    K. Xiong, P. Fan, Z. Xu, H. C. Yang, and K. B. Letaief, “Optimal cooperative beamforming design for MIMO decode-and-forward relay channels,”IEEE Trans. Signal Process., vol. 62, no. 6, pp. 1476–1489, 2014

  15. [15]

    Doppler frequency offset estimation and diversity reception scheme of high-speed railway with multiple antennas on separated carriage,

    Y . Yang and P. Fan, “Doppler frequency offset estimation and diversity reception scheme of high-speed railway with multiple antennas on separated carriage,”J. Mod. Transport., vol. 20, no. 4, pp. 227–233, 2012

  16. [16]

    Global proportional fair scheduling for networks with multiple base stations,

    H. Zhou, P. Fan, and J. Li, “Global proportional fair scheduling for networks with multiple base stations,”IEEE Trans. Veh. Technol., vol. 60, no. 4, pp. 1867–1879, 2011

  17. [17]

    Large language model-based task offloading and resource allocation for digital twin edge computing networks,

    Q. Wu, Y . Xie, P. Fan, D. Qin, K. Wang, and K. B. Letaief, “Large language model-based task offloading and resource allocation for digital twin edge computing networks,”IEEE Trans. Mobile Comput., Early Access, 2026, doi: https://doi.org/10.1109/TMC.2026.3664866

  18. [18]

    Resource allocation for twin maintenance and task processing in vehicular edge computing network,

    Y . Xie, Q. Wu, P. Fan, N. Cheng, W. Chen, J. Wang, and K. B. Letaief, “Resource allocation for twin maintenance and task processing in vehicular edge computing network,”IEEE Internet Things J., vol. 12, no. 15, pp. 32008–32021, Aug. 2025. 17

  19. [19]

    Mobile edge computing: A survey on architec- ture and computation offloading,

    P. Mach and Z. Becvar, “Mobile edge computing: A survey on architec- ture and computation offloading,”IEEE Commun. Surveys Tuts., vol. 19, no. 3, pp. 1628–1656, 2017

  20. [20]

    Velocity-adaptive access scheme for semantic-aware vehicular networks: Joint fairness and AoI optimization,

    X. Xu, Q. Wu, P. Fan, K. Wang, N. Cheng, W. Chen, and K. B. Letaief, “Velocity-adaptive access scheme for semantic-aware vehicular networks: Joint fairness and AoI optimization,”IEEE Trans. Mobile Comput., Early Access, 2026, doi: https://doi.org/10.1109/TMC.2026. 3667698

  21. [21]

    Enhanced velocity-adaptive scheme: Joint fair access and age of information optimization in vehicular networks,

    X. Xu, Q. Wu, P. Fan, K. Wang, N. Cheng, W. Chen, and K. B. Letaief, “Enhanced velocity-adaptive scheme: Joint fair access and age of information optimization in vehicular networks,”IEEE Trans. Mobile Comput., vol. 25, no. 3, pp. 3488–3505, Mar. 2026

  22. [22]

    Delay-constrained optimal link scheduling in wireless sensor networks,

    Q. Wang, D. O. Wu, and P. Fan, “Delay-constrained optimal link scheduling in wireless sensor networks,”IEEE Trans. Veh. Technol., vol. 59, no. 9, pp. 4564–4577, 2010

  23. [23]

    Network coding for two-way relaying networks over Rayleigh fading channels,

    W. Li, J. Li, and P. Fan, “Network coding for two-way relaying networks over Rayleigh fading channels,”IEEE Trans. Veh. Technol., vol. 59, no. 9, pp. 4476–4488, 2010

  24. [24]

    Network coding for efficient multicast routing in wireless ad-hoc networks,

    J. Zhang, P. Fan, and K. B. Letaief, “Network coding for efficient multicast routing in wireless ad-hoc networks,”IEEE Trans. Commun., vol. 56, no. 4, pp. 598–607, 2008

  25. [25]

    A neighbor-table-based multipath routing in ad hoc networks,

    Z. Yao, J. Jiang, P. Fan, Z. Cao, and V . O. K. Li, “A neighbor-table-based multipath routing in ad hoc networks,” inProc. 57th IEEE Semiannu. Veh. Technol. Conf. (VTC Spring), Jeju, South Korea, 2003, pp. 1739– 1743

  26. [26]

    Investigation of the time-offset- based QoS support with optical burst switching in WDM networks,

    P. Fan, C. Feng, Y . Wang, and N. Ge, “Investigation of the time-offset- based QoS support with optical burst switching in WDM networks,” in Proc. IEEE Int. Conf. Commun. (ICC), New York, NY , USA, 2002, pp. 2682–2686

  27. [27]

    Block coded modulation for the reduction of the peak to average power ratio in OFDM systems,

    P. Fan and X.-G. Xia, “Block coded modulation for the reduction of the peak to average power ratio in OFDM systems,” inProc. IEEE Wireless Commun. Netw. Conf. (WCNC), New Orleans, LA, USA, 1999, pp. 1095–1099

  28. [28]

    High stable and accurate vehicle selection scheme based on federated edge learning in vehicular networks,

    Q. Wu, X. Wang, Q. Fan, P. Fan, C. Zhang, and Z. Li, “High stable and accurate vehicle selection scheme based on federated edge learning in vehicular networks,”China Commun., vol. 20, no. 3, pp. 1–17, 2023

  29. [29]

    Optimal resource allocation in wireless powered communication networks with user cooperation,

    X. Di, K. Xiong, P. Fan, H. C. Yang, and K. B. Letaief, “Optimal resource allocation in wireless powered communication networks with user cooperation,”IEEE Trans. Wireless Commun., vol. 16, no. 12, pp. 7936–7949, 2017

  30. [30]

    RadioDiff: An effective generative diffusion model for sampling-free dynamic radio map construction,

    X. Wang, K. Tao, N. Cheng, Z. Yin, Z. Li, Y . Zhang, and X. Shen, “RadioDiff: An effective generative diffusion model for sampling-free dynamic radio map construction,”IEEE Trans. Cogn. Commun. Netw., vol. 11, no. 2, pp. 738–750, 2025

  31. [31]

    RadioDiff-k2: Helmholtz equation informed generative diffusion model for multi-path aware radio map construction,

    X. Wang, Q. Zhang, N. Cheng, R. Sun, Z. Li, S. Cui, and X. Shen, “RadioDiff-k2: Helmholtz equation informed generative diffusion model for multi-path aware radio map construction,”IEEE J. Sel. Areas Commun., vol. 44, pp. 2318–2333, 2026

  32. [32]

    Path loss prediction for V2I communications systems: A performance analysis of propagation models,

    M. B. Ameur, J. Chebil, J. B. Hadj Tahar, M. H. Habaebi, and H. Zormati, “Path loss prediction for V2I communications systems: A performance analysis of propagation models,” inProc. Int. Microw. Antenna Symp. (IMAS), Marrakech, Morocco, 2024, pp. 1–5

  33. [33]

    Joint UA V position and power optimization for accurate regional localization in space-air integrated localization network,

    Y . Zhao, Z. Li, N. Cheng, B. Hao, and X. Shen, “Joint UA V position and power optimization for accurate regional localization in space-air integrated localization network,”IEEE Internet Things J., vol. 8, no. 6, pp. 4841–4854, 2021

  34. [34]

    Interpretable and secure trajectory optimization for UA V-assisted communication,

    Y . Quan, N. Cheng, X. Wang, J. Shen, L. Ma, and Z. Yin, “Interpretable and secure trajectory optimization for UA V-assisted communication,” inProc. IEEE/CIC Int. Conf. Commun. China (ICCC), Dalian, China, 2023, pp. 1–6

  35. [35]

    Edge computing task offloading optimization for a UA V-assisted Internet of Vehicles via deep reinforce- ment learning,

    M. Yan, R. Xiong, Y . Wang, and C. Li, “Edge computing task offloading optimization for a UA V-assisted Internet of Vehicles via deep reinforce- ment learning,”IEEE Trans. Veh. Technol., vol. 73, no. 4, pp. 5647–5658, 2024

  36. [36]

    Joint communication and trajectory optimization for multi-UA V enabled mobile Internet of Vehicles,

    X. Liu, B. Lai, B. Lin, and V . C. M. Leung, “Joint communication and trajectory optimization for multi-UA V enabled mobile Internet of Vehicles,”IEEE Trans. Intell. Transp. Syst., vol. 23, no. 9, pp. 15 354– 15 366, 2022

  37. [37]

    Joint deployment and trajectory optimization in UA V-assisted vehicular edge computing networks,

    Z. Wu, Z. Yang, C. Yang, J. Lin, Y . Liu, and X. Chen, “Joint deployment and trajectory optimization in UA V-assisted vehicular edge computing networks,”J. Commun. Netw., vol. 24, no. 1, pp. 47–58, 2022

  38. [38]

    Placement of UA V-mounted edge servers for Internet of Vehicles,

    Y . Wang, Z. Tang, A. Huang, H. Zhang, L. Chang, and J. Pan, “Placement of UA V-mounted edge servers for Internet of Vehicles,” IEEE Trans. Veh. Technol., vol. 73, no. 7, pp. 10 587–10 601, 2024

  39. [39]

    Resource allocation and collaborative offloading in multi-UA V-assisted IoV with federated deep reinforcement learning,

    Z. Chen, Z. Huang, J. Zhang, H. Cheng, and J. Li, “Resource allocation and collaborative offloading in multi-UA V-assisted IoV with federated deep reinforcement learning,”IEEE Internet Things J., vol. 12, no. 5, pp. 4629–4640, 2025

  40. [40]

    Mobile-aware service offloading for UA V-assisted IoV: A multiagent tiny distributed learning approach,

    Y . Liu, P. Lin, M. Zhang, Z. Zhang, and F. R. Yu, “Mobile-aware service offloading for UA V-assisted IoV: A multiagent tiny distributed learning approach,”IEEE Internet Things J., vol. 11, no. 12, pp. 21 191–21 201, 2024

  41. [41]

    Evolutionary multi-objective reinforcement learning based trajectory control and task offloading in UA V-assisted mobile edge computing,

    F. Song, H. Xing, X. Wang, S. Luo, P. Dai, Z. Xiao, and B. Zhao, “Evolutionary multi-objective reinforcement learning based trajectory control and task offloading in UA V-assisted mobile edge computing,” IEEE Trans. Mobile Comput., vol. 22, no. 12, pp. 7387–7405, 2023

  42. [42]

    UA V-assisted relaying and edge computing: Scheduling and trajectory optimization,

    X. Hu, K.-K. Wong, K. Yang, and Z. Zheng, “UA V-assisted relaying and edge computing: Scheduling and trajectory optimization,”IEEE Trans. Wireless Commun., vol. 18, no. 10, pp. 4738–4752, 2019

  43. [43]

    Joint resource and trajectory optimization for security in UA V-assisted MEC systems,

    Y . Xu, T. Zhang, D. Yang, Y . Liu, and M. Tao, “Joint resource and trajectory optimization for security in UA V-assisted MEC systems,” IEEE Trans. Commun., vol. 69, no. 1, pp. 573–588, 2021

  44. [44]

    Data offloading in UA V-assisted multi-access edge computing systems under resource uncertainty,

    P. A. Apostolopoulos, G. Fragkos, E. E. Tsiropoulou, and S. Papavas- siliou, “Data offloading in UA V-assisted multi-access edge computing systems under resource uncertainty,”IEEE Trans. Mobile Comput., vol. 22, no. 1, pp. 175–190, 2023

  45. [45]

    Multi-agent deep reinforcement learning for task offloading in UA V-assisted mobile edge computing,

    N. Zhao, Z. Ye, Y . Pei, Y .-C. Liang, and D. Niyato, “Multi-agent deep reinforcement learning for task offloading in UA V-assisted mobile edge computing,”IEEE Trans. Wireless Commun., vol. 21, no. 9, pp. 6949– 6960, 2022

  46. [46]

    Multi-agent reinforcement learning in NOMA-aided UA V networks for cellular offloading,

    R. Zhong, X. Liu, Y . Liu, and Y . Chen, “Multi-agent reinforcement learning in NOMA-aided UA V networks for cellular offloading,”IEEE Trans. Wireless Commun., vol. 21, no. 3, pp. 1498–1512, 2022

  47. [47]

    UA V- assisted MEC networks with aerial and ground cooperation,

    Y . Xu, T. Zhang, Y . Liu, D. Yang, L. Xiao, and M. Tao, “UA V- assisted MEC networks with aerial and ground cooperation,”IEEE Trans. Wireless Commun., vol. 20, no. 12, pp. 7712–7727, 2021

  48. [48]

    UA V- assisted mobile edge computing: Optimal design of UA V altitude and task offloading,

    M. Hui, J. Chen, L. Yang, L. Lv, H. Jiang, and N. Al-Dhahir, “UA V- assisted mobile edge computing: Optimal design of UA V altitude and task offloading,”IEEE Trans. Wireless Commun., vol. 23, no. 10, pp. 13 633–13 647, 2024

  49. [49]

    Multi-objective optimization for multi-UA V-assisted mobile edge computing,

    G. Sun, Y . Wang, Z. Sun, Q. Wu, J. Kang, D. Niyato, and V . C. M. Leung, “Multi-objective optimization for multi-UA V-assisted mobile edge computing,”IEEE Trans. Mobile Comput., vol. 23, no. 12, pp. 14 803–14 820, 2024

  50. [50]

    DRL- based optimization for AoI and energy consumption in C-V2X enabled IoV ,

    Z. Zhang, Q. Wu, P. Fan, N. Cheng, W. Chen, and K. B. Letaief, “DRL- based optimization for AoI and energy consumption in C-V2X enabled IoV ,”IEEE Trans. Green Commun. Netw., vol. 9, no. 4, pp. 2144–2159, Dec. 2025, doi: https://doi.org/10.1109/TGCN.2025.3531902

  51. [51]

    DRL-based resource allocation for motion blur resistant federated self- supervised learning in IoV ,

    X. Gu, Q. Wu, P. Fan, Q. Fan, N. Cheng, W. Chen, and K. B. Letaief, “DRL-based resource allocation for motion blur resistant federated self- supervised learning in IoV ,”IEEE Internet Things J., vol. 12, no. 6, pp. 7076–7085, Mar. 2025

  52. [52]

    Large language model (LLM)-enabled in-context learning for wireless network optimization: A case study of power control,

    H. Zhou, C. Hu, D. Yuan, Y . Yuan, D. Wu, X. Liu, and C. Zhang, “Large language model (LLM)-enabled in-context learning for wireless network optimization: A case study of power control,”arXiv preprint arXiv:2408.00214, 2024.[Online]. Available: https://arxiv.org/abs/2408. 00214

  53. [53]

    A PDDQNLP algorithm for energy efficient computation offloading in UA V-assisted MEC,

    N. Lin, H. Tang, L. Zhao, S. Wan, A. Hawbani, and M. Guizani, “A PDDQNLP algorithm for energy efficient computation offloading in UA V-assisted MEC,”IEEE Trans. Wireless Commun., vol. 22, no. 12, pp. 8876–8890, 2023

  54. [54]

    Energy- efficient UA V-assisted mobile edge computing: Resource allocation and trajectory optimization,

    M. Li, N. Cheng, J. Gao, Y . Wang, L. Zhao, and X. Shen, “Energy- efficient UA V-assisted mobile edge computing: Resource allocation and trajectory optimization,”IEEE Trans. Veh. Technol., vol. 69, no. 3, pp. 3424–3438, 2020

  55. [55]

    Optimal LAP altitude for maximum coverage,

    A. Al-Hourani, S. Kandeepan, and S. Lardner, “Optimal LAP altitude for maximum coverage,”IEEE Wireless Commun. Lett., vol. 3, no. 6, pp. 569–572, 2014

  56. [56]

    Energy minimization for wireless communication with rotary-wing UA V ,

    Y . Zeng, J. Xu, and R. Zhang, “Energy minimization for wireless communication with rotary-wing UA V ,”IEEE Trans. Wireless Commun., vol. 18, no. 4, pp. 2329–2345, 2019

  57. [57]

    Energy use and life cycle greenhouse gas emissions of drones for commercial package delivery,

    J. K. Stolaroff, C. Samaras, E. R. O’Neill, A. Lubers, A. S. Mitchell, and D. Ceperley, “Energy use and life cycle greenhouse gas emissions of drones for commercial package delivery,”Nat. Commun., vol. 9, no. 1, Art. no. 409, 2018

  58. [58]

    Efficient 3-D placement of an aerial base station in next generation cellular networks,

    R. I. Bor-Yaliniz, A. El-Keyi, and H. Yanikomeroglu, “Efficient 3-D placement of an aerial base station in next generation cellular networks,” inProc. IEEE Int. Conf. Commun. (ICC), Kuala Lumpur, Malaysia, 2016, pp. 1–5

  59. [59]

    Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

    Q. Zhang, C. Hu, S. Upasani, B. Ma, F. Hong, V . Kamanuru, J. Rainton, C. Wu, M. Ji, H. Li, U. Thakker, J. Zou, and K. Olukotun, “Agentic context engineering: Evolving contexts for self-improving language 18 models,”arXiv preprint arXiv:2510.04618, 2025. [Online]. Available: https://arxiv.org/abs/2510.04618