UAV Trajectory and Bandwidth Allocation for Efficient Data Collection in Low-Altitude Intelligent IoT: A Hierarchical DRL Approach

Guangxu Zhu; Luliang Jia; Nan Qi; Xiaojie Li; Xiaoling Zhang; Zhenjia Xu

arxiv: 2604.23132 · v2 · pith:IQAKMGJ7new · submitted 2026-04-25 · 💻 cs.CE

Zhenjia Xu, Xiaoling Zhang, Nan Qi, Guangxu Zhu, Xiaojie Li, Luliang Jia

Pith reviewed 2026-05-25 06:54 UTC · model grok-4.3

classification 💻 cs.CE

keywords UAV trajectory optimization · hierarchical deep reinforcement learning · IoT data collection · bandwidth allocation · DDPG algorithm · low-altitude networks · interference management

The pith

A hierarchical deep reinforcement learning method optimizes UAV trajectories and bandwidth allocation to maximize IoT data collection under interference and dynamics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper designs a hierarchical DRL framework that splits UAV flight trajectory decisions at coarse time scales from bandwidth allocation at finer scales. This structure is intended to maximize the volume of data collected from ground IoT nodes while accounting for interference, varying data volumes, and multiple obstacle types. The authors introduce the TBH-DDPG algorithm to implement the hierarchy and report simulation gains in convergence speed and computational cost relative to a non-hierarchical baseline. A reader would care because the decomposition promises to make real-time UAV IoT systems more feasible when decisions must run on limited onboard resources.

Core claim

The central claim is that decomposing the joint trajectory-and-bandwidth problem into two DRL levels, with the upper level selecting coarse flight paths and the lower level selecting fine-grained bandwidth shares, produces an effective solution via the TBH-DDPG algorithm. In simulation, the resulting policy shows a 44.44 percent improvement in convergence speed and a 58.05 percent reduction in computational cost compared with a flat DDPG baseline under the modeled interference, dynamic data, and obstacle conditions.

What carries the argument

The TBH-DDPG algorithm, which runs an upper-level DDPG policy for trajectory at coarse temporal granularity and a lower-level DDPG policy for bandwidth allocation at fine granularity.
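The two time scales can be pictured as a nested loop: one upper-level decision per flight period, then one lower-level decision per communication slot inside that period. The following is a minimal sketch of that control flow only, not the authors' implementation. The function names, the dictionary state, and the toy policies are all hypothetical stand-ins, and no DDPG training is shown.

```python
from typing import Callable, List, Tuple

State = dict  # hypothetical state container; the paper's state is richer


def run_hierarchical_episode(
    num_flight_periods: int,
    slots_per_period: int,
    upper_policy: Callable[[State], Tuple[float, float]],
    lower_policy: Callable[[State], List[float]],
    apply_flight: Callable[[State, Tuple[float, float]], State],
    apply_bandwidth: Callable[[State, List[float]], Tuple[State, float]],
    state: State,
) -> Tuple[float, int, int]:
    """Return (data collected, upper-level decisions, lower-level decisions)."""
    total_data, upper_calls, lower_calls = 0.0, 0, 0
    for _n in range(num_flight_periods):
        # Coarse scale: one trajectory decision per flight period.
        move = upper_policy(state)
        upper_calls += 1
        state = apply_flight(state, move)
        for _m in range(slots_per_period):
            # Fine scale: one bandwidth decision per communication slot.
            shares = lower_policy(state)
            lower_calls += 1
            state, collected = apply_bandwidth(state, shares)
            total_data += collected
    return total_data, upper_calls, lower_calls


# Toy stand-ins so the sketch runs; they are NOT the paper's environment or policies.
def toy_upper(state: State) -> Tuple[float, float]:
    return (1.0, 0.0)  # always fly one unit along x


def toy_lower(state: State) -> List[float]:
    return [0.5, 0.5]  # split bandwidth equally between two nodes


def toy_flight(state: State, move: Tuple[float, float]) -> State:
    return {**state, "x": state["x"] + move[0], "y": state["y"] + move[1]}


def toy_bandwidth(state: State, shares: List[float]) -> Tuple[State, float]:
    return state, sum(shares)  # pretend one data unit is collected per slot
```

With 3 flight periods of 4 slots each, the upper policy is queried 3 times and the lower policy 12 times, which is the sense in which the upper level acts at a coarser granularity.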

If this is right

The hierarchical split enables the UAV to respond to fast-changing bandwidth needs without recomputing entire trajectories at every time step.
Data collection volume can be increased while still respecting interference limits and obstacle avoidance.
Onboard computation load drops enough that the same hardware can support longer missions or additional sensors.
The method scales to scenarios with many IoT nodes because the lower level operates locally on bandwidth while the upper level plans global movement.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same coarse-to-fine split could be applied to other UAV tasks such as target tracking or delivery routing where planning and actuation occur at mismatched time scales.
If the computational savings hold in hardware, the approach could extend mission duration by lowering energy spent on repeated policy evaluations.
Real-world validation would need to check whether wireless channel estimation errors erode the reported gains when the lower-level policy must act on noisy observations.

Load-bearing premise

The simulation environment, including its interference model, data-volume dynamics, and obstacle types, accurately represents the conditions under which the hierarchical split preserves solution quality while delivering the reported speed and cost gains.

What would settle it

Execute both the proposed TBH-DDPG and the non-hierarchical DDPG baseline on an identical scenario whose interference, data arrivals, and obstacle geometry are taken from field measurements rather than synthetic models, then measure whether the 44 percent convergence and 58 percent cost advantages remain.
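For such a re-run, the percentage advantage would have to be recomputed from the measured quantities. The sketch below assumes the gain is defined as a relative reduction against the baseline (for example, fewer episodes to converge or less compute time); this page does not state the paper's exact definition, and the numbers in the usage note are placeholders, not results from the paper.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`.

    Assumed definition: 100 * (baseline - proposed) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline
```

As a purely illustrative case, a baseline measurement of 200 units against a proposed measurement of 150 units gives `relative_reduction(200, 150)`, a 25 percent reduction.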

Figures

Figures reproduced from arXiv: 2604.23132 by Guangxu Zhu, Luliang Jia, Nan Qi, Xiaojie Li, Xiaoling Zhang, Zhenjia Xu.

**Figure 1.** Data collection for the food processing industry in low-altitude IoT

**Figure 2.** Time slot division. Excerpt from the adjacent text: … communication time slots. The $m$-th communication time slot within the $n$-th flight period is represented by $\delta_{n,m}$, where $m \in \{1, 2, \ldots, M\}$. During any communication time slot, the UAV employs a frequency division multiple access (FDMA) scheme for communication. The slot division for the entire mission is shown in …

**Figure 4.** Scenario map of the system after abstraction.

**Figure 5.** Five layered maps. Excerpt from the adjacent text: Finally, the output of the network is flattened and combined with the UAV's remaining battery information to form the input state for the algorithm. B. SMDP model. In DRL, the MDP model is commonly used to simplify the scenario. It assumes that state transitions in the environment depend only on the previous state and is primarily composed of the components $\langle S, A, \mathrm{Pr}, R \rangle$, where $S$ represent…

**Figure 6.** TBH-DDPG algorithm framework diagram. Excerpt from the adjacent text: … where $r_f(\delta_n)$ represents the sum of the collision penalty, the return penalty, and the crash penalty, that is,

$$r_f(\delta_n) = r_{\mathrm{collision}}(\delta_n) + r_{\mathrm{return}}(\delta_n) + r_{\mathrm{nland}}(\delta_n). \tag{15}$$

The collision penalty is applied when the UAV enters a no-fly zone, assigning a fixed penalty value:

$$r_{\mathrm{collision}}(\delta_n) = \begin{cases} r_{\mathrm{csn}}, & \text{if } p_u(\delta_n) \text{ is in the red zone}, \\ 0, & \text{otherwise}, \end{cases}$$ …

**Figure 7.** Reward training curves. Excerpt from the adjacent text: … allocation actions. The upper-level rewards include the lower-level rewards, but optimize only the flight options. In comparison, the non-hierarchical algorithm considers all rewards and simultaneously optimizes both flight and bandwidth allocation actions. Therefore, the proposed algorithm effectively reduces convergence time compared to the non-hierarchical approach. Moreover, th…

**Figure 8.** The first column illustrates the trajectories of different algorithms after convergence, the second column shows the cumulative data collected by the UAV

**Figure 9.** Impact of data growth per communication slot on data loss.

**Figure 10.** Average number of collisions for different algorithms in different

**Figure 11.** UAV trajectories of TBH-DDPG algorithm in different scenarios.
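The flight penalty quoted under Figure 6 is a sum of three terms, of which only the collision term is spelled out on this page. A minimal sketch of that sum follows, assuming the return and crash penalties are computed elsewhere and passed in as numbers (their expressions are truncated in the excerpt). The function names and the sample values are hypothetical.

```python
def collision_penalty(in_no_fly_zone: bool, r_csn: float) -> float:
    """Fixed penalty r_csn when the UAV is inside a no-fly (red) zone, else 0."""
    return r_csn if in_no_fly_zone else 0.0


def flight_penalty(
    in_no_fly_zone: bool,
    r_csn: float,
    return_penalty: float,
    crash_penalty: float,
) -> float:
    """Sum of collision, return, and crash penalties, mirroring Eq. (15).

    `return_penalty` and `crash_penalty` are supplied by the caller because
    their definitions are not available in this excerpt.
    """
    return collision_penalty(in_no_fly_zone, r_csn) + return_penalty + crash_penalty
```

For instance, with an assumed `r_csn` of -10.0, a return penalty of -2.0, and no crash penalty, `flight_penalty(True, -10.0, -2.0, 0.0)` evaluates to -12.0.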

Abstract

The low-altitude Internet of Things (IoT), supported by unmanned aerial vehicles (UAVs), provides ground sensing networks with advanced real-time monitoring and data collection. To maximize data collection volume from distributed IoT nodes, AI-powered data collection technology plays a critical role in enabling intelligent decision-making. Among these technologies, deep reinforcement learning (DRL) has gained particular attention. However, existing DRL-based work on UAV-assisted IoT data collection rarely addresses challenges such as interference and dynamic data volume, and it also suffers from high computational demands and slow convergence. To address these challenges, a hierarchical DRL (HDRL) framework is designed to optimize UAV trajectories and bandwidth allocation to maximize data collection volume. First, the proposed scenario incorporates interference, dynamic data volume of IoT nodes, and multiple types of obstacles. The entire task is hierarchically structured: the upper level makes flight trajectory decisions at a coarse temporal granularity, while the lower level makes bandwidth allocation decisions at a finer temporal granularity. Second, a trajectory and bandwidth allocation optimization algorithm based on hierarchical deep deterministic policy gradients (TBH-DDPG) is proposed to solve the problem. Finally, simulation results demonstrate that the proposed algorithm improves convergence speed by 44.44% and reduces computational cost by 58.05% compared to a non-hierarchical algorithm.
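To make the lower-level bandwidth decision concrete, the sketch below turns a vector of non-negative scores (as an actor network might output) into per-node bandwidths that sum to the available total, which is the basic constraint of an FDMA split. The rate function uses the standard Shannon capacity form as an assumption; the paper's own rate and interference model is not reproduced on this page, and all names here are hypothetical.

```python
import math
from typing import List, Sequence


def allocate_bandwidth(raw_scores: Sequence[float], total_bandwidth_hz: float) -> List[float]:
    """Scale non-negative scores into bandwidths that sum to the total."""
    if not raw_scores:
        raise ValueError("need at least one node")
    if any(score < 0 for score in raw_scores):
        raise ValueError("scores must be non-negative")
    score_sum = sum(raw_scores)
    if score_sum == 0:
        # No preference expressed: fall back to an equal split.
        return [total_bandwidth_hz / len(raw_scores)] * len(raw_scores)
    return [total_bandwidth_hz * score / score_sum for score in raw_scores]


def achievable_rate(bandwidth_hz: float, sinr: float) -> float:
    """Shannon-style rate in bit/s for a linear-scale SINR (assumed model)."""
    return bandwidth_hz * math.log2(1.0 + sinr)
```

Normalizing by the score sum keeps the allocation feasible for any non-negative actor output, so the constraint does not have to be learned through penalties.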

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The 44% convergence and 58% cost gains rest on an unspecified non-hierarchical baseline, so the main empirical result needs more scrutiny.

The main takeaway is that the performance numbers (44.44% faster convergence and 58.05% lower cost) come from a comparison against an unspecified non-hierarchical algorithm. That makes it hard to know if the gains come from the hierarchy or from differences in how the baseline was implemented.

The hierarchical structure itself is a sensible way to handle the problem. Trajectory decisions happen at a slower rate than bandwidth allocation, so separating them reduces the size of the action space at each level. The scenario includes interference and dynamic node data volumes, which adds some realism compared to simpler models.

What stands out as a soft spot is the missing information on the baseline. The paper does not say whether the non-hierarchical version uses the same actor-critic networks, the same number of training steps, or the same exploration parameters. In high-dimensional continuous control like this, those choices matter a lot for convergence speed. Without that, the results are not easy to reproduce or trust at face value. The simulation setup sounds standard for UAV-IoT work, but again, no details on variance or multiple runs are mentioned in the abstract.

This paper would be of interest to researchers working on practical DRL applications for UAV path planning and resource allocation. It is not breaking new ground in DRL theory, but it applies the hierarchical idea to a specific setting with obstacles and interference. I would bring it to a reading group for discussion on the experimental design. I would not cite it in my own work without seeing the full methods. It deserves peer review to let the authors address the baseline description and add statistical details.

Referee Report

1 major / 0 minor

Summary. The paper proposes a hierarchical deep reinforcement learning (HDRL) framework called TBH-DDPG for joint UAV trajectory planning and bandwidth allocation in low-altitude IoT data collection. The scenario includes interference, dynamic IoT data volumes, and multiple obstacle types. Trajectory decisions are made at coarse temporal granularity in the upper level while bandwidth decisions occur at finer granularity in the lower level. Simulation results claim that TBH-DDPG achieves 44.44% faster convergence and 58.05% lower computational cost relative to a non-hierarchical algorithm.

Significance. If the performance claims can be reproduced with matched baselines, the hierarchical decomposition offers a practical route to scaling DRL to high-dimensional joint action spaces in UAV-assisted IoT without sacrificing data-collection volume. The approach directly targets the sample-efficiency and compute bottlenecks that currently limit deployment of flat DDPG-style methods in dynamic wireless environments.

major comments (1)

[Simulation Results] Simulation Results section: The central empirical claims (44.44% faster convergence, 58.05% lower computational cost) are stated without any description of the non-hierarchical baseline algorithm, including its actor-critic network architectures, total gradient steps, exploration schedule, or number of independent trials with reported variance. Because the joint trajectory-plus-bandwidth action space is high-dimensional, any mismatch in network size or training budget between TBH-DDPG and the flat comparator would produce exactly these speed-ups without demonstrating that the hierarchy itself preserves solution quality.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the simulation results. We agree that the non-hierarchical baseline requires fuller documentation to support the reported performance gains and will revise the manuscript accordingly.

Referee: [Simulation Results] Simulation Results section: The central empirical claims (44.44% faster convergence, 58.05% lower computational cost) are stated without any description of the non-hierarchical baseline algorithm, including its actor-critic network architectures, total gradient steps, exploration schedule, or number of independent trials with reported variance. Because the joint trajectory-plus-bandwidth action space is high-dimensional, any mismatch in network size or training budget between TBH-DDPG and the flat comparator would produce exactly these speed-ups without demonstrating that the hierarchy itself preserves solution quality.

Authors: We agree that the current manuscript provides insufficient detail on the non-hierarchical baseline. In the revised version we will add a dedicated subsection describing the baseline as a flat DDPG implementation that receives the concatenated trajectory-and-bandwidth action vector at every time step. We will specify the actor and critic network architectures (layer counts and widths), the total number of gradient steps, the exploration noise schedule, the number of independent random seeds (with standard deviation reported for all metrics), and the precise training budget allocated to each method. These additions will allow direct verification that the observed speed-up and cost reduction arise from the hierarchical decomposition rather than from unequal computational resources.

Revision: yes
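The per-seed reporting promised in this response could look like the following minimal sketch, which assumes each independent seed yields one scalar metric (for example, episodes until convergence). The function name and the input layout are hypothetical, and the sample values in the usage note are placeholders rather than results from the paper.

```python
from statistics import mean, stdev
from typing import Dict, Tuple


def summarize_across_seeds(metric_by_seed: Dict[int, float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of a metric over random seeds."""
    values = list(metric_by_seed.values())
    if len(values) < 2:
        # A single run gives no information about run-to-run variance.
        raise ValueError("need at least two seeds to estimate a standard deviation")
    return mean(values), stdev(values)
```

Calling `summarize_across_seeds({0: 1.0, 1: 2.0, 2: 3.0})` returns a mean of 2.0 and a sample standard deviation of 1.0. Reporting both numbers for TBH-DDPG and for the flat baseline, under the same training budget, is what would let a reader judge whether the gap exceeds seed-to-seed noise.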

Circularity Check

0 steps flagged

No circularity: empirical simulation claims rest on external benchmarks, not self-referential definitions or fitted inputs.

The paper's central claims are simulation outcomes (44.44% faster convergence, 58.05% lower cost for TBH-DDPG vs. non-hierarchical baseline). No mathematical derivation chain, equations, or first-principles results are presented that reduce to inputs by construction. The hierarchical structure and algorithm are proposed as design choices, with performance evaluated externally via simulation; the baseline comparison, while potentially underspecified, does not constitute a self-definitional or fitted-input reduction. No self-citation load-bearing steps or ansatz smuggling appear in the abstract or described content. This is a standard empirical DRL paper with independent simulation evidence.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the approach rests on standard DRL modeling assumptions and the paper-specific choice to decompose the problem hierarchically; no explicit free parameters or invented entities are described.

axioms (2)

domain assumption: The UAV-IoT data collection task can be formulated as a Markov decision process amenable to DRL. Implicit in the choice of a DDPG-based solution.
ad hoc to paper: Decomposing trajectory decisions at coarse granularity and bandwidth decisions at fine granularity yields both faster learning and lower compute without sacrificing data-collection performance. Core design choice of the hierarchical structure.

pith-pipeline@v0.9.0 · 5788 in / 1476 out tokens · 52713 ms · 2026-05-25T06:54:00.263160+00:00 · methodology

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

[1] A. Srivastava and J. Prakash, “Internet of Low-Altitude UAVs (IoLoUA): a methodical modeling on integration of Internet of “Things” with “UAV” possibilities and tests,” Artificial Intelligence Review, vol. 56, no. 3, pp. 2279–2324, 2023.

[2] J. Mu, R. Zhang, Y. Cui, N. Gao, and X. Jing, “UAV meets integrated sensing and communication: Challenges and future directions,” IEEE Communications Magazine, vol. 61, no. 5, pp. 62–67, 2023.

[3] Z. Wei, M. Zhu, N. Zhang, L. Wang, Y. Zou, Z. Meng, H. Wu, and Z. Feng, “UAV-assisted data collection for Internet of Things: A survey,” IEEE Internet of Things Journal, vol. 9, no. 17, pp. 15460–15483, 2022.

[4] M. Dehghan and E. Khosravian, “A review of cognitive UAVs: AI-driven situation awareness for enhanced operations,” AI and Tech in Behavioral and Social Sciences, vol. 2, no. 4, pp. 54–65, 2024.

[5] E. V. Butilă and R. G. Boboc, “Urban traffic monitoring and analysis using unmanned aerial vehicles (UAVs): A systematic literature review,” Remote Sensing, vol. 14, no. 3, p. 620, 2022.

[6] N. H. Motlagh, P. Kortoçi, X. Su, L. Lovén, H. K. Hoel, S. B. Haugsvær, V. Srivastava, C. F. Gulbrandsen, P. Nurmi, and S. Tarkoma, “Unmanned aerial vehicles for air pollution monitoring: A survey,” IEEE Internet of Things Journal, vol. 10, no. 24, pp. 21687–21704, 2023.

[7] K. P. Valavanis and G. J. Vachtsevanos, Handbook of unmanned aerial vehicles. Springer Publishing Company, Incorporated, 2014.

[8] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Mobile unmanned aerial vehicles (UAVs) for energy-efficient Internet of Things communications,” IEEE Transactions on Wireless Communications, vol. 16, no. 11, pp. 7574–7589, 2017.

[9] Z. Lu, Z. Jia, Q. Wu, and Z. Han, “Joint trajectory planning and communication design for multiple UAVs in intelligent collaborative air-ground communication systems,” IEEE Internet of Things Journal, 2024.

[10] Y. Wang, Z. Gao, J. Zhang, X. Cao, D. Zheng, Y. Gao, D. W. K. Ng, and M. Di Renzo, “Trajectory design for UAV-based Internet of Things data collection: A deep reinforcement learning approach,” IEEE Internet of Things Journal, vol. 9, no. 5, pp. 3899–3912, 2021.

[11] C. Mingcheng, F. Shoucheng, X. GuoQiang, and H. Ke, “Deep reinforcement learning-based UAV path planning algorithm in agricultural time-constrained data collection,” Advances in Electrical & Computer Engineering, vol. 23, no. 2, 2023.

[12] S. Fu, Y. Tang, Y. Wu, N. Zhang, H. Gu, C. Chen, and M. Liu, “Energy-efficient UAV-enabled data collection via wireless charging: A reinforcement learning approach,” IEEE Internet of Things Journal, vol. 8, no. 12, pp. 10209–10219, 2021.

[13] C. Zhan, Y. Zeng, and R. Zhang, “Energy-efficient data collection in UAV enabled wireless sensor network,” IEEE Wireless Communications Letters, vol. 7, no. 3, pp. 328–331, 2017.

[14] M. Samir, S. Sharafeddine, C. M. Assi, T. M. Nguyen, and A. Ghrayeb, “UAV trajectory planning for data collection from time-constrained IoT devices,” IEEE Transactions on Wireless Communications, vol. 19, no. 1, pp. 34–46, 2019.

[15] H. Hu, K. Xiong, G. Qu, Q. Ni, P. Fan, and K. B. Letaief, “AoI-minimal trajectory planning and data collection in UAV-assisted wireless powered IoT networks,” IEEE Internet of Things Journal, vol. 8, no. 2, pp. 1211–1223, 2020.

[16] Y. Pan, Y. Yang, and W. Li, “A deep learning trained by genetic algorithm to improve the efficiency of path planning for data collection with multi-UAV,” IEEE Access, vol. 9, pp. 7994–8005, 2021.

[17] V. Mnih, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.

[18] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.

[19] H. Bayerlein, M. Theile, M. Caccamo, and D. Gesbert, “UAV path planning for wireless data harvesting: A deep reinforcement learning approach,” in GLOBECOM 2020-2020 IEEE Global Communications Conference. IEEE, 2020, pp. 1–6.

[20] Y. Du, N. Qi, X. Li, M. Xiao, A.-A. A. Boulogeorgos, T. A. Tsiftsis, and Q. Wu, “Distributed multi-UAV trajectory planning for downlink transmission: A GNN-enhanced DRL approach,” IEEE Wireless Communications Letters, 2024.

[21] S. Wang, N. Qi, H. Jiang, M. Xiao, H. Liu, L. Jia, and D. Zhao, “Trajectory planning for UAV-assisted data collection in IoT network: A double deep Q network approach,” Electronics, vol. 13, no. 8, p. 1592, 2024.

[22] R. Ding, F. Gao, and X. S. Shen, “3D UAV trajectory design and frequency band allocation for energy-efficient and fair communication: A deep reinforcement learning approach,” IEEE Transactions on Wireless Communications, vol. 19, no. 12, pp. 7796–7809, 2020.

[23] S. Luo, J. Liu, S. Chen, J. Chen, and J. Guo, “Intelligent joint trajectory design and resource allocation in UAV-based data harvesting system,” in 2020 IEEE 16th International Conference on Control & Automation (ICCA). IEEE, 2020, pp. 1378–1383.

[24] B. Zhu, E. Bedeer, H. H. Nguyen, R. Barton, and J. Henry, “UAV trajectory planning in wireless sensor networks for energy consumption minimization by deep reinforcement learning,” IEEE Transactions on Vehicular Technology, vol. 70, no. 9, pp. 9540–9554, 2021.

[25] C. H. Liu, Z. Chen, and Y. Zhan, “Energy-efficient distributed mobile crowd sensing: A deep learning approach,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1262–1276, 2019.

[26] M. Sun, X. Xu, X. Qin, and P. Zhang, “AoI-energy-aware UAV-assisted data collection for IoT networks: A deep reinforcement learning method,” IEEE Internet of Things Journal, vol. 8, no. 24, pp. 17275–17289, 2021.

[27] M. Yi, X. Wang, J. Liu, Y. Zhang, and B. Bai, “Deep reinforcement learning for fresh data collection in UAV-assisted IoT networks,” in IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2020, pp. 716–721.

[28] G. Dulac-Arnold, D. Mankowitz, and T. Hester, “Challenges of real-world reinforcement learning,” arXiv preprint arXiv:1904.12901, 2019.

[29] J. He, Z. Jia, C. Dong, J. Liu, Q. Wu, and J. Liu, “UAV swarm deployment and trajectory for 3D area coverage via reinforcement learning,” in 2023 International Conference on Wireless Communications and Signal Processing (WCSP). IEEE, 2023, pp. 683–688.

[30] Y. Qu, H. Sun, C. Dong, J. Kang, H. Dai, Q. Wu, and S. Guo, “Elastic collaborative edge intelligence for UAV swarm: Architecture, challenges, and opportunities,” IEEE Communications Magazine, 2023.

[31] T. G. Dietterich, “Hierarchical reinforcement learning with the MAXQ value function decomposition,” Journal of artificial intelligence research, vol. 13, pp. 227–303, 2000.

[32] T. D. Kulkarni, K. Narasimhan, A. Saeedi, and J. Tenenbaum, “Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation,” Advances in neural information processing systems, vol. 29, 2016.

[33] P.-L. Bacon, J. Harb, and D. Precup, “The option-critic architecture,” in Proceedings of the AAAI conference on artificial intelligence, vol. 31, no. 1, 2017.

[34] Z. Qin, X. Zhang, X. Zhang, B. Lu, Z. Liu, and L. Guo, “The UAV trajectory optimization for data collection from time-constrained IoT devices: A hierarchical deep Q-network approach,” Applied Sciences, vol. 12, no. 5, p. 2546, 2022.

[35] Y. Zhang, Z. Mou, F. Gao, L. Xing, J. Jiang, and Z. Han, “Hierarchical deep reinforcement learning for backscattering data collection with multiple UAVs,” IEEE Internet of Things Journal, vol. 8, no. 5, pp. 3786–3800, 2020.

[36] M. Zhiyu, Y. Zhang, F. Dian, L. Jun, and G. Feifei, “Research on the UAV-aided data collection and trajectory design based on the deep reinforcement learning,” Chinese Journal on Internet of Things, vol. 4, no. 3, pp. 42–51, 2020.

[37] N. Qi, Z. Huang, W. Sun, S. Jin, and X. Su, “Coalitional formation-based group-buying for UAV-enabled data collection: An auction game approach,” IEEE Transactions on Mobile Computing, vol. 22, no. 12, pp. 7420–7437, 2022.

[38] W. Wang, N. Qi, L. Jia, C. Li, T. A. Tsiftsis, and M. Wang, “Energy-efficient UAV-relaying 5G/6G spectrum sharing networks: Interference coordination with power management and trajectory design,” IEEE Open Journal of the Communications Society, vol. 3, pp. 1672–1687, 2022.

[39] O. Esrafilian, R. Gangula, and D. Gesbert, “Learning to communicate in UAV-aided wireless networks: Map-based approaches,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 1791–1802, 2018.

[40] J. Zhang, T. Zhang, Z. Yang, B. Li, and Y. Wu, “Altitude and number optimisation for UAV-enabled wireless communications,” IET Communications, vol. 14, no. 8, pp. 1228–1233, 2020.

[41] Y. Zeng, J. Xu, and R. Zhang, “Energy minimization for wireless communication with rotary-wing UAV,” IEEE Transactions on Wireless Communications, vol. 18, no. 4, pp. 2329–2345, 2019.

[42] M. Theile, H. Bayerlein, R. Nai, D. Gesbert, and M. Caccamo, “UAV path planning using global and local map information with deep reinforcement learning,” in 2021 20th International Conference on Advanced Robotics (ICAR). IEEE, 2021, pp. 539–546.

[43] H. Lei, H. Ran, I. S. Ansari, K.-H. Park, G. Pan, and M.-S. Alouini, “DDPG-based aerial secure data collection,” IEEE Transactions on Communications, vol. 72, no. 8, pp. 5179–5193, 2024.

[44] X. Zhao, T. Zhao, F. Wang, Y. Wu, and M. Li, “SAC-based UAV mobile edge computing for energy minimization and secure data transmission,” Ad Hoc Networks, vol. 157, p. 103435, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1570870524000465

Zhenjia Xu received the B.S. degree in communication engineering from Nanjing Univer...

[1] [1]

Internet of Low-Altitude UA Vs (IoLoUA): a methodical modeling on integration of Internet of “Things

A. Srivastava and J. Prakash, “Internet of Low-Altitude UA Vs (IoLoUA): a methodical modeling on integration of Internet of “Things” with “UA V” possibilities and tests,”Artificial Intelligence Review, vol. 56, no. 3, pp. 2279–2324, 2023

work page 2023

[2] [2]

UA V meets integrated sensing and communication: Challenges and future directions,

J. Mu, R. Zhang, Y . Cui, N. Gao, and X. Jing, “UA V meets integrated sensing and communication: Challenges and future directions,”IEEE Communications Magazine, vol. 61, no. 5, pp. 62–67, 2023

work page 2023

[3] [3]

UA V-assisted data collection for Internet of Things: A survey,

Z. Wei, M. Zhu, N. Zhang, L. Wang, Y . Zou, Z. Meng, H. Wu, and Z. Feng, “UA V-assisted data collection for Internet of Things: A survey,” IEEE Internet of Things Journal, vol. 9, no. 17, pp. 15 460–15 483, 2022

work page 2022

[4] [4]

A review of cognitive UA Vs: AI-driven situation awareness for enhanced operations,

M. Dehghan and E. Khosravian, “A review of cognitive UA Vs: AI-driven situation awareness for enhanced operations,”AI and Tech in Behavioral and Social Sciences, vol. 2, no. 4, pp. 54–65, 2024

work page 2024

[5] [5]

Urban traffic monitoring and analysis using unmanned aerial vehicles (UA Vs): A systematic literature review,

E. V . Butil ˘a and R. G. Boboc, “Urban traffic monitoring and analysis using unmanned aerial vehicles (UA Vs): A systematic literature review,” Remote Sensing, vol. 14, no. 3, p. 620, 2022

work page 2022

[6] N. H. Motlagh, P. Kortoçi, X. Su, L. Lovén, H. K. Hoel, S. B. Haugsvær, V. Srivastava, C. F. Gulbrandsen, P. Nurmi, and S. Tarkoma, “Unmanned aerial vehicles for air pollution monitoring: A survey,” IEEE Internet of Things Journal, vol. 10, no. 24, pp. 21687–21704, 2023.

[7] K. P. Valavanis and G. J. Vachtsevanos, Handbook of unmanned aerial vehicles. Springer Publishing Company, Incorporated, 2014.

[8] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Mobile unmanned aerial vehicles (UAVs) for energy-efficient Internet of Things communications,” IEEE Transactions on Wireless Communications, vol. 16, no. 11, pp. 7574–7589, 2017.

[9] Z. Lu, Z. Jia, Q. Wu, and Z. Han, “Joint trajectory planning and communication design for multiple UAVs in intelligent collaborative air-ground communication systems,” IEEE Internet of Things Journal, 2024.

[10] Y. Wang, Z. Gao, J. Zhang, X. Cao, D. Zheng, Y. Gao, D. W. K. Ng, and M. Di Renzo, “Trajectory design for UAV-based Internet of Things data collection: A deep reinforcement learning approach,” IEEE Internet of Things Journal, vol. 9, no. 5, pp. 3899–3912, 2021.

[11] C. Mingcheng, F. Shoucheng, X. GuoQiang, and H. Ke, “Deep reinforcement learning-based UAV path planning algorithm in agricultural time-constrained data collection,” Advances in Electrical & Computer Engineering, vol. 23, no. 2, 2023.

[12] S. Fu, Y. Tang, Y. Wu, N. Zhang, H. Gu, C. Chen, and M. Liu, “Energy-efficient UAV-enabled data collection via wireless charging: A reinforcement learning approach,” IEEE Internet of Things Journal, vol. 8, no. 12, pp. 10209–10219, 2021.

[13] C. Zhan, Y. Zeng, and R. Zhang, “Energy-efficient data collection in UAV enabled wireless sensor network,” IEEE Wireless Communications Letters, vol. 7, no. 3, pp. 328–331, 2017.

[14] M. Samir, S. Sharafeddine, C. M. Assi, T. M. Nguyen, and A. Ghrayeb, “UAV trajectory planning for data collection from time-constrained IoT devices,” IEEE Transactions on Wireless Communications, vol. 19, no. 1, pp. 34–46, 2019.

[15] H. Hu, K. Xiong, G. Qu, Q. Ni, P. Fan, and K. B. Letaief, “AoI-minimal trajectory planning and data collection in UAV-assisted wireless powered IoT networks,” IEEE Internet of Things Journal, vol. 8, no. 2, pp. 1211–1223, 2020.

[16] Y. Pan, Y. Yang, and W. Li, “A deep learning trained by genetic algorithm to improve the efficiency of path planning for data collection with multi-UAV,” IEEE Access, vol. 9, pp. 7994–8005, 2021.

[17] V. Mnih, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.

[18] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.

[19] H. Bayerlein, M. Theile, M. Caccamo, and D. Gesbert, “UAV path planning for wireless data harvesting: A deep reinforcement learning approach,” in GLOBECOM 2020-2020 IEEE Global Communications Conference. IEEE, 2020, pp. 1–6.

[20] Y. Du, N. Qi, X. Li, M. Xiao, A.-A. A. Boulogeorgos, T. A. Tsiftsis, and Q. Wu, “Distributed multi-UAV trajectory planning for downlink transmission: A GNN-enhanced DRL approach,” IEEE Wireless Communications Letters, 2024.

[21] S. Wang, N. Qi, H. Jiang, M. Xiao, H. Liu, L. Jia, and D. Zhao, “Trajectory planning for UAV-assisted data collection in IoT network: A double deep Q network approach,” Electronics, vol. 13, no. 8, p. 1592, 2024.

[22] R. Ding, F. Gao, and X. S. Shen, “3D UAV trajectory design and frequency band allocation for energy-efficient and fair communication: A deep reinforcement learning approach,” IEEE Transactions on Wireless Communications, vol. 19, no. 12, pp. 7796–7809, 2020.

[23] S. Luo, J. Liu, S. Chen, J. Chen, and J. Guo, “Intelligent joint trajectory design and resource allocation in UAV-based data harvesting system,” in 2020 IEEE 16th International Conference on Control & Automation (ICCA). IEEE, 2020, pp. 1378–1383.

[24] B. Zhu, E. Bedeer, H. H. Nguyen, R. Barton, and J. Henry, “UAV trajectory planning in wireless sensor networks for energy consumption minimization by deep reinforcement learning,” IEEE Transactions on Vehicular Technology, vol. 70, no. 9, pp. 9540–9554, 2021.

[25] C. H. Liu, Z. Chen, and Y. Zhan, “Energy-efficient distributed mobile crowd sensing: A deep learning approach,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1262–1276, 2019.

[26] M. Sun, X. Xu, X. Qin, and P. Zhang, “AoI-energy-aware UAV-assisted data collection for IoT networks: A deep reinforcement learning method,” IEEE Internet of Things Journal, vol. 8, no. 24, pp. 17275–17289, 2021.

[27] M. Yi, X. Wang, J. Liu, Y. Zhang, and B. Bai, “Deep reinforcement learning for fresh data collection in UAV-assisted IoT networks,” in IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2020, pp. 716–721.

[28] G. Dulac-Arnold, D. Mankowitz, and T. Hester, “Challenges of real-world reinforcement learning,” arXiv preprint arXiv:1904.12901, 2019.

[29] J. He, Z. Jia, C. Dong, J. Liu, Q. Wu, and J. Liu, “UAV swarm deployment and trajectory for 3D area coverage via reinforcement learning,” in 2023 International Conference on Wireless Communications and Signal Processing (WCSP). IEEE, 2023, pp. 683–688.

[30] Y. Qu, H. Sun, C. Dong, J. Kang, H. Dai, Q. Wu, and S. Guo, “Elastic collaborative edge intelligence for UAV swarm: Architecture, challenges, and opportunities,” IEEE Communications Magazine, 2023.

[31] T. G. Dietterich, “Hierarchical reinforcement learning with the MAXQ value function decomposition,” Journal of Artificial Intelligence Research, vol. 13, pp. 227–303, 2000.

[32] T. D. Kulkarni, K. Narasimhan, A. Saeedi, and J. Tenenbaum, “Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation,” Advances in Neural Information Processing Systems, vol. 29, 2016.

[33] P.-L. Bacon, J. Harb, and D. Precup, “The option-critic architecture,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017.

[34] Z. Qin, X. Zhang, X. Zhang, B. Lu, Z. Liu, and L. Guo, “The UAV trajectory optimization for data collection from time-constrained IoT devices: A hierarchical deep Q-network approach,” Applied Sciences, vol. 12, no. 5, p. 2546, 2022.

[35] Y. Zhang, Z. Mou, F. Gao, L. Xing, J. Jiang, and Z. Han, “Hierarchical deep reinforcement learning for backscattering data collection with multiple UAVs,” IEEE Internet of Things Journal, vol. 8, no. 5, pp. 3786–3800, 2020.

[36] M. Zhiyu, Y. Zhang, F. Dian, L. Jun, and G. Feifei, “Research on the UAV-aided data collection and trajectory design based on the deep reinforcement learning,” Chinese Journal on Internet of Things, vol. 4, no. 3, pp. 42–51, 2020.

[37] N. Qi, Z. Huang, W. Sun, S. Jin, and X. Su, “Coalitional formation-based group-buying for UAV-enabled data collection: An auction game approach,” IEEE Transactions on Mobile Computing, vol. 22, no. 12, pp. 7420–7437, 2022.

[38] W. Wang, N. Qi, L. Jia, C. Li, T. A. Tsiftsis, and M. Wang, “Energy-efficient UAV-relaying 5G/6G spectrum sharing networks: Interference coordination with power management and trajectory design,” IEEE Open Journal of the Communications Society, vol. 3, pp. 1672–1687, 2022.

[39] O. Esrafilian, R. Gangula, and D. Gesbert, “Learning to communicate in UAV-aided wireless networks: Map-based approaches,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 1791–1802, 2018.

[40] J. Zhang, T. Zhang, Z. Yang, B. Li, and Y. Wu, “Altitude and number optimisation for UAV-enabled wireless communications,” IET Communications, vol. 14, no. 8, pp. 1228–1233, 2020.

[41] Y. Zeng, J. Xu, and R. Zhang, “Energy minimization for wireless communication with rotary-wing UAV,” IEEE Transactions on Wireless Communications, vol. 18, no. 4, pp. 2329–2345, 2019.

[42] M. Theile, H. Bayerlein, R. Nai, D. Gesbert, and M. Caccamo, “UAV path planning using global and local map information with deep reinforcement learning,” in 2021 20th International Conference on Advanced Robotics (ICAR). IEEE, 2021, pp. 539–546.

[43] H. Lei, H. Ran, I. S. Ansari, K.-H. Park, G. Pan, and M.-S. Alouini, “DDPG-based aerial secure data collection,” IEEE Transactions on Communications, vol. 72, no. 8, pp. 5179–5193, 2024.

[44] X. Zhao, T. Zhao, F. Wang, Y. Wu, and M. Li, “SAC-based UAV mobile edge computing for energy minimization and secure data transmission,” Ad Hoc Networks, vol. 157, p. 103435, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1570870524000465