Scaling up Energy-Aware Multi-Agent Reinforcement Learning for Mission-Oriented Drone Networks with Individual Reward

Changling Li; Ying Li

arxiv: 2605.24992 · v1 · pith:LSW24GPXnew · submitted 2026-05-24 · 💻 cs.NI · cs.AI· cs.LG· cs.MA

Scaling up Energy-Aware Multi-Agent Reinforcement Learning for Mission-Oriented Drone Networks with Individual Reward

Changling Li , Ying Li This is my paper

Pith reviewed 2026-06-29 23:57 UTC · model grok-4.3

classification 💻 cs.NI cs.AIcs.LGcs.MA

keywords multi-agent reinforcement learningdrone networksenergy-aware planningindividual rewardtrajectory planningDQNsuccess ratescalability

0 comments

The pith

Individual reward functions in energy-aware drone MARL deliver higher success rates and fewer steps when environment size and agent count increase.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an energy-aware multi-agent reinforcement learning model for drone trajectory planning that assigns each drone its own reward based on how much of the mission it has completed and how much battery it has left. It tests this setup in simulations against a prior shared-reward version of the same algorithm. The individual-reward version reaches at least 80 percent success regardless of where tasks are placed and stays more effective as the area grows or more drones are added. A sympathetic reader would care because limited battery and changing conditions are the main barriers to using many drones together for missions, and clearer per-agent goals appear to ease both problems at once.

Core claim

The central claim is that replacing a shared reward with individual rewards driven by task execution progress and remaining battery in a DQN-based MARL model produces greater robustness to increases in environment size and agent numbers, yielding higher success rates achieved in fewer steps and therefore better energy use than the shared-reward baseline.

What carries the argument

Individual reward functions driven by task execution progress and remaining battery of each drone, which supply each agent with a clear, local signal instead of a single group reward.

If this is right

The individual-reward model achieves at least 80 percent success rate independent of task locations and lengths.
Success rate improves with higher task density and approaches 100 percent near 40 percent density.
The model remains effective when environment size or agent count grows, unlike the shared-reward version.
Higher success is reached in fewer steps, which directly reduces total energy consumed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same individual-reward structure could be tried in other resource-constrained multi-agent settings such as autonomous vehicle fleets.
Explicit per-agent goals may reduce the need for elaborate credit-assignment techniques when the number of agents becomes large.
Real-world battery discharge curves that include wind or payload effects could be substituted into the reward to test sensitivity.
The approach might allow mission planners to add or remove drones mid-operation without retraining the entire team policy.

Load-bearing premise

The simulation environment and battery model accurately reflect the dynamics that would occur with real drones, and the shared-reward baseline is reproduced exactly except for the reward structure.

What would settle it

Running the same mission scenarios with physical drones while varying the number of agents and the size of the operating area, then checking whether the individual-reward model still records higher success and fewer total steps than the shared-reward model.

Figures

Figures reproduced from arXiv: 2605.24992 by Changling Li, Ying Li.

**Figure 2.** Figure 2: A grid pattern indication of endpoints for a drone [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: A 5 × 5 grid pattern consisting of 25 trajectory points with four tasks and one base station. To simplify the calculation for easy understanding and focus on the performance comparison in our experiment, we proportionally scale down the energy consumption rate values based on the calculated values to PE = 3, PH = [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: The success rate under different batch size [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: The average number of steps required to finish a [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: The average number of steps required to finish a [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 8.** Figure 8: The success rate under different numbers of tasks [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗

**Figure 7.** Figure 7: The average number of steps required to finish [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 9.** Figure 9: The average number of steps required to finish a [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗

**Figure 13.** Figure 13: We compared the performance over task numbers [PITH_FULL_IMAGE:figures/full_fig_p009_13.png] view at source ↗

**Figure 12.** Figure 12: Performance comparison on success rate under [PITH_FULL_IMAGE:figures/full_fig_p010_12.png] view at source ↗

**Figure 15.** Figure 15: Performance comparison on task execution steps [PITH_FULL_IMAGE:figures/full_fig_p010_15.png] view at source ↗

**Figure 16.** Figure 16: The success rate under different target network [PITH_FULL_IMAGE:figures/full_fig_p013_16.png] view at source ↗

**Figure 17.** Figure 17: The success rate under different task location and [PITH_FULL_IMAGE:figures/full_fig_p013_17.png] view at source ↗

**Figure 18.** Figure 18: Success rate comparison of the individual reward mode and the shared reward mode on various task densities. [PITH_FULL_IMAGE:figures/full_fig_p014_18.png] view at source ↗

**Figure 19.** Figure 19: Comparison of the number of execution steps of the individual reward mode and the shared reward mode on [PITH_FULL_IMAGE:figures/full_fig_p014_19.png] view at source ↗

**Figure 20.** Figure 20: Success rate comparison of the individual reward mode and the shared reward mode under different grid sizes. [PITH_FULL_IMAGE:figures/full_fig_p015_20.png] view at source ↗

**Figure 21.** Figure 21: Comparison of the number of execution steps of the individual reward mode and the shared reward mode [PITH_FULL_IMAGE:figures/full_fig_p016_21.png] view at source ↗

read the original abstract

Multi-agent reinforcement learning (MARL) has shown wide applicability in collaborative systems such as autonomous driving and smart cities for its ability of learning through interaction. With the recent development of drone networks, researchers have also applied MARL to address the trajectory planning problems. However, the dynamic environment and the limited battery capacity are still challenging for using MARL to achieve efficient collaborative task execution. In this paper, we propose an energy-aware MARL model as an attempt to tackle these challenges, leveraging Deep Q-Networks (DQN) with \emph{individual reward functions} driven by the task execution progress and the remaining battery of drones. We conduct a set of simulation studies for the proposed mode and compare it with the shared reward MARL~\cite{Li2022MARL} to explore the impact of credit assignment in MARL. The results indicate that our proposed model can achieve at least 80\% success rate regardless of the task locations and lengths. Similar to the shared reward mode, the individual reward mode can achieve a better success rate when the task density is high, and it can hit nearly a 100\% success rate when task density gets close to 40\%. The true advantage of our proposed model with individual reward is revealed when scaling up the environment. The comparison to the shared reward MARL shows that the our proposed model is more robust towards the change of the environment size and agent numbers. It can achieve higher success rate with fewer steps due to the clarity of the goal which improves energy efficiency even better.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This applies individual rewards to the authors' prior shared-reward drone MARL setup and claims better scaling robustness, but the abstract supplies no evidence the baseline was reproduced identically.

read the letter

The core claim is that individual rewards in their DQN-based MARL for drone trajectory planning produce higher success rates and better scaling than the shared-reward version from their 2022 paper when environment size and agent count increase. They report success staying above 80 percent across task locations, nearing 100 percent at higher task densities, and fewer steps taken due to clearer per-agent goals.

What the work actually does is swap the reward structure in an existing DQN MARL setup for this domain and run simulations against their own earlier baseline. The intuition that individual rewards reduce credit-assignment problems and improve energy use is straightforward and worth testing in mission-oriented drone settings.

The main weakness is the complete absence of experimental detail. No training curves, variance, hyperparameter tables, or statistical tests appear in the abstract. More critically, there is no confirmation that the shared-reward baseline was reimplemented with identical state space, action space, network architecture, exploration schedule, battery model, and termination conditions. Any deviation would make the robustness advantage impossible to attribute to the reward change alone. The stress-test concern holds up on the available text.

The simulation environment and battery dynamics are also undescribed, so it is unclear how well they match physical drone behavior. This is an incremental application paper rather than a new algorithm or theoretical result.

It is mainly of interest to researchers already working on MARL for UAV task allocation who want to see one more data point on reward design. It does not move the broader MARL or networking literature. If the full paper includes reproducible code, exact baseline matching, and basic statistical reporting, it could merit review for the application angle. Otherwise the evidence is too thin to justify referee time.

Referee Report

2 major / 2 minor

Summary. The paper proposes an energy-aware multi-agent reinforcement learning model using Deep Q-Networks with individual reward functions (driven by task execution progress and remaining battery) for trajectory planning in mission-oriented drone networks. It reports simulation results claiming at least 80% success rate independent of task locations/lengths, improved performance at high task density (near 100% at ~40% density), and superior robustness to scaling environment size and agent numbers compared to the shared-reward baseline from Li2022MARL, with higher success rates, fewer steps, and better energy efficiency due to clearer goal credit assignment.

Significance. If the baseline comparison is controlled and results reproducible, the work would demonstrate the value of individual rewards for credit assignment in scalable, energy-constrained MARL for drone systems, addressing dynamic environments and battery limits in collaborative task execution.

major comments (2)

[Abstract] Abstract and simulation studies: The central robustness claim (higher success rate with fewer steps when scaling environment size and agent numbers) attributes the advantage to the individual reward. However, no evidence is provided (hyperparameter tables, explicit statement, or code) that the Li2022MARL shared-reward baseline was reimplemented identically in state space, action space, DQN architecture, replay buffer, exploration, battery dynamics, task generation, and termination conditions. Any mismatch confounds the credit-assignment effect.
[Abstract] Abstract and simulation studies: Success rates and scaling behavior are reported without training curves, run-to-run variance, hyperparameter details, or statistical tests. This makes it impossible to assess whether the reported robustness (e.g., at least 80% success regardless of task locations) is reliable or sensitive to implementation choices.

minor comments (2)

[Abstract] Abstract: The sentence containing 'the our proposed model' contains a grammatical error that should be corrected for clarity.
The battery model and full simulation environment parameters are referenced but not described in sufficient detail to allow independent reproduction or assessment of physical fidelity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments highlight important issues regarding reproducibility and the strength of the baseline comparison. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims.

read point-by-point responses

Referee: [Abstract] Abstract and simulation studies: The central robustness claim (higher success rate with fewer steps when scaling environment size and agent numbers) attributes the advantage to the individual reward. However, no evidence is provided (hyperparameter tables, explicit statement, or code) that the Li2022MARL shared-reward baseline was reimplemented identically in state space, action space, DQN architecture, replay buffer, exploration, battery dynamics, task generation, and termination conditions. Any mismatch confounds the credit-assignment effect.

Authors: We agree that the manuscript does not currently provide sufficient documentation to confirm identical reimplementation of the Li2022MARL baseline. In the revised version we will add an explicit subsection (and accompanying table) that lists all implementation choices for both methods side-by-side, covering state/action spaces, DQN architecture, replay buffer size, exploration schedule, battery model, task generation procedure, and termination conditions. This will allow readers to verify that the only controlled difference is the reward structure, thereby isolating the credit-assignment effect. revision: yes
Referee: [Abstract] Abstract and simulation studies: Success rates and scaling behavior are reported without training curves, run-to-run variance, hyperparameter details, or statistical tests. This makes it impossible to assess whether the reported robustness (e.g., at least 80% success regardless of task locations) is reliable or sensitive to implementation choices.

Authors: We acknowledge the absence of these elements. The revised manuscript will include: (i) training curves for both reward variants, (ii) mean and standard deviation across at least five independent random seeds, (iii) a complete hyperparameter table, and (iv) statistical significance tests (e.g., paired t-tests or Wilcoxon tests) on the reported success rates and step counts. These additions will allow readers to evaluate the reliability of the scaling results. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with no derivation chain

full rationale

The manuscript presents a simulation-based empirical comparison of an individual-reward DQN variant against a shared-reward baseline from prior work. No mathematical derivation, first-principles result, fitted parameter renamed as prediction, or self-definitional construct is claimed or present. The single self-citation references the baseline method for comparison purposes; the reported robustness findings are direct simulation outputs rather than reductions to inputs by construction. The paper is self-contained as an empirical study against external benchmarks (simulated environments), satisfying the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No new mathematical derivations or entities are introduced; the work relies on standard DQN and MARL assumptions (Markov decision process, Q-learning convergence under suitable conditions) that are not stated or justified in the abstract.

pith-pipeline@v0.9.1-grok · 5813 in / 1025 out tokens · 22222 ms · 2026-06-29T23:57:27.963623+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

53 extracted references · 8 canonical work pages · 2 internal anchors

[1]

Energy-Aware Multi- Agent Reinforcement Learning for Collaborative Execution in Mission-Oriented Drone Networks,

Y. Li, C. Li, J. Chen, and C. Roinou, “Energy-Aware Multi- Agent Reinforcement Learning for Collaborative Execution in Mission-Oriented Drone Networks,” IEEE International Confer- ence on Computer Communications and Networks, July. 2022

2022
[2]

Human-level control through deep rein- forcement learning,

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep rein- forcement learning,” nature, vol. 518, no. 7540, pp. 529–533, 2015

2015
[3]

A Deep Policy Inference Q-Network for Multi-Agent Systems

Z.-W. Hong, S.-Y. Su, T.-Y. Shann, Y.-H. Chang, and C.-Y. Lee, “A deep policy inference q-network for multi-agent systems,” arXiv preprint arXiv:1712.07893, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[4]

A review of coopera- tive multi-agent deep reinforcement learning,

A. OroojlooyJadid and D. Hajinezhad, “A review of coopera- tive multi-agent deep reinforcement learning,” arXiv preprint arXiv:1908.03963, 2019

work page arXiv 1908
[5]

A collaborative multi-agent reinforcement learning anti-jamming algorithm in wireless networks,

F. Yao and L. Jia, “A collaborative multi-agent reinforcement learning anti-jamming algorithm in wireless networks,” IEEE wireless communications letters, vol. 8, no. 4, pp. 1024–1027, 2019

2019
[6]

Multiagent cooperation and competition with deep reinforcement learning,

A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Kor- jus, J. Aru, J. Aru, and R. Vicente, “Multiagent cooperation and competition with deep reinforcement learning,” PloS one, vol. 12, no. 4, p. e0172395, 2017

2017
[7]

Multi-agent reinforcement learning: An overview,

L. Buşoniu, R. Babuška, and B. De Schutter, “Multi-agent reinforcement learning: An overview,” Innovations in multi- agent systems and applications-1, pp. 183–221, 2010

2010
[8]

Multi-agent actor-critic for mixed cooperative- competitive environments,

R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative- competitive environments,” Advances in neural information processing systems, vol. 30, 2017

2017
[9]

Delay-aware multi- agent reinforcement learning for cooperative and competitive environments,

B. Chen, M. Xu, Z. Liu, L. Li, and D. Zhao, “Delay-aware multi- agent reinforcement learning for cooperative and competitive environments,” arXiv preprint arXiv:2005.05441, 2020

work page arXiv 2005
[10]

Prioritized Experience Replay

T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” arXiv preprint arXiv:1511.05952, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[11]

Soft actor- critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,

T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor- critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International conference on machine learning. PMLR, 2018, pp. 1861–1870

2018
[12]

Addressing function approximation error in actor-critic methods,

S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” in International conference on machine learning. PMLR, 2018, pp. 1587–1596

2018
[13]

Roer: Regularized optimal experience replay,

C. Li, Z.-W. Hong, P. Agrawal, D. Garg, and J. Pajarinen, “Roer: Regularized optimal experience replay,” arXiv preprint arXiv:2407.03995, 2024

work page arXiv 2024
[14]

Multi-agent reinforcement learning for adaptive de- mand response in smart cities,

J. Vazquez-Canteli, T. Detjeen, G. Henze, J. Kämpf, and Z. Nagy, “Multi-agent reinforcement learning for adaptive de- mand response in smart cities,” in Journal of Physics: Confer- ence Series, vol. 1343, no. 1. IOP Publishing, 2019, p. 012058

2019
[15]

A multi-agent reinforce- ment learning approach to robot soccer,

Y. Duan, B. X. Cui, and X. H. Xu, “A multi-agent reinforce- ment learning approach to robot soccer,” Artificial Intelligence Review, vol. 38, no. 3, pp. 193–211, 2012

2012
[16]

Integrating independent and centralized multi-agent reinforcement learning for traﬀic signal network optimization,

Z. Zhang, J. Yang, and H. Zha, “Integrating independent and centralized multi-agent reinforcement learning for traﬀic signal network optimization,” arXiv preprint arXiv:1909.10651, 2019

work page arXiv 1909
[17]

Solving dynamic traveling salesman problems with deep reinforcement learning,

Z. Zhang, H. Liu, M. Zhou, and J. Wang, “Solving dynamic traveling salesman problems with deep reinforcement learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 4, pp. 2119–2132, 2023

2023
[18]

Secure underwater distributed an- tenna systems: A multi-agent reinforcement learning approach,

C. Wang, Z. Bi, and Y. Wan, “Secure underwater distributed an- tenna systems: A multi-agent reinforcement learning approach,” IEEE/CAA Journal of Automatica Sinica, vol. 10, no. 7, pp. 1622–1624, 2023

2023
[19]

Using drones for parcels delivery process,

L. D. P. Pugliese, F. Guerriero, and G. Macrina, “Using drones for parcels delivery process,” Procedia Manufacturing, vol. 42, pp. 488–497, 2020

2020
[20]

Cooperative dual-task path planning for persistent surveillance and emergency handling by multiple unmanned ground vehicles,

J. Zhang, Y. Wu, and M. Zhou, “Cooperative dual-task path planning for persistent surveillance and emergency handling by multiple unmanned ground vehicles,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–12, 2024. IEEE INTERNET OF THINGS JOURNAL. 12

2024
[21]

Cost-eﬀicient task offloading in mobile edge computing with layered unmanned aerial vehicles,

H. Yuan, M. Wang, J. Bi, S. Shi, J. Yang, J. Zhang, M. Zhou, and R. Buyya, “Cost-eﬀicient task offloading in mobile edge computing with layered unmanned aerial vehicles,” IEEE In- ternet of Things Journal, pp. 1–1, 2024

2024
[22]

Energy-eﬀicient scheduling in uav-assisted hierarchi- cal wireless sensor networks,

R. Lai, B. Zhang, G. Gong, H. Yuan, J. Yang, J. Zhang, and M. Zhou, “Energy-eﬀicient scheduling in uav-assisted hierarchi- cal wireless sensor networks,” IEEE Internet of Things Journal, vol. 11, no. 11, pp. 20 194–20 206, 2024

2024
[23]

Uav-assisted dynamic avatar task migration for vehicular metaverse services: A multi-agent deep reinforcement learning approach,

J. Kang, J. Chen, M. Xu, Z. Xiong, Y. Jiao, L. Han, D. Niy- ato, Y. Tong, and S. Xie, “Uav-assisted dynamic avatar task migration for vehicular metaverse services: A multi-agent deep reinforcement learning approach,” IEEE/CAA Journal of Au- tomatica Sinica, vol. 11, no. 2, pp. 430–445, 2024

2024
[24]

Cooperative internet of UA Vs: Distributed trajectory design by multi-agent deep reinforcement learning,

J. Hu, H. Zhang, L. Song, R. Schober, and H. V. Poor, “Cooperative internet of UA Vs: Distributed trajectory design by multi-agent deep reinforcement learning,” IEEE Transactions on Communications, vol. 68, no. 11, pp. 6807–6821, 2020

2020
[25]

Wind turbine surface damage detection by deep learning aided drone inspection analysis,

A. Shihavuddin, X. Chen, V. Fedorov, A. Nymark Christensen, N. Andre Brogaard Riis, K. Branner, A. Bjorholm Dahl, and R. Reinhold Paulsen, “Wind turbine surface damage detection by deep learning aided drone inspection analysis,” Energies, vol. 12, no. 4, p. 676, 2019

2019
[26]

Toth and D

P. Toth and D. Vigo, The vehicle routing problem. SIAM, 2002

2002
[27]

New intelligent battery management system for drones,

S. R. Hashemi, R. Esmaeeli, H. Aliniagerdroudbari, M. Alhadri, H. Alshammari, A. Mahajan, and S. Farhad, “New intelligent battery management system for drones,” in ASME interna- tional mechanical engineering congress and exposition, vol. 59438. American Society of Mechanical Engineers, 2019, p. V006T06A028

2019
[28]

A survey on multi-agent deep rein- forcement learning: from the perspective of challenges and applications,

W. Du and S. Ding, “A survey on multi-agent deep rein- forcement learning: from the perspective of challenges and applications,” Artificial Intelligence Review, vol. 54, no. 5, pp. 3215–3238, 2021

2021
[29]

3M-RL: Multi- Resolution, Multi-Agent, Mean-Field Reinforcement Learning for Autonomous UA V Routing,

W. Wang, Y. Liu, R. Srikant, and L. Ying, “3M-RL: Multi- Resolution, Multi-Agent, Mean-Field Reinforcement Learning for Autonomous UA V Routing,” IEEE Transactions on Intelli- gent Transportation Systems, 2021

2021
[30]

Autonomous multirobot navigation and cooperative mapping in partially un- known environments,

H. Xie, D. Zhang, X. Hu, M. Zhou, and Z. Cao, “Autonomous multirobot navigation and cooperative mapping in partially un- known environments,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–12, 2023

2023
[31]

Multi-Agent Reinforcement Learning Based 3D Trajectory Design in Aerial-Terrestrial Wireless Caching Networks,

Y.-J. Chen, K.-M. Liao, M.-L. Ku, F. P. Tso, and G.-Y. Chen, “Multi-Agent Reinforcement Learning Based 3D Trajectory Design in Aerial-Terrestrial Wireless Caching Networks,” IEEE Transactions on Vehicular Technology, vol. 70, no. 8, pp. 8201– 8215, 2021

2021
[32]

Multi-agent deep reinforcement learning for trajectory design and power allocation in multi- UA V networks,

N. Zhao, Z. Liu, and Y. Cheng, “Multi-agent deep reinforcement learning for trajectory design and power allocation in multi- UA V networks,” IEEE Access, vol. 8, pp. 139 670–139 679, 2020

2020
[33]

Multi-agent deep reinforcement learning-based trajectory plan- ning for multi-UA V assisted mobile edge computing,

L. Wang, K. Wang, C. Pan, W. Xu, N. Aslam, and L. Hanzo, “Multi-agent deep reinforcement learning-based trajectory plan- ning for multi-UA V assisted mobile edge computing,” IEEE Transactions on Cognitive Communications and Networking, vol. 7, no. 1, pp. 73–84, 2020

2020
[34]

Eﬀicient uav trajectory-planning using economic reinforcement learning,

A. A. Khalil, A. J. Byrne, M. A. Rahman, and M. H. Manshaei, “Eﬀicient uav trajectory-planning using economic reinforcement learning,” arXiv preprint arXiv:2103.02676, 2021

work page arXiv 2021
[35]

Deep reinforcement learning based task-oriented communication in multi-agent systems,

G. He, M. Feng, Y. Zhang, G. Liu, Y. Dai, and T. Jiang, “Deep reinforcement learning based task-oriented communication in multi-agent systems,” IEEE Wireless Communications, vol. 30, no. 3, pp. 112–119, 2023

2023
[36]

Multi-agent reinforcement learning-based coordinated dynamic task alloca- tion for heterogenous uavs,

D. Liu, L. Dou, R. Zhang, X. Zhang, and Q. Zong, “Multi-agent reinforcement learning-based coordinated dynamic task alloca- tion for heterogenous uavs,” IEEE Transactions on Vehicular Technology, vol. 72, no. 4, pp. 4372–4383, 2023

2023
[37]

Shapley Q-value: A local reward approach to solve global reward games,

J. Wang, Y. Zhang, T.-K. Kim, and Y. Gu, “Shapley Q-value: A local reward approach to solve global reward games,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, 2020, pp. 7285–7292

2020
[38]

Cm3: Cooperative multi-goal multi-stage multi-agent reinforcement learning,

J. Yang, A. Nakhaei, D. Isele, K. Fujimura, and H. Zha, “Cm3: Cooperative multi-goal multi-stage multi-agent reinforcement learning,” arXiv preprint arXiv:1809.05188, 2018

work page arXiv 2018
[39]

Multi-agent reinforcement learning for problems with combined individual and team reward,

H. U. Sheikh and L. Bölöni, “Multi-agent reinforcement learning for problems with combined individual and team reward,” in 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020, pp. 1–8

2020
[40]

Liir: Learning individual intrinsic reward in multi-agent reinforce- ment learning,

Y. Du, L. Han, M. Fang, J. Liu, T. Dai, and D. Tao, “Liir: Learning individual intrinsic reward in multi-agent reinforce- ment learning,” Advances in Neural Information Processing Systems, vol. 32, 2019

2019
[41]

Energy use and life cycle greenhouse gas emissions of drones for commercial package delivery,

J. K. Stolaroff, C. Samaras, E. R. O’Neill, A. Lubers, A. S. Mitchell, and D. Ceperley, “Energy use and life cycle greenhouse gas emissions of drones for commercial package delivery,” Nature communications, vol. 9, no. 1, p. 409, 2018

2018
[42]

Evaluation of artificial neural network inference speed and energy consumption on embedded systems,

A. Arnautović and E. Teskeredžić, “Evaluation of artificial neural network inference speed and energy consumption on embedded systems,” in 2021 20th International Symposium INFOTEH-JAHORINA (INFOTEH). IEEE, 2021, pp. 1–5

2021
[43]

Neuropower: Designing energy eﬀicient convolutional neural network architecture for embedded systems,

M. Loni, A. Zoljodi, S. Sinaei, M. Daneshtalab, and M. Sjödin, “Neuropower: Designing energy eﬀicient convolutional neural network architecture for embedded systems,” in Artificial Neu- ral Networks and Machine Learning–ICANN 2019: Theoretical Neural Computation: 28th International Conference on Arti- ficial Neural Networks, Munich, Germany, September 17–...

2019
[44]

Neuro. zero: a zero-energy neural network accelerator for embedded sensing and inference systems,

S. Lee and S. Nirjon, “Neuro. zero: a zero-energy neural network accelerator for embedded sensing and inference systems,” in Proceedings of the 17th Conference on Embedded Networked Sensor Systems, 2019, pp. 138–152

2019
[45]

Energy-eﬀicient acceleration of deep neural networks on realtime-constrained embedded edge devices,

B. Kim, S. Lee, A. R. Trivedi, and W. J. Song, “Energy-eﬀicient acceleration of deep neural networks on realtime-constrained embedded edge devices,” IEEE Access, vol. 8, pp. 216 259– 216 270, 2020

2020
[46]

Embedded deep neural network processing: Algorithmic and processor techniques bring deep learning to iot and edge devices,

M. Verhelst and B. Moons, “Embedded deep neural network processing: Algorithmic and processor techniques bring deep learning to iot and edge devices,” IEEE Solid-State Circuits Magazine, vol. 9, no. 4, pp. 55–65, 2017

2017
[47]

Reinforcement learning for decentralized trajectory design in cellular UA V networks with sense-and-send protocol,

J. Hu, H. Zhang, and L. Song, “Reinforcement learning for decentralized trajectory design in cellular UA V networks with sense-and-send protocol,” IEEE Internet of Things Journal, vol. 6, no. 4, pp. 6177–6189, 2018

2018
[48]

Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient,

S. Li, Y. Wu, X. Cui, H. Dong, F. Fang, and S. Russell, “Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient,” in Proceedings of the AAAI conference on artificial intelligence, vol. 33, no. 01, 2019, pp. 4213–4220

2019
[49]

Monotonic value function factorisation for deep multi-agent reinforcement learning,

T. Rashid, M. Samvelyan, C. S. De Witt, G. Farquhar, J. Foer- ster, and S. Whiteson, “Monotonic value function factorisation for deep multi-agent reinforcement learning,” Journal of Ma- chine Learning Research, vol. 21, no. 178, pp. 1–51, 2020

2020
[50]

Qtran: Learning to factorize with transformation for coop- erative multi-agent reinforcement learning,

K. Son, D. Kim, W. J. Kang, D. E. Hostallero, and Y. Yi, “Qtran: Learning to factorize with transformation for coop- erative multi-agent reinforcement learning,” in International conference on machine learning. PMLR, 2019, pp. 5887–5896

2019
[51]

Mean field multi-agent reinforcement learning,

Y. Yang, R. Luo, M. Li, M. Zhou, W. Zhang, and J. Wang, “Mean field multi-agent reinforcement learning,” in Interna- tional conference on machine learning. PMLR, 2018, pp. 5571– 5580

2018
[52]

Multidepot drone path planning with collision avoidance,

K. Shen, R. Shivgan, J. Medina, Z. Dong, and R. Rojas-Cessa, “Multidepot drone path planning with collision avoidance,” IEEE Internet of Things Journal, vol. 9, no. 17, pp. 16 297– 16 307, 2022

2022
[53]

Drone networks: Communications, coordina- tion, and sensing,

E. Yanmaz, S. Yahyanejad, B. Rinner, H. Hellwagner, and C. Bettstetter, “Drone networks: Communications, coordina- tion, and sensing,” Ad Hoc Networks, vol. 68, pp. 1–15, 2018. IEEE INTERNET OF THINGS JOURNAL. 13 Appendix A success rate evaluation for parameter study Fig. 16: The success rate under different target network update frequency f values and ex...

2018

[1] [1]

Energy-Aware Multi- Agent Reinforcement Learning for Collaborative Execution in Mission-Oriented Drone Networks,

Y. Li, C. Li, J. Chen, and C. Roinou, “Energy-Aware Multi- Agent Reinforcement Learning for Collaborative Execution in Mission-Oriented Drone Networks,” IEEE International Confer- ence on Computer Communications and Networks, July. 2022

2022

[2] [2]

Human-level control through deep rein- forcement learning,

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep rein- forcement learning,” nature, vol. 518, no. 7540, pp. 529–533, 2015

2015

[3] [3]

A Deep Policy Inference Q-Network for Multi-Agent Systems

Z.-W. Hong, S.-Y. Su, T.-Y. Shann, Y.-H. Chang, and C.-Y. Lee, “A deep policy inference q-network for multi-agent systems,” arXiv preprint arXiv:1712.07893, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[4] [4]

A review of coopera- tive multi-agent deep reinforcement learning,

A. OroojlooyJadid and D. Hajinezhad, “A review of coopera- tive multi-agent deep reinforcement learning,” arXiv preprint arXiv:1908.03963, 2019

work page arXiv 1908

[5] [5]

A collaborative multi-agent reinforcement learning anti-jamming algorithm in wireless networks,

F. Yao and L. Jia, “A collaborative multi-agent reinforcement learning anti-jamming algorithm in wireless networks,” IEEE wireless communications letters, vol. 8, no. 4, pp. 1024–1027, 2019

2019

[6] [6]

Multiagent cooperation and competition with deep reinforcement learning,

A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Kor- jus, J. Aru, J. Aru, and R. Vicente, “Multiagent cooperation and competition with deep reinforcement learning,” PloS one, vol. 12, no. 4, p. e0172395, 2017

2017

[7] [7]

Multi-agent reinforcement learning: An overview,

L. Buşoniu, R. Babuška, and B. De Schutter, “Multi-agent reinforcement learning: An overview,” Innovations in multi- agent systems and applications-1, pp. 183–221, 2010

2010

[8] [8]

Multi-agent actor-critic for mixed cooperative- competitive environments,

R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative- competitive environments,” Advances in neural information processing systems, vol. 30, 2017

2017

[9] [9]

Delay-aware multi- agent reinforcement learning for cooperative and competitive environments,

B. Chen, M. Xu, Z. Liu, L. Li, and D. Zhao, “Delay-aware multi- agent reinforcement learning for cooperative and competitive environments,” arXiv preprint arXiv:2005.05441, 2020

work page arXiv 2005

[10] [10]

Prioritized Experience Replay

T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” arXiv preprint arXiv:1511.05952, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[11] [11]

Soft actor- critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,

T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor- critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International conference on machine learning. PMLR, 2018, pp. 1861–1870

2018

[12] [12]

Addressing function approximation error in actor-critic methods,

S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” in International conference on machine learning. PMLR, 2018, pp. 1587–1596

2018

[13] [13]

Roer: Regularized optimal experience replay,

C. Li, Z.-W. Hong, P. Agrawal, D. Garg, and J. Pajarinen, “Roer: Regularized optimal experience replay,” arXiv preprint arXiv:2407.03995, 2024

work page arXiv 2024

[14] [14]

Multi-agent reinforcement learning for adaptive de- mand response in smart cities,

J. Vazquez-Canteli, T. Detjeen, G. Henze, J. Kämpf, and Z. Nagy, “Multi-agent reinforcement learning for adaptive de- mand response in smart cities,” in Journal of Physics: Confer- ence Series, vol. 1343, no. 1. IOP Publishing, 2019, p. 012058

2019

[15] [15]

A multi-agent reinforce- ment learning approach to robot soccer,

Y. Duan, B. X. Cui, and X. H. Xu, “A multi-agent reinforce- ment learning approach to robot soccer,” Artificial Intelligence Review, vol. 38, no. 3, pp. 193–211, 2012

2012

[16] [16]

Integrating independent and centralized multi-agent reinforcement learning for traﬀic signal network optimization,

Z. Zhang, J. Yang, and H. Zha, “Integrating independent and centralized multi-agent reinforcement learning for traﬀic signal network optimization,” arXiv preprint arXiv:1909.10651, 2019

work page arXiv 1909

[17] [17]

Solving dynamic traveling salesman problems with deep reinforcement learning,

Z. Zhang, H. Liu, M. Zhou, and J. Wang, “Solving dynamic traveling salesman problems with deep reinforcement learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 4, pp. 2119–2132, 2023

2023

[18] [18]

Secure underwater distributed an- tenna systems: A multi-agent reinforcement learning approach,

C. Wang, Z. Bi, and Y. Wan, “Secure underwater distributed an- tenna systems: A multi-agent reinforcement learning approach,” IEEE/CAA Journal of Automatica Sinica, vol. 10, no. 7, pp. 1622–1624, 2023

2023

[19] [19]

Using drones for parcels delivery process,

L. D. P. Pugliese, F. Guerriero, and G. Macrina, “Using drones for parcels delivery process,” Procedia Manufacturing, vol. 42, pp. 488–497, 2020

2020

[20] [20]

Cooperative dual-task path planning for persistent surveillance and emergency handling by multiple unmanned ground vehicles,

J. Zhang, Y. Wu, and M. Zhou, “Cooperative dual-task path planning for persistent surveillance and emergency handling by multiple unmanned ground vehicles,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–12, 2024. IEEE INTERNET OF THINGS JOURNAL. 12

2024

[21] [21]

Cost-eﬀicient task offloading in mobile edge computing with layered unmanned aerial vehicles,

H. Yuan, M. Wang, J. Bi, S. Shi, J. Yang, J. Zhang, M. Zhou, and R. Buyya, “Cost-eﬀicient task offloading in mobile edge computing with layered unmanned aerial vehicles,” IEEE In- ternet of Things Journal, pp. 1–1, 2024

2024

[22] [22]

Energy-eﬀicient scheduling in uav-assisted hierarchi- cal wireless sensor networks,

R. Lai, B. Zhang, G. Gong, H. Yuan, J. Yang, J. Zhang, and M. Zhou, “Energy-eﬀicient scheduling in uav-assisted hierarchi- cal wireless sensor networks,” IEEE Internet of Things Journal, vol. 11, no. 11, pp. 20 194–20 206, 2024

2024

[23] [23]

Uav-assisted dynamic avatar task migration for vehicular metaverse services: A multi-agent deep reinforcement learning approach,

J. Kang, J. Chen, M. Xu, Z. Xiong, Y. Jiao, L. Han, D. Niy- ato, Y. Tong, and S. Xie, “Uav-assisted dynamic avatar task migration for vehicular metaverse services: A multi-agent deep reinforcement learning approach,” IEEE/CAA Journal of Au- tomatica Sinica, vol. 11, no. 2, pp. 430–445, 2024

2024

[24] [24]

Cooperative internet of UA Vs: Distributed trajectory design by multi-agent deep reinforcement learning,

J. Hu, H. Zhang, L. Song, R. Schober, and H. V. Poor, “Cooperative internet of UA Vs: Distributed trajectory design by multi-agent deep reinforcement learning,” IEEE Transactions on Communications, vol. 68, no. 11, pp. 6807–6821, 2020

2020

[25] [25]

Wind turbine surface damage detection by deep learning aided drone inspection analysis,

A. Shihavuddin, X. Chen, V. Fedorov, A. Nymark Christensen, N. Andre Brogaard Riis, K. Branner, A. Bjorholm Dahl, and R. Reinhold Paulsen, “Wind turbine surface damage detection by deep learning aided drone inspection analysis,” Energies, vol. 12, no. 4, p. 676, 2019

2019

[26] [26]

Toth and D

P. Toth and D. Vigo, The vehicle routing problem. SIAM, 2002

2002

[27] [27]

New intelligent battery management system for drones,

S. R. Hashemi, R. Esmaeeli, H. Aliniagerdroudbari, M. Alhadri, H. Alshammari, A. Mahajan, and S. Farhad, “New intelligent battery management system for drones,” in ASME interna- tional mechanical engineering congress and exposition, vol. 59438. American Society of Mechanical Engineers, 2019, p. V006T06A028

2019

[28] [28]

A survey on multi-agent deep rein- forcement learning: from the perspective of challenges and applications,

W. Du and S. Ding, “A survey on multi-agent deep rein- forcement learning: from the perspective of challenges and applications,” Artificial Intelligence Review, vol. 54, no. 5, pp. 3215–3238, 2021

2021

[29] [29]

3M-RL: Multi- Resolution, Multi-Agent, Mean-Field Reinforcement Learning for Autonomous UA V Routing,

W. Wang, Y. Liu, R. Srikant, and L. Ying, “3M-RL: Multi- Resolution, Multi-Agent, Mean-Field Reinforcement Learning for Autonomous UA V Routing,” IEEE Transactions on Intelli- gent Transportation Systems, 2021

2021

[30] [30]

Autonomous multirobot navigation and cooperative mapping in partially un- known environments,

H. Xie, D. Zhang, X. Hu, M. Zhou, and Z. Cao, “Autonomous multirobot navigation and cooperative mapping in partially un- known environments,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–12, 2023

2023

[31] [31]

Multi-Agent Reinforcement Learning Based 3D Trajectory Design in Aerial-Terrestrial Wireless Caching Networks,

Y.-J. Chen, K.-M. Liao, M.-L. Ku, F. P. Tso, and G.-Y. Chen, “Multi-Agent Reinforcement Learning Based 3D Trajectory Design in Aerial-Terrestrial Wireless Caching Networks,” IEEE Transactions on Vehicular Technology, vol. 70, no. 8, pp. 8201– 8215, 2021

2021

[32] [32]

Multi-agent deep reinforcement learning for trajectory design and power allocation in multi- UA V networks,

N. Zhao, Z. Liu, and Y. Cheng, “Multi-agent deep reinforcement learning for trajectory design and power allocation in multi- UA V networks,” IEEE Access, vol. 8, pp. 139 670–139 679, 2020

2020

[33] [33]

Multi-agent deep reinforcement learning-based trajectory plan- ning for multi-UA V assisted mobile edge computing,

L. Wang, K. Wang, C. Pan, W. Xu, N. Aslam, and L. Hanzo, “Multi-agent deep reinforcement learning-based trajectory plan- ning for multi-UA V assisted mobile edge computing,” IEEE Transactions on Cognitive Communications and Networking, vol. 7, no. 1, pp. 73–84, 2020

2020

[34] [34]

Eﬀicient uav trajectory-planning using economic reinforcement learning,

A. A. Khalil, A. J. Byrne, M. A. Rahman, and M. H. Manshaei, “Eﬀicient uav trajectory-planning using economic reinforcement learning,” arXiv preprint arXiv:2103.02676, 2021

work page arXiv 2021

[35] [35]

Deep reinforcement learning based task-oriented communication in multi-agent systems,

G. He, M. Feng, Y. Zhang, G. Liu, Y. Dai, and T. Jiang, “Deep reinforcement learning based task-oriented communication in multi-agent systems,” IEEE Wireless Communications, vol. 30, no. 3, pp. 112–119, 2023

2023

[36] [36]

Multi-agent reinforcement learning-based coordinated dynamic task alloca- tion for heterogenous uavs,

D. Liu, L. Dou, R. Zhang, X. Zhang, and Q. Zong, “Multi-agent reinforcement learning-based coordinated dynamic task alloca- tion for heterogenous uavs,” IEEE Transactions on Vehicular Technology, vol. 72, no. 4, pp. 4372–4383, 2023

2023

[37] [37]

Shapley Q-value: A local reward approach to solve global reward games,

J. Wang, Y. Zhang, T.-K. Kim, and Y. Gu, “Shapley Q-value: A local reward approach to solve global reward games,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, 2020, pp. 7285–7292

2020

[38] [38]

Cm3: Cooperative multi-goal multi-stage multi-agent reinforcement learning,

J. Yang, A. Nakhaei, D. Isele, K. Fujimura, and H. Zha, “Cm3: Cooperative multi-goal multi-stage multi-agent reinforcement learning,” arXiv preprint arXiv:1809.05188, 2018

work page arXiv 2018

[39] [39]

Multi-agent reinforcement learning for problems with combined individual and team reward,

H. U. Sheikh and L. Bölöni, “Multi-agent reinforcement learning for problems with combined individual and team reward,” in 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020, pp. 1–8

2020

[40] [40]

Liir: Learning individual intrinsic reward in multi-agent reinforce- ment learning,

Y. Du, L. Han, M. Fang, J. Liu, T. Dai, and D. Tao, “Liir: Learning individual intrinsic reward in multi-agent reinforce- ment learning,” Advances in Neural Information Processing Systems, vol. 32, 2019

2019

[41] [41]

Energy use and life cycle greenhouse gas emissions of drones for commercial package delivery,

J. K. Stolaroff, C. Samaras, E. R. O’Neill, A. Lubers, A. S. Mitchell, and D. Ceperley, “Energy use and life cycle greenhouse gas emissions of drones for commercial package delivery,” Nature communications, vol. 9, no. 1, p. 409, 2018

2018

[42] [42]

Evaluation of artificial neural network inference speed and energy consumption on embedded systems,

A. Arnautović and E. Teskeredžić, “Evaluation of artificial neural network inference speed and energy consumption on embedded systems,” in 2021 20th International Symposium INFOTEH-JAHORINA (INFOTEH). IEEE, 2021, pp. 1–5

2021

[43] [43]

Neuropower: Designing energy eﬀicient convolutional neural network architecture for embedded systems,

M. Loni, A. Zoljodi, S. Sinaei, M. Daneshtalab, and M. Sjödin, “Neuropower: Designing energy eﬀicient convolutional neural network architecture for embedded systems,” in Artificial Neu- ral Networks and Machine Learning–ICANN 2019: Theoretical Neural Computation: 28th International Conference on Arti- ficial Neural Networks, Munich, Germany, September 17–...

2019

[44] [44]

Neuro. zero: a zero-energy neural network accelerator for embedded sensing and inference systems,

S. Lee and S. Nirjon, “Neuro. zero: a zero-energy neural network accelerator for embedded sensing and inference systems,” in Proceedings of the 17th Conference on Embedded Networked Sensor Systems, 2019, pp. 138–152

2019

[45] [45]

Energy-eﬀicient acceleration of deep neural networks on realtime-constrained embedded edge devices,

B. Kim, S. Lee, A. R. Trivedi, and W. J. Song, “Energy-eﬀicient acceleration of deep neural networks on realtime-constrained embedded edge devices,” IEEE Access, vol. 8, pp. 216 259– 216 270, 2020

2020

[46] [46]

Embedded deep neural network processing: Algorithmic and processor techniques bring deep learning to iot and edge devices,

M. Verhelst and B. Moons, “Embedded deep neural network processing: Algorithmic and processor techniques bring deep learning to iot and edge devices,” IEEE Solid-State Circuits Magazine, vol. 9, no. 4, pp. 55–65, 2017

2017

[47] [47]

Reinforcement learning for decentralized trajectory design in cellular UA V networks with sense-and-send protocol,

J. Hu, H. Zhang, and L. Song, “Reinforcement learning for decentralized trajectory design in cellular UA V networks with sense-and-send protocol,” IEEE Internet of Things Journal, vol. 6, no. 4, pp. 6177–6189, 2018

2018

[48] [48]

Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient,

S. Li, Y. Wu, X. Cui, H. Dong, F. Fang, and S. Russell, “Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient,” in Proceedings of the AAAI conference on artificial intelligence, vol. 33, no. 01, 2019, pp. 4213–4220

2019

[49] [49]

Monotonic value function factorisation for deep multi-agent reinforcement learning,

T. Rashid, M. Samvelyan, C. S. De Witt, G. Farquhar, J. Foer- ster, and S. Whiteson, “Monotonic value function factorisation for deep multi-agent reinforcement learning,” Journal of Ma- chine Learning Research, vol. 21, no. 178, pp. 1–51, 2020

2020

[50] [50]

Qtran: Learning to factorize with transformation for coop- erative multi-agent reinforcement learning,

K. Son, D. Kim, W. J. Kang, D. E. Hostallero, and Y. Yi, “Qtran: Learning to factorize with transformation for coop- erative multi-agent reinforcement learning,” in International conference on machine learning. PMLR, 2019, pp. 5887–5896

2019

[51] [51]

Mean field multi-agent reinforcement learning,

Y. Yang, R. Luo, M. Li, M. Zhou, W. Zhang, and J. Wang, “Mean field multi-agent reinforcement learning,” in Interna- tional conference on machine learning. PMLR, 2018, pp. 5571– 5580

2018

[52] [52]

Multidepot drone path planning with collision avoidance,

K. Shen, R. Shivgan, J. Medina, Z. Dong, and R. Rojas-Cessa, “Multidepot drone path planning with collision avoidance,” IEEE Internet of Things Journal, vol. 9, no. 17, pp. 16 297– 16 307, 2022

2022

[53] [53]

Drone networks: Communications, coordina- tion, and sensing,

E. Yanmaz, S. Yahyanejad, B. Rinner, H. Hellwagner, and C. Bettstetter, “Drone networks: Communications, coordina- tion, and sensing,” Ad Hoc Networks, vol. 68, pp. 1–15, 2018. IEEE INTERNET OF THINGS JOURNAL. 13 Appendix A success rate evaluation for parameter study Fig. 16: The success rate under different target network update frequency f values and ex...

2018