Heterogeneous AAV Logistics Task Allocation: A Reinforcement Learning Enhanced Overlapping Coalition Formation Game Approach

Jianxin Zhong; Jingliang Sun; Junzhi Li; Teng Long; Yuze Zhou; Zihan Wang

arxiv: 2605.26471 · v1 · pith:BDQ442J5new · submitted 2026-05-26 · 💻 cs.RO

Yuze Zhou, Jingliang Sun, Junzhi Li, Jianxin Zhong, Zihan Wang, Teng Long

Pith reviewed 2026-06-29 17:35 UTC · model grok-4.3

classification 💻 cs.RO

keywords heterogeneous AAVs · overlapping coalition formation · reinforcement learning · transformer encoder · potential game · task allocation · dynamic logistics · urban logistics

The pith

A transformer-based reinforcement learning policy enhances overlapping coalition formation for heterogeneous AAV logistics task allocation, reducing generalized costs and guaranteeing Nash-stable convergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models dynamic task allocation for heterogeneous AAVs as an overlapping coalition formation game where optimality is defined by a generalized logistics cost that accounts for both service quality and resource consumption. A transformer encoder within a soft actor-critic framework learns to process variable-length task states and outputs policies that adaptively update coalitions instead of relying on fixed heuristics. The authors prove that this process forms an exact potential game, ensuring convergence to a Nash-stable equilibrium in finite steps. Simulations with up to 80 tasks show substantial cost improvements over baselines, with indoor experiments confirming practical feasibility.

Core claim

The paper claims that by embedding a transformer-based soft actor-critic network into an overlapping coalition formation game, heterogeneous AAVs can dynamically form overlapping coalitions for stochastic time-sensitive tasks, where the coalition formation constitutes an exact potential game that converges to Nash-stable equilibrium, leading to a 39.76 percent reduction in the generalized logistics cost compared to heuristic methods in a 32 AAV and 80 task scenario.
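For reference, the standard definition of an exact potential game (Monderer and Shapley) can be written as below. The symbols $u_i$ (utility of player $i$), $a_i$ and $a_i'$ (two actions of player $i$), $a_{-i}$ (actions of everyone else), and $\Phi$ (the potential) are generic notation assumed here, not the paper's own. For a cost-minimizing formulation the same identity holds with costs in place of utilities.

```latex
% Exact potential game: a single function \Phi tracks every unilateral deviation.
\Phi(a_i', a_{-i}) - \Phi(a_i, a_{-i})
  = u_i(a_i', a_{-i}) - u_i(a_i, a_{-i})
\qquad \forall i,\ \forall a_i, a_i' \in A_i,\ \forall a_{-i} \in A_{-i}.
```

When the joint action set is finite, every sequence of strictly improving unilateral deviations must terminate, and it terminates at a pure Nash equilibrium. That is the property the convergence claim relies on.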

What carries the argument

The transformer-based soft actor-critic network, which uses multi-head self-attention to encode variable-length logistics states and capture spatiotemporal dependencies to guide coalition updates in the overlapping coalition formation game.
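To make the variable-length encoding idea concrete, here is a minimal NumPy sketch of masked multi-head self-attention followed by mean pooling. It is not the paper's network: the function names `encode_tasks` and `init_params`, the padding-plus-mask layout, the pooling step, and the untrained random weights are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(rng, d_in, d_model):
    """Random projection matrices (untrained, for illustration only)."""
    shapes = [(d_in, d_model)] * 3 + [(d_model, d_model)]
    return tuple(rng.normal(scale=0.1, size=s) for s in shapes)

def encode_tasks(tasks, mask, params, n_heads):
    """Masked multi-head self-attention, then mean pooling over real tasks.

    tasks: (n_max, d_in) padded task features.
    mask:  (n_max,) bool array, True where the row is a real task.
    Returns a (d_model,) vector whose size does not depend on the task count.
    """
    wq, wk, wv, wo = params
    n, d_model = tasks.shape[0], wq.shape[1]
    d_head = d_model // n_heads

    def split(x):
        # (n, d_model) -> (heads, n, d_head)
        return x.reshape(n, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(tasks @ wq), split(tasks @ wk), split(tasks @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, n, n)
    scores = np.where(mask[None, None, :], scores, -1e9)  # padded keys get zero weight
    attn = softmax(scores, axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d_model) @ wo
    # Pool only over rows that correspond to real tasks.
    return (out * mask[:, None]).sum(axis=0) / mask.sum()
```

Because padded rows are masked both as attention keys and in the pooling step, the same three tasks yield the same encoding whether the buffer is padded to 5 or to 80 slots, which is the property a policy needs when the task set changes size over time.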

Load-bearing premise

The model assumes that global optimality can be captured by a single generalized logistics cost coupling service quality and resource consumption, and that the transformer policy produces reliable updates for time-varying task sets.

What would settle it

Observing that the coalition formation process fails to reach a Nash-stable equilibrium in repeated simulations, or that the cost reduction does not materialize in scenarios with higher task variability, would challenge the central claims.
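The stability half of that test can be sketched as a direct check on a final allocation. The sketch below assumes that each AAV's action is the set of tasks it participates in and that a unilateral deviation means joining or leaving exactly one task; both are assumptions about the game, and `agent_cost` is a hypothetical callable standing in for whatever per-agent cost the paper defines.

```python
def unilateral_deviations(membership, n_tasks):
    """Yield every task set reachable by joining or leaving exactly one task."""
    for t in range(n_tasks):
        yield membership ^ {t}  # symmetric difference toggles task t

def is_nash_stable(memberships, n_tasks, agent_cost, tol=1e-9):
    """True if no agent lowers its own cost by toggling one task membership.

    memberships: list of frozensets, one per agent (overlap is allowed).
    agent_cost:  callable (agent_index, memberships) -> float, assumed given.
    """
    for i, current in enumerate(memberships):
        base = agent_cost(i, memberships)
        for alt in unilateral_deviations(current, n_tasks):
            trial = memberships[:i] + [alt] + memberships[i + 1:]
            if agent_cost(i, trial) < base - tol:
                return False  # agent i has a profitable deviation
    return True
```

Running such a check on the terminal allocation of every simulated episode, across seeds and task-arrival patterns, would give direct evidence for or against the convergence claim.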

Original abstract

In dynamic urban logistics, the stochastic emergence of time-sensitive tasks poses a significant optimality challenge for heterogeneous AAVs logistics task allocation. To address this problem, a reinforcement learning enhanced overlapping coalition formation game approach is proposed. A dynamic task allocation model is established, where global optimality is mathematically quantified by a generalized logistics cost coupling service quality and resource consumption. To deal with the time-varying task sets induced by stochastic order arrivals, a transformer-based soft actor-critic network is designed. By leveraging multi-head self-attention to encode variable-length logistics states and capture task-wise spatiotemporal dependencies, the learned policy adaptively guides coalition updates, replacing heuristic rules in the overlapping coalition formation game. On this basis, heterogeneous AAVs can form more efficient overlapping coalitions for dynamic logistics tasks. The resulting coalition formation process is proven to constitute an exact potential game, which guarantees convergence to a Nash-stable equilibrium within a finite number of iterations. Numerical simulations demonstrate that the proposed algorithm effectively improves the optimality of task allocation under the generalized logistics cost criterion. In a scenario with 32 AAVs and 80 tasks, our algorithm achieves a 39.76% cost reduction compared with the heuristic OCF baseline. Indoor flight experiments further validate its practicality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper integrates a transformer SAC policy into overlapping coalition formation for AAV logistics and reports a 39.76% cost reduction plus an exact potential game proof, but the RL guidance may invalidate the convergence guarantee.

The core idea here is using a learned policy to steer coalition updates in a dynamic AAV task allocation setting instead of fixed heuristics. The transformer encoder handles variable-length states and spatiotemporal task dependencies, which fits the stochastic arrival problem. They also run indoor flights to show the approach works on hardware.

The 39.76% cost reduction in the 32-AAV/80-task case is a clear number against the heuristic baseline, and the generalized logistics cost that mixes service quality and resource use is a reasonable modeling choice for this domain.

The load-bearing claim is that the coalition process remains an exact potential game even after the RL policy replaces the heuristics. The abstract states this guarantees finite convergence to Nash-stable equilibrium, but gives no derivation steps. Because the policy is trained directly on the same cost function and can introduce non-myopic or state-dependent choices, it is not obvious that individual utilities still align with a global potential. If that alignment breaks, the convergence result does not carry over.
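The condition under which the guarantee would survive can be illustrated on a toy game. The sketch below uses a singleton congestion game (each agent picks one task, and the cost depends on how many agents share it), which is a textbook exact potential game with Rosenthal's potential. It is not the paper's overlapping model, and the `selector` argument is a hypothetical stand-in for the learned policy.

```python
import random

def rosenthal_potential(assign, cost):
    """Rosenthal potential of a singleton congestion game.

    assign[i] is the task chosen by agent i; cost[t][k - 1] is the cost each
    agent pays on task t when k agents share it.
    """
    loads = [assign.count(t) for t in range(len(cost))]
    return sum(sum(cost[t][:loads[t]]) for t in range(len(cost)))

def improving_moves(assign, cost):
    """All unilateral switches (agent, new_task, delta) with delta < 0."""
    loads = [assign.count(t) for t in range(len(cost))]
    moves = []
    for i, t in enumerate(assign):
        for t2 in range(len(cost)):
            if t2 != t:
                # cost after joining t2 minus cost currently paid on t
                delta = cost[t2][loads[t2]] - cost[t][loads[t] - 1]
                if delta < 0:
                    moves.append((i, t2, delta))
    return moves

def run_dynamics(assign, cost, selector):
    """Apply selector-chosen improving moves until none remain."""
    assign = list(assign)
    trace = [rosenthal_potential(assign, cost)]
    while True:
        moves = improving_moves(assign, cost)
        if not moves:
            return assign, trace  # no agent can improve: Nash-stable
        i, t2, _ = selector(moves)
        assign[i] = t2
        trace.append(rosenthal_potential(assign, cost))
```

Whatever rule `selector` uses (greedy, random, or a neural network), the potential drops by exactly the mover's own cost change at every step, so the loop must stop. The argument breaks as soon as the policy is allowed to execute a move that is not a strict unilateral improvement, which is the case the referees would need to rule out.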

The numerical result also lacks error bars, multiple random seeds, or statistical tests, so the size of the gain is hard to assess. The paper does not appear to compare against other modern RL or game-theoretic baselines beyond the simple heuristic.

This work is mainly useful to people already working on multi-agent task allocation in robotics or logistics who want a concrete example of transformer RL inside a coalition game. It is worth sending to referees so they can check whether the potential-game property actually survives the learned policy; the application itself is narrow enough that a desk reject would be reasonable if the theory does not hold up.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a reinforcement learning enhanced overlapping coalition formation game approach for heterogeneous AAV logistics task allocation in dynamic urban settings. It establishes a dynamic task allocation model quantified by a generalized logistics cost, designs a transformer-based soft actor-critic network to encode variable-length states and guide coalition updates adaptively, proves that the resulting coalition formation constitutes an exact potential game guaranteeing finite convergence to a Nash-stable equilibrium, and reports a 39.76% cost reduction versus a heuristic OCF baseline in a 32-AAV/80-task simulation scenario along with indoor flight experiments.

Significance. If the exact potential game property is preserved under the learned RL policy and the performance gains are robust, the work could contribute a theoretically grounded hybrid method for stochastic task allocation that improves on heuristic baselines while providing convergence guarantees. The use of multi-head self-attention for spatiotemporal dependencies in logistics states and the experimental validation are positive elements.

major comments (2)

[Abstract] The central claim that the coalition formation process is an exact potential game (guaranteeing finite Nash-stable convergence) is load-bearing, yet no derivation is supplied. Because the transformer-based SAC policy is trained directly on the generalized logistics cost and produces state-dependent updates, it is unclear whether the individual utilities remain aligned with any global potential function; the RL guidance may introduce non-local or non-myopic dependencies that invalidate the exact potential property even if the underlying game without RL satisfies it.
[Numerical simulations] The reported 39.76% cost reduction for 32 AAVs and 80 tasks is presented without statistical significance tests, error bars, variance across runs, or an explicit definition of the heuristic OCF baseline and the precise components of the generalized logistics cost, preventing assessment of whether the improvement reflects genuine generalization or fitting to the training criterion.
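The kind of reporting requested in the second comment can be sketched with the standard library alone. The function names below are hypothetical, the paired-by-seed layout is an assumption about how the runs would be organized, and the percentile bootstrap is one reasonable choice of interval rather than the only one.

```python
import random
import statistics

def percent_reduction(baseline, proposed):
    """Relative cost reduction in percent: 100 * (baseline - proposed) / baseline."""
    return 100.0 * (baseline - proposed) / baseline

def summarize_runs(baseline_costs, proposed_costs, n_boot=2000, seed=0):
    """Mean, sample std, and a 95% percentile-bootstrap interval of the
    per-seed reduction, with baseline and proposed runs paired by seed."""
    if len(baseline_costs) != len(proposed_costs) or len(baseline_costs) < 2:
        raise ValueError("need at least two paired runs")
    reductions = [
        percent_reduction(b, p) for b, p in zip(baseline_costs, proposed_costs)
    ]
    rng = random.Random(seed)
    # Resample the per-seed reductions with replacement and collect the means.
    boot_means = sorted(
        statistics.fmean(rng.choices(reductions, k=len(reductions)))
        for _ in range(n_boot)
    )
    lo = boot_means[int(0.025 * n_boot)]
    hi = boot_means[int(0.975 * n_boot) - 1]
    return {
        "mean": statistics.fmean(reductions),
        "std": statistics.stdev(reductions),
        "ci95": (lo, hi),
    }
```

Calling `summarize_runs` on the per-seed costs of the heuristic OCF baseline and of the proposed method would turn the single 39.76% figure into a mean with a spread, which is what the comment asks for.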

minor comments (1)

[Abstract] The abstract references indoor flight experiments for practicality validation but supplies no quantitative results, setup parameters, or comparison metrics in the provided text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and will revise the manuscript to strengthen the presentation while preserving the core contributions.

Referee: [Abstract] The central claim that the coalition formation process is an exact potential game (guaranteeing finite Nash-stable convergence) is load-bearing, yet no derivation is supplied. Because the transformer-based SAC policy is trained directly on the generalized logistics cost and produces state-dependent updates, it is unclear whether the individual utilities remain aligned with any global potential function; the RL guidance may introduce non-local or non-myopic dependencies that invalidate the exact potential property even if the underlying game without RL satisfies it.

Authors: The exact potential game property holds for the underlying overlapping coalition formation game, where individual utilities are explicitly constructed to align with the generalized logistics cost serving as the potential function. The transformer-based SAC policy is trained to optimize this same cost but functions only as an adaptive selector of which valid coalition updates to execute; it does not alter the utility definitions or introduce non-myopic dependencies into the game structure. Consequently, best-response dynamics remain aligned with the potential, preserving finite convergence to a Nash-stable equilibrium. We will add an explicit derivation of the potential function and a proof of the exact potential property (including the role of the learned policy) to the revised manuscript.

Revision: yes

Referee: [Numerical simulations] The reported 39.76% cost reduction for 32 AAVs and 80 tasks is presented without statistical significance tests, error bars, variance across runs, or an explicit definition of the heuristic OCF baseline and the precise components of the generalized logistics cost, preventing assessment of whether the improvement reflects genuine generalization or fitting to the training criterion.

Authors: We agree that additional statistical detail and explicit definitions are required for rigorous evaluation. In the revised manuscript we will report results with error bars and standard deviation across independent runs, include statistical significance tests against the baseline, provide a precise definition of the heuristic OCF baseline, and fully specify the components of the generalized logistics cost.

Revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained with no circular reductions

The paper defines a generalized logistics cost as the global optimality criterion, designs a transformer-based SAC policy to guide coalition updates, and states that the resulting coalition formation process constitutes an exact potential game with finite convergence to Nash equilibrium. The proof is presented as following from the structure of the overlapping coalition formation game itself. No step reduces by construction to a fitted parameter renamed as prediction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled via prior work. Simulations and experiments provide external numerical benchmarks against a heuristic OCF baseline using the same cost function, keeping the theoretical claim independent of the learned policy outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities beyond standard assumptions of game theory and RL; full text would be required to audit these.

[1] [1]

Holistic service pro visioning in a UAV -UGV integrated network for last -mile delivery,

J. Xu, X. Liu, J. Jin, W. Pan, X. Li, and Y. Yang, “Holistic service pro visioning in a UAV -UGV integrated network for last -mile delivery,” I EEE Trans. Netw. Serv. Manage. , vol. 22, no. 1, pp. 380 –393, Feb. 20 25, doi: 10.1109/TNSM.2024.3487357

work page doi:10.1109/tnsm.2024.3487357 2024

[2] [2]

Multi -UAV-enabled energy-effici ent data delivery for low -altitude economy: Joint coded caching, user grouping, and UAV deployment,

Q. Wei, R. Li, W. Bai, and Z. Han, “Multi -UAV-enabled energy-effici ent data delivery for low -altitude economy: Joint coded caching, user grouping, and UAV deployment,” IEEE Internet Things J. , pp. 1–1, 2 025, doi: 10.1109/JIOT.2025.3562872

work page doi:10.1109/jiot.2025.3562872 2025

[3] [3]

Lambeta, P.-W

Y. Cao, T. Long, J. Sun, Z. Wang, and G. Xu, “Comparison of distrib uted task allocation algorithms considering non -ideal communication f actors for multi -UAV collaborative visit missions,” IEEE Robot. Auto m. Lett., vol. 10, no. 2, pp. 1928 –1935, Feb. 2025, doi: 10.1109/LRA. 2023.3295999

work page doi:10.1109/lra 1928

[4] [4]

A review of task allocati on methods for UAVs,

G. M. Skaltsis, H.-S. Shin, and A. Tsourdos, “A review of task allocati on methods for UAVs,” J. Intell. Rob. Syst., vol. 109, no. 4, p. 76, Dec. 2023, doi: 10.1007/s10846 -023-02011-0

work page doi:10.1007/s10846 2023

[5] [5]

Review of dynamic task allocation met hods for UAV swarms oriented to ground targets,

Q. Peng, H. Wu, and R. Xue, “Review of dynamic task allocation met hods for UAV swarms oriented to ground targets,” Complex Syst. Mod el. Simul., vol. 1, no. 3, pp. 163 –175, Sep. 2021, doi: 10.23919/CSMS. 2021.0022

work page doi:10.23919/csms 2021

[6] [6]

Bi -level optimization framewor k for urban low-altitude UAV delivery ensuring target level of safety,

B. Jiang, Y. Li, C. Li, and Y. Zheng, “Bi -level optimization framewor k for urban low-altitude UAV delivery ensuring target level of safety,” IEEE Trans. Intell. Transport. Syst. , pp. 1 –14, 2026, doi: 10.1109/TI TS.2026.3660878

work page doi:10.1109/ti 2026

[7] [7]

Urban on -demand delivery via autonomous aerial mobility: Formulation and exact algorithm,

Z. Pei, T. Fang, K. Weng, and W. Yi, “Urban on -demand delivery via autonomous aerial mobility: Formulation and exact algorithm,” IEEE Trans. Autom. Sci. Eng., vol. 20, no. 3, pp. 1675 –1689, Jul. 2023, doi: 10.1109/TASE.2022.3184324

work page doi:10.1109/tase.2022.3184324 2023

[8] [8]

Crowdsourced auction - based framework for time -critical and budget-constrained last mile del ivery,

E. Odeh, S. Singh, R. Mizouni, and H. Otrok, “Crowdsourced auction - based framework for time -critical and budget-constrained last mile del ivery,” Inf. Process. Manage. , vol. 62, no. 1, p. 103888, Jan. 2025, doi: 10.1016/j.ipm.2024.103888

work page doi:10.1016/j.ipm.2024.103888 2025

[9] [9]

Z. Zhen, L. Wen, B. Wang, Z. Hu, and D. Zhang, “Improved contract network protocol algorithm based cooperative target allocation of hete AAV3 AAV2 AAV1 AAV0 AAV3 AAV2 AAV1 AAV0 (a) (b) Fig. 9. Indoor flight experiments for dynamic task reallocation. (a) First reallocation triggered by newly emerged tasks at T = 5 s. (b) Second reallocation trig- gered by ...

work page doi:10.1016/j.ast.2021.107054 2021

[10] [10]

Collaborative task allocation fo r large-scale heterogeneous UAV swarm: A hierarchical coalition for mation game method,

Y. Yan, W. Bi, G. Ma, and A. Zhang, “Collaborative task allocation fo r large-scale heterogeneous UAV swarm: A hierarchical coalition for mation game method,” IEEE Internet Things J. , pp. 1–1, 2025, doi: 10. 1109/JIOT.2025.3562692

work page arXiv 2025

[11] [11]

Cooperat ive task allocation and path planning for multi -UAVs in low-altitude u rban intelligent transportation systems,

Z. Zhang, J. Jiang, K. V. Ling, X. Wang, and W. -A. Zhang, “Cooperat ive task allocation and path planning for multi -UAVs in low-altitude u rban intelligent transportation systems,” IEEE Trans. Intell. Transport. Syst., pp. 1–13, 2026, doi: 10.1109/TITS.2026.3667967

work page doi:10.1109/tits.2026.3667967 2026

[12] [12]

Coalition -based facility location optimization for urban UAV logistics,

L. Liu and Z. Gong, “Coalition -based facility location optimization for urban UAV logistics,” Transportation Research Part C: Emerging Te chnologies, vol. 186, p. 105624, May 2026, doi: 10.1016/j.trc.2026.10 5624

work page doi:10.1016/j.trc.2026.10 2026

[13] [13]

Joint UA V deployment, power allocation, and coalition formation for physical l ayer security in heterogeneous networks,

Y. Zhang, X. Gao, N. Ye, D. Niyato, Z. Han, and K. Yang, “Joint UA V deployment, power allocation, and coalition formation for physical l ayer security in heterogeneous networks,” IEEE Trans. Veh. Technol., vol. 74, no. 7, pp. 10994 –11009, Jul. 2025, doi: 10.1109/TVT.2025.35 48987

work page doi:10.1109/tvt.2025.35 2025

[14] [14]

A heuristic task allocation metho d based on overlapping coalition formation game for heterogeneous U AVs,

Y. Li, Z. Zhang, Z. He, and Q. Sun, “A heuristic task allocation metho d based on overlapping coalition formation game for heterogeneous U AVs,” IEEE Internet Things J., vol. 11, no. 17, pp. 28945 –28959, Sep. 2024, doi: 10.1109/JIOT.2024.3406336

work page doi:10.1109/jiot.2024.3406336 2024

[15] [15]

A task -driven sequential overlapping coalition formation game for resource allocati on in heterogeneous UAV networks,

N. Qi, Z. Huang, F. Zhou, Q. Shi, Q. Wu, and M. Xiao, “A task -driven sequential overlapping coalition formation game for resource allocati on in heterogeneous UAV networks,” IEEE Trans. on Mobile Comput., vol. 22, no. 8, pp. 4439 –4455, Aug. 2023, doi: 10.1109/TMC.2022.31 65965

work page doi:10.1109/tmc.2022.31 2023

[16] [16]

DDL: Empowering delivery drones with large -scale u rban sensing capability,

X. Chen et al., “DDL: Empowering delivery drones with large -scale u rban sensing capability,” IEEE J. Sel. Topics Signal Process. , vol. 18, no. 3, pp. 502–515, Apr. 2024, doi: 10.1109/JSTSP.2024.3427371

work page doi:10.1109/jstsp.2024.3427371 2024

[17] [17]

Cooperative air-ground instant delivery by UAVs and cr owdsourced taxis: Joint UAV station deployment and delivery schedul ing,

J. Gao et al., “Cooperative air-ground instant delivery by UAVs and cr owdsourced taxis: Joint UAV station deployment and delivery schedul ing,” IEEE Trans. Mobile Comput. , vol. 25, no. 5, pp. 6133 –6149, Ma y 2026, doi: 10.1109/TMC.2025.3634430

work page doi:10.1109/tmc.2025.3634430 2026

[18] [18]

Centralized task allocation for multiple UAVs in time -cons traint industrial IoT operations,

M. A. Houran, G. Srivastava, J. Mirza, A. Ranjha, M. A. Javed, and M. H. Zafar, “Centralized task allocation for multiple UAVs in time -cons traint industrial IoT operations,” IEEE Internet Things J. , vol. 12, no. 18, pp. 37529–37537, Sep. 2025, doi: 10.1109/JIOT.2025.3584277

work page doi:10.1109/jiot.2025.3584277 2025

[19] D. Liu, L. Dou, R. Zhang, X. Zhang, and Q. Zong, “Multi-agent reinforcement learning-based coordinated dynamic task allocation for heterogenous UAVs,” IEEE Trans. Veh. Technol., vol. 72, no. 4, pp. 4372–4383, Apr. 2023, doi: 10.1109/TVT.2022.3228198

[20] X. Zhao, Q. Zong, B. Tian, B. Zhang, and M. You, “Fast task allocation for heterogeneous unmanned aerial vehicles through reinforcement learning,” Aerospace Science and Technology, vol. 92, pp. 588–594, Sep. 2019, doi: 10.1016/j.ast.2019.06.024

[21] H. Luan et al., “Energy efficient task cooperation for multi-UAV networks: A coalition formation game approach,” IEEE Access, vol. 8, pp. 149372–149384, 2020, doi: 10.1109/ACCESS.2020.3016009

[22] J. Li, J. Sun, T. Long, and Z. Zhou, “Differential flatness-based fast trajectory planning for fixed-wing autonomous aerial vehicles,” IEEE Trans. Syst., Man, Cybern., Syst., pp. 1–14, 2025, doi: 10.1109/TSMC.2025.3559591

[23] J. Chen et al., “Joint task assignment and spectrum allocation in heterogeneous UAV communication networks: A coalition formation game-theoretic approach,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 440–452, Jan. 2021, doi: 10.1109/TWC.2020.3025316

[24] L. Yu, Z. Li, N. Ansari, and X. Sun, “Hybrid transformer based multi-agent reinforcement learning for multiple unpiloted aerial vehicle coordination in air corridors,” IEEE Trans. Mobile Comput., vol. 24, no. 6, pp. 5482–5495, Jun. 2025, doi: 10.1109/TMC.2025.3532204

[25] J. S. Bridle, “Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters,” pp. 1–7

[26] T. Haarnoja et al., “Soft actor-critic algorithms and applications,” Jan. 29, 2019, arXiv:1812.05905, doi: 10.48550/arXiv.1812.05905

[27] D. Wu et al., “UAV-assisted real-time video transmission for vehicles: A soft actor–critic DRL approach,” IEEE Internet Things J., vol. 11, no. 8, pp. 14710–14726, Apr. 2024, doi: 10.1109/JIOT.2023.3343590

[28] F. Glover, M. Laguna, and R. Marti, “Principles of tabu search,” 2007

[29] D. Monderer and L. S. Shapley, “Potential games,” Games and Economic Behavior, vol. 14, no. 1, pp. 124–143, May 1996, doi: 10.1006/game.1996.0044

[30] F. Yan, J. Chu, J. Hu, and X. Zhu, “Cooperative task allocation with simultaneous arrival and resource constraint for multi-UAV using a genetic algorithm,” Expert Systems with Applications, vol. 245, p. 123023, Jul. 2024, doi: 10.1016/j.eswa.2023.123023

[31] M. Xu, Y. Chen, and W. Wang, “A two-stage game framework to secure transmission in two-tier UAV networks,” IEEE Trans. Veh. Technol., vol. 69, no. 11, pp. 13728–13740, Nov. 2020, doi: 10.1109/TVT.2020.3026184