
arXiv: 2603.12031 · v2 · submitted 2026-03-12 · 💻 cs.DC · cs.LG · cs.MA

AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling

Pith reviewed 2026-05-15 11:53 UTC · model grok-4.3

classification 💻 cs.DC · cs.LG · cs.MA
keywords multi-agent reinforcement learning · Kubernetes scheduling · graph neural networks · dynamic resource allocation · fault tolerance · cloud computing · workload scheduling

The pith

AGMARL-DKS treats each Kubernetes node as an RL agent that uses graph-derived global context and a stress-aware lexicographical policy to schedule workloads more effectively than the default scheduler.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a scheduler that models Kubernetes cluster management as a cooperative multi-agent reinforcement learning problem to overcome the scalability limits of centralized agents and the rigidity of static reward combinations. Each node runs its own agent after centralized training, but receives a representation of the full cluster state built by a graph neural network rather than relying on local views alone. A stress-aware lexicographical ordering then ranks objectives such as stability, utilization, and cost according to current cluster pressure instead of fixed linear weights. Real deployments on Google Kubernetes Engine show measurable gains in fault tolerance and efficiency, most clearly for batch and mission-critical jobs.

Core claim

AGMARL-DKS constructs a scalable scheduler by assigning an RL agent to every cluster node, supplying each agent with a global state representation extracted by a graph neural network, and guiding decisions with a stress-aware lexicographical ordering of objectives; this combination produces scheduling actions that improve fault tolerance, resource utilization, and cost relative to the default Kubernetes scheduler when tested on Google Kubernetes Engine, particularly for batch and mission-critical workloads.
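The abstract does not say which GNN architecture supplies the global state, so the following is only a minimal sketch of the general idea: repeated neighbour averaging lets information from distant nodes reach each agent. The function name `mean_aggregate`, the two-feature node vectors, the equal self/neighbour mixing, and the three-node topology are all illustrative assumptions, and the sketch has no learned weights.

```python
from typing import Dict, List


def mean_aggregate(features: Dict[str, List[float]],
                   neighbours: Dict[str, List[str]],
                   rounds: int = 2) -> Dict[str, List[float]]:
    """Weight-free message passing (an assumption, not the paper's GNN).

    In each round every node replaces its vector with the average of its own
    vector and the mean of its neighbours' vectors, so after k rounds a node's
    vector depends on nodes up to k hops away.
    """
    state = {node: list(vec) for node, vec in features.items()}
    for _ in range(rounds):
        nxt: Dict[str, List[float]] = {}
        for node, vec in state.items():
            nbrs = neighbours.get(node, [])
            if not nbrs:
                nxt[node] = list(vec)  # isolated node keeps its local view
                continue
            dim = len(vec)
            nbr_mean = [sum(state[m][i] for m in nbrs) / len(nbrs)
                        for i in range(dim)]
            nxt[node] = [0.5 * vec[i] + 0.5 * nbr_mean[i] for i in range(dim)]
        state = nxt
    return state


# Hypothetical line topology a - b - c; features are [cpu_util, mem_util].
features = {"a": [0.9, 0.8], "b": [0.5, 0.5], "c": [0.1, 0.2]}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
context = mean_aggregate(features, neighbours, rounds=2)
```

With two rounds, node `a` is influenced by node `c` even though the two are not directly connected, which is the sense in which each agent sees more than its local observation.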

What carries the argument

AGMARL-DKS, the multi-agent RL system in which each node is an agent that receives GNN-derived global cluster context and applies a stress-aware lexicographical ordering policy to multi-objective rewards.
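To make the idea of a stress-aware lexicographical ordering concrete, here is a small sketch under stated assumptions. The abstract does not define how stress is computed, which orderings are used, or how ties are broken, so `stress_level`, the 0.8 threshold, the two priority orders, the 0.05 tolerance, and the name-based tie-break are all hypothetical. Scores are assumed to be normalised so that higher is better for every objective, with the `cost` score meaning cheapness.

```python
from typing import Dict, Tuple

# Hypothetical priority orders over the objectives named in the summary.
CALM_ORDER: Tuple[str, ...] = ("cost", "utilization", "stability")
STRESSED_ORDER: Tuple[str, ...] = ("stability", "utilization", "cost")


def stress_level(cpu_util: float, mem_util: float, pending_pods: int,
                 pending_cap: int = 50) -> float:
    """Assumed stress signal in [0, 1]: the worst of three pressure ratios."""
    queue_pressure = min(pending_pods / pending_cap, 1.0)
    return max(cpu_util, mem_util, queue_pressure)


def objective_order(stress: float, threshold: float = 0.8) -> Tuple[str, ...]:
    """Switch the priority order once stress crosses an assumed threshold."""
    return STRESSED_ORDER if stress >= threshold else CALM_ORDER


def pick_node(scores: Dict[str, Dict[str, float]], stress: float,
              tolerance: float = 0.05) -> str:
    """Lexicographic selection over candidate nodes.

    Objectives are visited in priority order; at each step only candidates
    within `tolerance` of the best score survive. Remaining ties are broken
    by node name, which is an arbitrary illustrative rule.
    """
    candidates = sorted(scores)
    for objective in objective_order(stress):
        best = max(scores[n][objective] for n in candidates)
        candidates = [n for n in candidates
                      if scores[n][objective] >= best - tolerance]
        if len(candidates) == 1:
            break
    return candidates[0]


scores = {
    "node-a": {"stability": 0.9, "utilization": 0.4, "cost": 0.3},
    "node-b": {"stability": 0.6, "utilization": 0.7, "cost": 0.9},
}
calm_choice = pick_node(scores, stress_level(0.4, 0.3, pending_pods=5))
stressed_choice = pick_node(scores, stress_level(0.95, 0.6, pending_pods=10))
```

In this toy case the calm cluster picks `node-b` on cost, while the stressed cluster picks `node-a` on stability. A fixed linear weighting would return the same node in both situations.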

Load-bearing premise

That assigning an RL agent to every node, feeding it GNN global context, and using a lexicographical policy will scale to large heterogeneous clusters without excessive communication or training overhead.

What would settle it

A controlled run on a large heterogeneous cluster in which AGMARL-DKS shows no improvement or a decline in fault-tolerance or utilization metrics compared with the default scheduler under identical workload traces.
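One way such a controlled comparison could be analysed is a paired test over identical workload traces. The sketch below uses a sign-flip permutation test on per-trace metric differences; the function name, the choice of test, and the input layout (one metric value per trace for each scheduler) are assumptions for illustration, not the paper's protocol.

```python
import random
from typing import List


def paired_permutation_pvalue(baseline: List[float], candidate: List[float],
                              n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided sign-flip permutation test on paired differences.

    `baseline[i]` and `candidate[i]` must come from the same workload trace.
    Under the null hypothesis of no difference, the sign of each paired
    difference is equally likely to be positive or negative.
    """
    if len(baseline) != len(candidate):
        raise ValueError("runs must be paired trace by trace")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

A large p-value on fault-tolerance or utilisation metrics from a large heterogeneous cluster, or a significant difference in the default scheduler's favour, would be the kind of outcome described above.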

Original abstract

State-of-the-art cloud-native applications require intelligent schedulers that can effectively balance system stability, resource utilisation, and associated costs. While Kubernetes provides feasibility-based placement by default, recent research efforts have explored the use of reinforcement learning (RL) for more intelligent scheduling decisions. However, current RL-based schedulers have three major limitations. First, most of these schedulers use monolithic centralised agents, which are non-scalable for large heterogeneous clusters. Second, the ones that use multi-objective reward functions assume simple, static, linear combinations of the objectives. Third, no previous work has produced a stress-aware scheduler that can react adaptively to dynamic conditions. To address these gaps in current research, we propose the Adaptive Graph-enhanced Multi-Agent Reinforcement Learning Dynamic Kubernetes Scheduler (AGMARL-DKS). AGMARL-DKS addresses these gaps by introducing three major innovations. First, we construct a scalable solution by treating the scheduling challenge as a cooperative multi-agent problem, where every cluster node operates as an agent, employing centralised training methods before decentralised execution. Second, to be context-aware and yet decentralised, we use a Graph Neural Network (GNN) to build a state representation of the global cluster context at each agent. This represents an improvement over methods that rely solely on local observations. Finally, to make trade-offs between these objectives, we use a stress-aware lexicographical ordering policy instead of a simple, static linear weighting of these objectives. The evaluations in Google Kubernetes Engine (GKE) reveal that AGMARL-DKS significantly outperforms the default scheduler in terms of fault tolerance, utilisation, and cost, especially in scheduling batch and mission-critical workloads.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes AGMARL-DKS, a cooperative multi-agent RL scheduler for Kubernetes in which each cluster node acts as an independent agent. It uses centralised training with decentralised execution, GNNs to supply each agent with global cluster state representations, and a stress-aware lexicographical ordering policy for adaptive multi-objective trade-offs. The central claim is that this design yields significant gains over the default Kubernetes scheduler in fault tolerance, resource utilisation and cost on GKE, especially for batch and mission-critical workloads.

Significance. If the empirical results and scalability claims hold after detailed verification, the work would offer a concrete path toward decentralised yet globally aware RL scheduling that avoids both monolithic central agents and static linear reward weighting, potentially influencing production cloud-native orchestration systems.

major comments (2)
  1. [Abstract] Abstract and Evaluation section: the claim of significant outperformance in fault tolerance, utilisation and cost supplies no quantitative metrics, baseline schedulers, statistical tests, cluster sizes, workload traces or experimental protocol, rendering the central empirical claim unverifiable from the provided description.
  2. [Method] Method description: the decentralised execution model relies on each agent receiving GNN-derived global context, yet no analysis or measurement is given for communication volume, message size, update frequency or bandwidth consumption per scheduling decision. This directly affects the load-bearing scalability assertion for large heterogeneous clusters.
minor comments (1)
  1. Clarify the precise definition and implementation of the stress-aware lexicographical ordering policy, including how stress levels are computed and how ties are broken.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below and will incorporate revisions to improve verifiability and strengthen the scalability discussion.

Point-by-point responses
  1. Referee: [Abstract] Abstract and Evaluation section: the claim of significant outperformance in fault tolerance, utilisation and cost supplies no quantitative metrics, baseline schedulers, statistical tests, cluster sizes, workload traces or experimental protocol, rendering the central empirical claim unverifiable from the provided description.

    Authors: We agree that the abstract should include key quantitative results for immediate verifiability. The evaluation section reports comparisons against the default Kubernetes scheduler on GKE using specific cluster sizes, standard workload traces for batch and mission-critical jobs, and metrics on fault tolerance, utilization, and cost with statistical tests. We will revise the abstract to summarize representative quantitative improvements and briefly note the experimental protocol. revision: yes

  2. Referee: [Method] Method description: the decentralised execution model relies on each agent receiving GNN-derived global context, yet no analysis or measurement is given for communication volume, message size, update frequency or bandwidth consumption per scheduling decision. This directly affects the load-bearing scalability assertion for large heterogeneous clusters.

    Authors: We acknowledge this gap in the scalability analysis. The current manuscript focuses on algorithmic design and end-to-end performance but lacks explicit communication overhead measurements. We will add a dedicated analysis (theoretical bounds plus empirical measurements from GKE runs) covering message sizes for GNN embeddings, update frequencies, and bandwidth consumption across cluster scales. revision: yes
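The kind of theoretical bound the authors promise can be approximated with a back-of-envelope estimate. The sketch below assumes each node sends one fixed-size embedding to each neighbour per message-passing round and ignores protocol headers and compression. Every parameter name and value is hypothetical, since the abstract gives no message sizes or update frequencies.

```python
def gnn_traffic_bytes_per_second(num_nodes: int, avg_degree: float,
                                 embedding_dim: int, rounds: int,
                                 updates_per_second: float,
                                 bytes_per_value: int = 4) -> float:
    """Rough payload volume for distributing GNN embeddings (assumed model).

    messages per update = num_nodes * avg_degree * rounds
    payload per message = embedding_dim * bytes_per_value
    """
    messages_per_update = num_nodes * avg_degree * rounds
    payload_bytes = embedding_dim * bytes_per_value
    return messages_per_update * payload_bytes * updates_per_second


# Hypothetical sparse cluster graph: 100 nodes, 4 neighbours each,
# 64-dimensional float32 embeddings, 2 rounds, 1 update per second.
sparse = gnn_traffic_bytes_per_second(100, 4, 64, 2, 1.0)   # 204800.0 bytes/s

# Same cluster with a fully connected graph: degree grows with cluster size.
dense = gnn_traffic_bytes_per_second(100, 99, 64, 2, 1.0)
```

Under this model, traffic grows linearly with cluster size when the graph is sparse and quadratically when every node talks to every other node, so the graph construction matters as much as the embedding size for the scalability claim.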

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper proposes AGMARL-DKS as a new architecture with three explicit innovations (cooperative MARL treating nodes as agents, GNN-derived global context per agent, stress-aware lexicographical policy) and reports empirical outperformance on external GKE infrastructure. No equations, fitted parameters, or self-referential definitions appear in the abstract or method description. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked. The central claims rest on external evaluation rather than any reduction of predictions to inputs by construction. This is the expected non-finding for a proposal-style systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are described; the approach relies on standard assumptions from multi-agent RL and graph neural network literature applied to the Kubernetes domain.

pith-pipeline@v0.9.0 · 5607 in / 1129 out tokens · 67506 ms · 2026-05-15T11:53:03.617125+00:00 · methodology

