AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling

Hamed Hamzeh

arXiv: 2603.12031 · v2 · submitted 2026-03-12 · 💻 cs.DC · cs.LG · cs.MA

Pith reviewed 2026-05-15 11:53 UTC · model grok-4.3

classification 💻 cs.DC · cs.LG · cs.MA

keywords: multi-agent reinforcement learning · Kubernetes scheduling · graph neural networks · dynamic resource allocation · fault tolerance · cloud computing · workload scheduling

The pith

AGMARL-DKS treats each Kubernetes node as an RL agent that uses graph-derived global context and a stress-aware lexicographical policy to schedule workloads more effectively than the default scheduler.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a scheduler that models Kubernetes cluster management as a cooperative multi-agent reinforcement learning problem. The goal is to overcome both the scalability limits of centralized agents and the rigidity of static reward combinations. Each node runs its own agent after centralized training. Instead of relying on its local view alone, each agent receives a representation of the full cluster state built by a graph neural network. A stress-aware lexicographical ordering then ranks objectives such as stability, utilization, and cost according to current cluster pressure, rather than combining them with fixed linear weights. Evaluations on Google Kubernetes Engine are reported to show gains over the default scheduler in fault tolerance and efficiency, most clearly for batch and mission-critical jobs.
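The pith does not specify the GNN architecture. The following is a minimal sketch of how message passing followed by pooling can give every node agent the same summary of cluster-wide state. It assumes a GraphSAGE-style mean aggregator and a hypothetical cluster graph in which edges link nodes that share, for example, a zone. The features (CPU and memory utilization fractions), the graph and the function names are all illustrative assumptions. A trained GNN layer would also apply learned weights and a nonlinearity, which this untrained aggregation step omits.

```python
from typing import Dict, List

Vector = List[float]


def mean_aggregate(features: Dict[str, Vector],
                   neighbours: Dict[str, List[str]]) -> Dict[str, Vector]:
    """One untrained GraphSAGE-style step: concatenate each node's own
    features with the mean of its neighbours' features."""
    out: Dict[str, Vector] = {}
    for node, own in features.items():
        nbrs = neighbours.get(node, [])
        if nbrs:
            nbr_mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                        for i in range(len(own))]
        else:
            nbr_mean = [0.0] * len(own)  # isolated node: no incoming messages
        out[node] = own + nbr_mean  # list concatenation, inputs left unchanged
    return out


def global_context(embeddings: Dict[str, Vector]) -> Vector:
    """Mean-pool node embeddings into one cluster-wide context vector."""
    rows = list(embeddings.values())
    return [sum(r[i] for r in rows) / len(rows) for i in range(len(rows[0]))]


# Hypothetical 3-node cluster; features are [cpu_util, mem_util] fractions.
features = {"n1": [0.9, 0.5], "n2": [0.2, 0.1], "n3": [0.4, 0.8]}
neighbours = {"n1": ["n2"], "n2": ["n1", "n3"], "n3": ["n2"]}

embeddings = mean_aggregate(features, neighbours)
context = global_context(embeddings)  # the same vector is handed to every node agent
```

Stacking k such steps lets each embedding reflect k-hop neighbourhoods. The pooled vector is one simple way to turn per-node embeddings into the "global cluster context" the pith describes.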

Core claim

AGMARL-DKS constructs a scalable scheduler by assigning an RL agent to every cluster node, supplying each agent with a global state representation extracted by a graph neural network, and guiding decisions with a stress-aware lexicographical ordering of objectives; this combination produces scheduling actions that improve fault tolerance, resource utilization, and cost relative to the default Kubernetes scheduler when tested on Google Kubernetes Engine, particularly for batch and mission-critical workloads.

What carries the argument

AGMARL-DKS, the multi-agent RL system in which each node is an agent that receives GNN-derived global cluster context and applies a stress-aware lexicographical ordering policy to multi-objective rewards.
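The decentralized execution step can be sketched under stated assumptions. Each node agent scores an incoming pod from its own local observation plus the shared GNN context, and the pod is bound to the highest-scoring node. The linear per-agent scorer is a hypothetical stand-in for a trained actor network. The weights differ per node because each node runs its own agent, and this is what lets the shared context shift decisions. With identical linear weights, the context term would add the same constant to every bid. The bidding scheme, the weights and all names are illustrative and do not describe the paper's published design.

```python
from typing import Dict, List, Sequence


def agent_score(local_obs: Sequence[float], global_ctx: Sequence[float],
                weights: Sequence[float]) -> float:
    """Hypothetical per-node actor: linear score over [local_obs | global_ctx]."""
    x = list(local_obs) + list(global_ctx)
    if len(x) != len(weights):
        raise ValueError("weights must match observation + context length")
    return sum(w * v for w, v in zip(weights, x))


def place_pod(local_obs: Dict[str, List[float]], global_ctx: List[float],
              weights: Dict[str, List[float]]) -> str:
    """Each agent scores only its own node; the pod goes to the highest bid.
    Ties resolve to the alphabetically first node name."""
    bids = {node: agent_score(obs, global_ctx, weights[node])
            for node, obs in local_obs.items()}
    return max(sorted(bids), key=lambda node: bids[node])


# Local observations are [cpu_util, mem_util]; the context is a pooled GNN summary.
local_obs = {"n1": [0.9, 0.5], "n2": [0.2, 0.1]}
global_ctx = [0.55, 0.3]
weights = {"n1": [-1.0, -1.0, 0.5, 0.5], "n2": [-1.0, -1.0, 0.2, 0.2]}
chosen = place_pod(local_obs, global_ctx, weights)  # "n2" (bid -0.13 vs -0.975)
```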

Load-bearing premise

That assigning an RL agent to every node, feeding it GNN global context, and using a lexicographical policy will scale to large heterogeneous clusters without excessive communication or training overhead.

What would settle it

A controlled run on a large heterogeneous cluster in which AGMARL-DKS shows no improvement or a decline in fault-tolerance or utilization metrics compared with the default scheduler under identical workload traces.
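One way to make that test concrete is a paired comparison. Both schedulers run on the same workload traces, and the per-trace differences in a metric are collected. The check is then whether a bootstrap confidence interval for the mean difference excludes zero. The sketch below assumes higher-is-better metrics, such as a fault-tolerance score. The input numbers are made-up placeholders, not results from the paper.

```python
import random
from statistics import mean
from typing import List, Tuple


def paired_bootstrap_ci(baseline: List[float], candidate: List[float],
                        n_resamples: int = 2000, alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float, float]:
    """Mean paired difference (candidate - baseline) over identical traces,
    with a percentile bootstrap confidence interval."""
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need equal-length, non-empty paired samples")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    rng = random.Random(seed)  # fixed seed for reproducibility
    boot = sorted(mean(rng.choices(diffs, k=len(diffs)))
                  for _ in range(n_resamples))
    lo = boot[int((alpha / 2) * n_resamples)]
    hi = boot[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lo, hi


baseline = [0.91, 0.88, 0.93, 0.90]   # placeholder per-trace scores, default scheduler
candidate = [0.94, 0.90, 0.95, 0.91]  # placeholder per-trace scores, candidate scheduler
diff, lo, hi = paired_bootstrap_ci(baseline, candidate)
improved = lo > 0  # interval lies entirely above zero
```

If the interval straddles zero or lies below it, the run would count as "no improvement or a decline" in the sense above. Real traces would need many more pairs than this toy example.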

Original abstract

State-of-the-art cloud-native applications require intelligent schedulers that can effectively balance system stability, resource utilisation, and associated costs. While Kubernetes provides feasibility-based placement by default, recent research efforts have explored the use of reinforcement learning (RL) for more intelligent scheduling decisions. However, current RL-based schedulers have three major limitations. First, most of these schedulers use monolithic centralised agents, which are non-scalable for large heterogeneous clusters. Second, the ones that use multi-objective reward functions assume simple, static, linear combinations of the objectives. Third, no previous work has produced a stress-aware scheduler that can react adaptively to dynamic conditions. To address these gaps in current research, we propose the Adaptive Graph-enhanced Multi-Agent Reinforcement Learning Dynamic Kubernetes Scheduler (AGMARL-DKS). AGMARL-DKS addresses these gaps by introducing three major innovations. First, we construct a scalable solution by treating the scheduling challenge as a cooperative multi-agent problem, where every cluster node operates as an agent, employing centralised training methods before decentralised execution. Second, to be context-aware and yet decentralised, we use a Graph Neural Network (GNN) to build a state representation of the global cluster context at each agent. This represents an improvement over methods that rely solely on local observations. Finally, to make trade-offs between these objectives, we use a stress-aware lexicographical ordering policy instead of a simple, static linear weighting of these objectives. The evaluations in Google Kubernetes Engine (GKE) reveal that AGMARL-DKS significantly outperforms the default scheduler in terms of fault tolerance, utilisation, and cost, especially in scheduling batch and mission-critical workloads.
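For context on the baseline: Kubernetes' default scheduler first filters out nodes that cannot fit a pod, then scores the remaining ones. The sketch below is a heavily simplified model of that two-phase flow. It considers only CPU and memory requests and uses a least-allocated-style score with equal weights. The real kube-scheduler runs many filter and score plugins and picks randomly among equally top-scored nodes, whereas this sketch breaks ties by name. The `Node` type and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_capacity: float   # allocatable cores
    mem_capacity: float   # allocatable GiB
    cpu_requested: float  # cores already requested by bound pods
    mem_requested: float  # GiB already requested by bound pods


def fits(node: Node, cpu: float, mem: float) -> bool:
    """Filter phase: the pod's requests must fit in the node's unrequested capacity."""
    return (node.cpu_requested + cpu <= node.cpu_capacity
            and node.mem_requested + mem <= node.mem_capacity)


def least_allocated_score(node: Node, cpu: float, mem: float) -> float:
    """Score phase: 0-100, higher when more capacity stays free after placement
    (CPU and memory weighted equally)."""
    cpu_free = (node.cpu_capacity - node.cpu_requested - cpu) / node.cpu_capacity
    mem_free = (node.mem_capacity - node.mem_requested - mem) / node.mem_capacity
    return 100.0 * (cpu_free + mem_free) / 2


def schedule(nodes: List[Node], cpu: float, mem: float) -> Optional[str]:
    feasible = [n for n in nodes if fits(n, cpu, mem)]
    if not feasible:
        return None  # no node fits: the pod would stay Pending
    # Highest score wins; ties go to the alphabetically first name here.
    best = min(feasible, key=lambda n: (-least_allocated_score(n, cpu, mem), n.name))
    return best.name


nodes = [
    Node("a", 4, 16, 3.5, 4),  # only 0.5 free cores: filtered out for a 1-core pod
    Node("b", 8, 32, 2, 8),
    Node("c", 8, 32, 6, 4),
]
chosen = schedule(nodes, cpu=1, mem=2)  # "b": 65.625 vs 46.875 for "c"
```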

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

AGMARL-DKS sketches a decentralized MARL Kubernetes scheduler with per-node agents, GNN global context, and stress-aware lexicographic ordering. However, the abstract gives no metrics or overhead numbers with which to judge whether it works.

The main takeaway is that this paper tries to solve three concrete problems in RL schedulers for Kubernetes: central agents that do not scale, rigid linear multi-objective rewards, and no handling of dynamic stress. It does so by treating each node as an agent in a cooperative MARL setup with centralized training and decentralized execution, feeding each agent a GNN-derived global cluster view, and using stress-aware lexicographic ordering to trade off objectives adaptively. That combination is the actual new piece, and the design choices line up directly with the gaps the authors name in prior work. The GNN step is a straightforward way to move beyond purely local observations, and the lex ordering avoids the need to tune static weights, which is a practical improvement for changing workloads. The paper is clear about why these matter for batch and mission-critical jobs in heterogeneous clusters.

The soft spot is the evaluation. The abstract claims significant gains in fault tolerance, utilization, and cost on GKE but supplies no numbers, no baselines beyond the default scheduler, no cluster sizes, no communication costs, and no ablation results. Without those details it is impossible to check whether the GNN updates stay cheap enough to preserve the decentralized advantage or whether the method holds on larger setups. The stress-test concern about overhead is therefore still open.

This is for researchers working on RL for cloud orchestration who want a concrete example of decentralized multi-objective scheduling. A reader in that area would get value from the method description and the gap analysis even before seeing the numbers. It deserves a serious referee because the problem is real, the approach is coherent, and the ideas are grounded enough to test, though the experimental section will need substantial strengthening.

Referee Report

2 major / 1 minor

Summary. The paper proposes AGMARL-DKS, a cooperative multi-agent RL scheduler for Kubernetes in which each cluster node acts as an independent agent. It uses centralised training with decentralised execution, GNNs to supply each agent with global cluster state representations, and a stress-aware lexicographical ordering policy for adaptive multi-objective trade-offs. The central claim is that this design yields significant gains over the default Kubernetes scheduler in fault tolerance, resource utilisation and cost on GKE, especially for batch and mission-critical workloads.

Significance. If the empirical results and scalability claims hold after detailed verification, the work would offer a concrete path toward decentralised yet globally aware RL scheduling that avoids both monolithic central agents and static linear reward weighting, potentially influencing production cloud-native orchestration systems.

major comments (2)

[Abstract] Abstract and Evaluation section: the claim of significant outperformance in fault tolerance, utilisation and cost supplies no quantitative metrics, baseline schedulers, statistical tests, cluster sizes, workload traces or experimental protocol, rendering the central empirical claim unverifiable from the provided description.
[Method] Method description: the decentralised execution model relies on each agent receiving GNN-derived global context, yet no analysis or measurement is given for communication volume, message size, update frequency or bandwidth consumption per scheduling decision. This directly affects the load-bearing scalability assertion for large heterogeneous clusters.
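The referee's request can be made concrete with a back-of-envelope model. Suppose each node's GNN embedding is a vector of `embed_dim` float32 values, exchanged once per scheduling round. The payload moved per round then depends heavily on topology. All-to-all exchange grows quadratically with cluster size. Routing through an aggregator that returns one pooled context vector grows linearly. Both topologies, the dimension of 64 and the per-round framing are assumptions, since the paper's actual exchange pattern is not described here.

```python
def bytes_per_round(num_nodes: int, embed_dim: int, topology: str,
                    bytes_per_value: int = 4) -> int:
    """Payload bytes (headers ignored) for one round of embedding exchange."""
    if num_nodes < 1 or embed_dim < 1:
        raise ValueError("num_nodes and embed_dim must be positive")
    msg = embed_dim * bytes_per_value  # one embedding, float32 by default
    if topology == "all_to_all":
        # every node sends its embedding to every other node
        return num_nodes * (num_nodes - 1) * msg
    if topology == "aggregator":
        # each node uploads one embedding and downloads one pooled context
        return 2 * num_nodes * msg
    raise ValueError(f"unknown topology: {topology}")


for n in (10, 100, 1000):
    print(n, bytes_per_round(n, 64, "all_to_all"), bytes_per_round(n, 64, "aggregator"))
# 10 23040 5120
# 100 2534400 51200
# 1000 255744000 512000
```

At 1000 nodes, all-to-all exchange moves roughly 256 MB per round against about 0.5 MB for the aggregator pattern. Multiplying by the update frequency gives the bandwidth figure the referee asks for.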

minor comments (1)

Clarify the precise definition and implementation of the stress-aware lexicographical ordering policy, including how stress levels are computed and how ties are broken.
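The following minimal sketch illustrates what such a definition might contain, using one possible stress-aware lexicographic rule. None of these assumptions come from the paper:

- Stress is the cluster mean of each node's peak CPU/memory utilization, compared against a fixed threshold.
- High stress puts stability first, and low stress puts cost first.
- All objectives are normalized so that higher is better; cost is read as a normalized saving.
- Candidates within a tolerance of the best value on an objective survive to the next objective, a common relaxation of strict lexicographic ordering.
- Any remaining tie is broken by node name.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical priority orders; the paper's actual orders are not stated here.
PRIORITY_BY_STRESS: Dict[str, Tuple[str, ...]] = {
    "high": ("stability", "utilisation", "cost"),
    "low": ("cost", "utilisation", "stability"),
}


def stress_level(cpu_util: Sequence[float], mem_util: Sequence[float],
                 threshold: float = 0.8) -> str:
    """Assumed stress metric: mean over nodes of each node's peak CPU/memory utilization."""
    peaks = [max(c, m) for c, m in zip(cpu_util, mem_util)]
    return "high" if sum(peaks) / len(peaks) >= threshold else "low"


def lex_best(candidates: Dict[str, Dict[str, float]],
             order: Tuple[str, ...], tolerance: float = 0.05) -> str:
    """Compare objectives one at a time in priority order (all higher-is-better).
    Nodes within `tolerance` of the best value survive to the next objective;
    any tie left after the last objective goes to the first node name."""
    survivors = sorted(candidates)
    for objective in order:
        best = max(candidates[n][objective] for n in survivors)
        survivors = [n for n in survivors
                     if candidates[n][objective] >= best - tolerance]
        if len(survivors) == 1:
            break
    return survivors[0]


candidates = {
    "node-a": {"stability": 0.90, "utilisation": 0.6, "cost": 0.3},
    "node-b": {"stability": 0.88, "utilisation": 0.8, "cost": 0.5},
    "node-c": {"stability": 0.50, "utilisation": 0.9, "cost": 0.9},
}
level = stress_level([0.95, 0.7, 0.9], [0.6, 0.9, 0.85])  # "high"
choice = lex_best(candidates, PRIORITY_BY_STRESS[level])   # "node-b"
```

Under low stress, the same candidates yield "node-c" because cost is compared first. This illustrates how the ordering, not fixed weights, drives the trade-off.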

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below and will incorporate revisions to improve verifiability and strengthen the scalability discussion.

Referee: [Abstract] Abstract and Evaluation section: the claim of significant outperformance in fault tolerance, utilisation and cost supplies no quantitative metrics, baseline schedulers, statistical tests, cluster sizes, workload traces or experimental protocol, rendering the central empirical claim unverifiable from the provided description.

Authors: We agree that the abstract should include key quantitative results for immediate verifiability. The evaluation section reports comparisons against the default Kubernetes scheduler on GKE using specific cluster sizes, standard workload traces for batch and mission-critical jobs, and metrics on fault tolerance, utilization, and cost with statistical tests. We will revise the abstract to summarize representative quantitative improvements and briefly note the experimental protocol. revision: yes
Referee: [Method] Method description: the decentralised execution model relies on each agent receiving GNN-derived global context, yet no analysis or measurement is given for communication volume, message size, update frequency or bandwidth consumption per scheduling decision. This directly affects the load-bearing scalability assertion for large heterogeneous clusters.

Authors: We acknowledge this gap in the scalability analysis. The current manuscript focuses on algorithmic design and end-to-end performance but lacks explicit communication overhead measurements. We will add a dedicated analysis (theoretical bounds plus empirical measurements from GKE runs) covering message sizes for GNN embeddings, update frequencies, and bandwidth consumption across cluster scales. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper proposes AGMARL-DKS as a new architecture with three explicit innovations (cooperative MARL treating nodes as agents, GNN-derived global context per agent, stress-aware lexicographical policy) and reports empirical outperformance on external GKE infrastructure. No equations, fitted parameters, or self-referential definitions appear in the abstract or method description. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked. The central claims rest on external evaluation rather than any reduction of predictions to inputs by construction. This is the expected non-finding for a proposal-style systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are described; the approach relies on standard assumptions from multi-agent RL and graph neural network literature applied to the Kubernetes domain.

pith-pipeline@v0.9.0 · 5607 in / 1129 out tokens · 67506 ms · 2026-05-15T11:53:03.617125+00:00 · methodology

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 3 internal anchors

[1] Carrión C (2022) Kubernetes as a standard container orchestrator - A bibliometric analysis. Journal of Grid Computing 20(4). https://doi.org/10.1007/s10723-022-09629-8

[2] Truyen E, Van Landuyt D, Preuveneers D, Lagaisse B, Joosen W (2019) A comprehensive feature comparison study of Open-Source Container Orchestration Frameworks. Applied Sciences 9(5):931. https://doi.org/10.3390/app9050931

[3] https://kubernetes.io/docs/concepts/workloads/pods/

[4] Senjab K, Abbas S, Ahmed N, Khan AUR (2023) A survey of Kubernetes scheduling algorithms. Journal of Cloud Computing: Advances, Systems and Applications 12(1). https://doi.org/10.1186/s13677-023-00471-1

[5] Zhou N, Zhou H, Hoppe D (2022) Containerization for High performance Computing Systems: Survey and Prospects. IEEE Transactions on Software Engineering 49(4):2722–2740. https://doi.org/10.1109/tse.2022.3229221

[6] Marchese A, Tomarchio O (2025) Enhancing the Kubernetes Platform with a Load-Aware Orchestration Strategy. SN Computer Science 6(3). https://doi.org/10.1007/s42979-025-03712-z

[7] Rejiba Z, Chamanara J (2022) Custom Scheduling in Kubernetes: A survey on common problems and solution approaches. ACM Computing Surveys 55(7):1–37. https://doi.org/10.1145/3544788

[8] Jian Z, Xie X, Fang Y, Jiang Y, Lu Y, Dash A, Li T, Wang G (2023) DRS: A deep reinforcement learning enhanced Kubernetes scheduler for microservice-based system. Software: Practice and Experience 54(10):2102–2126. https://doi.org/10.1002/spe.3284

[9] Farid M, Lim HS, Lee CP, Zarakovitis CC, Chien SF (2025) Optimizing Kubernetes with Multi-Objective Scheduling Algorithms: A 5G Perspective. Computers 14(9):390. https://doi.org/10.3390/computers14090390

[10] Carrión C (2022b) Kubernetes scheduling: taxonomy, ongoing issues and challenges. ACM Computing Surveys 55(7):1–37. https://doi.org/10.1145/3539606

[11] Di Cicco N, Poltronieri F, Santos J, Zaccarini M, Tortonesi M, Stefanelli C, De Turck F (2024) Multi-Objective Scheduling and Resource Allocation of Kubernetes Replicas Across the Compute Continuum. 2024 20th International Conference on Network and Service Management (CNSM):1–9. https://doi.org/10.23919/cnsm62983.2024.10814307

[12] Wang X, Zhao K, Qin B (2023c) Optimization of Task-Scheduling strategy in edge kubernetes clusters based on deep reinforcement learning. Mathematics 11(20):4269. https://doi.org/10.3390/math11204269

[13] Jayanetti A, Halgamuge S, Buyya R (2024b) Reinforcement Learning based Workflow Scheduling in Cloud and Edge Computing Environments: A Taxonomy, Review and Future Directions. arXiv. https://doi.org/10.48550/arxiv.2408.02938

[14] Park J, Bakhtiyar S, Park J (2021b) ScheduleNet: Learn to solve multi-agent scheduling problems with reinforcement learning. arXiv. https://doi.org/10.48550/arxiv.2106.03051

[15] Aina KO, Ha S (2025b) Deep reinforcement learning for Multi-Agent coordination. arXiv. https://arxiv.org/abs/2510.03592v1

[16] Li Y, Zhong W, Wu Y (2025) Multi-objective flexible job-shop scheduling via graph attention network and reinforcement learning. The Journal of Supercomputing 81:293. https://doi.org/10.1007/s11227-024-06741-2

[17] Gaon M, Brafman RI (2019b) Reinforcement Learning with Non-Markovian Rewards. arXiv. https://arxiv.org/abs/1912.02552

[18] Zhou G, Tian W, Buyya R, et al. (2024) Deep reinforcement learning-based methods for resource scheduling in cloud computing: a review and future directions. Artificial Intelligence Review 57:124. https://doi.org/10.1007/s10462-024-10756-9

[19] Jalali Khalil Abadi Z, Mansouri N, Javidi MM (2024) Deep reinforcement learning-based scheduling in distributed systems: a critical review. Knowledge and Information Systems 66:5709–5782. https://doi.org/10.1007/s10115-024-02167-7

[20] Kim YG, Wu C-J (2020) AutoScale: Energy Efficiency Optimization for Stochastic Edge Inference Using Reinforcement Learning. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO):1082–1096. https://doi.org/10.1109/micro50266.2020.00090

[21] Rothman J, Chamanara J (2023) An RL-Based Model for Optimized Kubernetes Scheduling. 2023 IEEE 31st International Conference on Network Protocols (ICNP):1–6. https://doi.org/10.1109/icnp59255.2023.10355623

[22] Shukla J (2024) Comparative study of RL algorithms for resource optimization scheduling in Kubernetes. MSc Thesis, National College of Ireland. https://norma.ncirl.ie/8056/

[23] Anouar H, Hatim H, Zineb EA (2025) Proposing a Theoretical Energy Aware Framework for Kubernetes Scheduling Using Reinforcement Learning. In: Ezziyyani M, Kacprzyk J, Balas VE (eds) International Conference on Advanced Intelligent Systems for Sustainable Development (AI2SD 2024). Lecture Notes in Networks and Systems, vol 1403… https://doi.org/10.1007/978-3-031-91337-2_75

[24] Liang J, Miao H, Li K, Tan J, Wang X, Luo R, Jiang Y (2025) A review of Multi-Agent Reinforcement Learning Algorithms. Electronics 14(4):820. https://doi.org/10.3390/electronics14040820

[25] Danino T, Ben-Shimol Y, Greenberg S (2023) Container allocation in cloud environment using Multi-Agent deep reinforcement learning. Electronics 12(12):2614. https://doi.org/10.3390/electronics12122614

[26] Soulé J, Jamont J-P, Occello M, Traonouez L-M, Théron P (2025) Streamlining Resilient Kubernetes Autoscaling with Multi-Agent Systems via an Automated Online Design Framework. arXiv. https://arxiv.org/abs/2505.21559v1

[27] Amato C (2024) An introduction to centralized training for decentralized execution in cooperative Multi-Agent Reinforcement learning. arXiv. https://arxiv.org/abs/2409.03052v1

[28] Xu L, Chen W, Liu X, Chen YY (2023) MADDPG: Multi-agent Deep Deterministic Policy Gradient Algorithm for Formation Elliptical Encirclement and Collision Avoidance. In: Ren Z, Wang M, Hua Y (eds) Proceedings of 2021 5th Chinese Conference on Swarm Intelligence and Cooperative Control. Lecture Notes in Electrical Engineering, vol 934. Sprin… https://doi.org/10.1007/978-981-19-3998-3_24

[29] Wang Y, Wu F (2021) Policy Adaptive Multi-agent Deep Deterministic Policy Gradient. In: Uchiya T, Bai Q, Marsá Maestre I (eds) PRIMA 2020: Principles and Practice of Multi-Agent Systems. Lecture Notes in Computer Science, vol 12568. Springer, Cham. https://doi.org/10.1007/978-3-030-69322-0_11

[30] Lahande P, Kaveri P, Saini J (2023) Reinforcement learning for reducing the interruptions and increasing fault tolerance in the cloud environment. Informatics 10(3):64. https://doi.org/10.3390/informatics10030064

[31] Xu Z, Gong Y, Zhou Y, Bao Q, Qian W (2024) Enhancing Kubernetes Automated Scheduling with Deep Learning and Reinforcement Techniques for Large-Scale Cloud Computing Optimization. arXiv. https://arxiv.org/abs/2403.07905v1

[32] Kallel A, Rekik M, Khemakhem M (2024) DRL4HFC: Deep Reinforcement Learning for Container-Based Scheduling in Hybrid FOG/Cloud System. Proceedings of the 14th International Conference on Agents and Artificial Intelligence:231–242. https://doi.org/10.5220/0012356800003636

[33] Krishnan R, Durairaj S (2024) Reliability and performance of resource efficiency in dynamic optimization scheduling using multi-agent microservice cloud-fog on IoT applications. Computing 106:3837–3878. https://doi.org/10.1007/s00607-024-01301-1

[34] Yang Y, Ren F, Zhang M (2024) A decentralized Multiagent-Based task scheduling framework for handling uncertain events in fog computing. arXiv. https://arxiv.org/abs/2401.02219v1

[35] Gasior J, Seredyński F (2015) A Decentralized Multi-agent Approach to Job Scheduling in Cloud Environment. In: Angelov P, et al. Intelligent Systems'2014. Advances in Intelligent Systems and Computing, vol 322. Springer, Cham. https://doi.org/10.1007/978-3-319-11313-5_36

[36] Gao X, Liu R, Kaushik A (2020) Hierarchical Multi-Agent optimization for resource allocation in cloud computing. IEEE Transactions on Parallel and Distributed Systems 32(3):692–707. https://doi.org/10.1109/tpds.2020.3030920

[37] Pan J, Wei Y (2024) A deep reinforcement learning-based scheduling framework for real-time workflows in the cloud environment. Expert Systems With Applications 255:124845. https://doi.org/10.1016/j.eswa.2024.124845

[38] Lera I, Guerrero C (2024) Multi-objective application placement in fog computing using graph neural network-based reinforcement learning. The Journal of Supercomputing 80:27073–27094. https://doi.org/10.1007/s11227-024-06439-5

[39] Oroojlooy A, Hajinezhad D (2023) A review of cooperative multi-agent deep reinforcement learning. Applied Intelligence 53:13677–13722. https://doi.org/10.1007/s10489-022-04105-y

[40] He K, Doshi P, Banerjee B (2024) Modeling and reinforcement learning in partially observable many-agent systems. Autonomous Agents and Multi-Agent Systems 38:12. https://doi.org/10.1007/s10458-024-09640-1

[41] Lee D, Lim H-D, Kim DW (2023) Continuous-Time distributed dynamic programming for networked Multi-Agent Markov decision processes. arXiv. https://arxiv.org/abs/2307.16706v7

[42] Chen H-C, Li S-A, Chang T-H, Feng H-M, Chen Y-C (2024) Hybrid centralized training and decentralized execution reinforcement learning in Multi-Agent Path-Finding simulations. Applied Sciences 14(10):3960. https://doi.org/10.3390/app14103960

[43] Liang F, Qian C, Yu W, Griffith D, Golmie N (2022) Survey of Graph Neural Networks and Applications. Wireless Communications and Mobile Computing 2022:1–18. https://doi.org/10.1155/2022/9261537

[44] Bonjour T, Haliem M, Alsalem A, Thomas S, Li H, Aggarwal V, Kejriwal M, Bhargava B (2022) Decision making in monopoly using a hybrid deep reinforcement learning approach. IEEE Transactions on Emerging Topics in Computational Intelligence 6:1335–1344. https://doi.org/10.1109/tetci.2022.3166555

[45] Sun Y, Ma L, Liu Y, Wang S, Zhang J, Zheng Y, Yun H, Lei L, Kang Y, Ye L (2022) Lexicographic Multi-Objective Reinforcement learning. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-2022). https://doi.org/10.24963/ijcai.2022/476

[46] Yang B, Gao L, Zhou F, Yao H, Fu Y, Sun Z, Tian F, Ren H (2025) A coordination optimization framework for Multi-Agent reinforcement learning based on reward redistribution and experience reutilization. Electronics 14(12):2361. https://doi.org/10.3390/electronics14122361

[47] Sun Q, Yao Y, Yi P, Hu Y, Yang Z, Yang G, Zhou X (2022a) Learning controlled and targeted communication with the centralized critic for the multi-agent system. Applied Intelligence 53(12):14819–14837. https://doi.org/10.1007/s10489-022-04225-5

[48] Li T, Shi D, Jin S, Wang Z, Yang H, Chen Y (2024a) Multi-Agent hierarchical graph Attention Actor–Critic reinforcement learning. Entropy 27(1):4. https://doi.org/10.3390/e27010004

[49] Xiong F, Zhang Y, Kuang X, He L, Han X (2024) Multi-agent dual actor-critic framework for reinforcement learning navigation. Applied Intelligence 55(2). https://doi.org/10.1007/s10489-024-05933-w

[50] Kim C (2025) Classification-Based Q-Value estimation for continuous Actor-Critic reinforcement learning. Symmetry 17(5):638. https://doi.org/10.3390/sym17050638

[51] Hamilton WL, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Neural Information Processing Systems 30:1024–1034

[52] Dornaika F, Bi J, Charafeddine J (2025) Leveraging Graph Convolutional Networks for Semi-supervised Learning in Multi-view Non-graph Data. Cognitive Computation 17:73. https://doi.org/10.1007/s12559-025-10428-y

[53] Konda VR, Tsitsiklis JN (2002) Actor-critic algorithms

[54] Lu C, Bao Q, Xia S, et al. (2024) Centralized reinforcement learning for multi-agent cooperative environments. Evolutionary Intelligence 17:267–273. https://doi.org/10.1007/s12065-022-00703-4

[55] Letsios D, Mistry M, Misener R (2020) Exact lexicographic scheduling and approximate rescheduling. European Journal of Operational Research 290(2):469–478. https://doi.org/10.1016/j.ejor.2020.08.032

[56] Bubboloni D, Gori M (2020) Breaking ties in collective decision-making. Decisions in Economics and Finance 44(1):411–457. https://doi.org/10.1007/s10203-020-00294-8

[57] Xu X, Liu K, Dai P, Jin F, Ren H, Zhan C, Guo S (2022b) Joint task offloading and resource optimization in NOMA-based vehicular edge computing: A game-theoretic DRL approach. Journal of Systems Architecture 134:102780. https://doi.org/10.1016/j.sysarc.2022.102780

[58] Lowe R, Wu Y, Tamar A, Harb J, Abbeel OP, Mordatch I (2017b) Multi-Agent Actor-Critic for mixed Cooperative-Competitive environments. arXiv 30:6379–6390

[59] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533. https://doi.org/10.1038/nature14236

[60] Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: Proceedings of the 31st International Conference on Machine Learning (ICML'14), Volume 32. JMLR.org, I-387–I-395

[61] Zhang S, Sutton RS (2017) A deeper look at experience replay. arXiv. https://arxiv.org/abs/1712.01275v3

[62] Sehgal A, La HM, Louis SJ, Nguyen H (2019) Deep Reinforcement Learning using Genetic Algorithm for Parameter Optimization. arXiv. https://arxiv.org/abs/1905.04100v1

[63] Gu S, Lillicrap T, Ghahramani Z, Turner RE, Schölkopf B, Levine S (2017) Interpolated Policy gradient: Merging On-Policy and Off-Policy gradient estimation for deep reinforcement learning. arXiv. https://arxiv.org/abs/1706.00387v1

[64] Google Cloud, "Google Kubernetes Engine (GKE)". https://cloud.google.com/kubernetes-engine


work page doi:10.1145/3539606

[11] [11]

2024 20th International Conference on Network and Service Management (CNSM) :1–9

Di Cicco N, Poltronieri F, Santos J, Zaccarini M, Tortonesi M, Stefanelli C, De Turck F (2024) Multi-Objective Scheduling and Resource Allocation of Kubernetes Replicas Across the Compute Continuum. 2024 20th International Conference on Network and Service Management (CNSM) :1–9. https://doi.org/10.23919/cnsm62983.2024.10814307

work page doi:10.23919/cnsm62983.2024.10814307 2024

[12] [12]

Mathematics 11(20):4269

Wang X, Zhao K, Qin B (2023c) Optimization of Task-Scheduling strategy in edge kubernetes clusters based on deep reinforcement learning. Mathematics 11(20):4269. https://doi.org/10.3390/math11204269

work page doi:10.3390/math11204269

[13] [13]

arXiv (Cornell University)

Jayanetti A, Halgamuge S, Buyya R (2024b) Reinforcement Learning based Workflow Scheduling in Cloud and Edge Computing Environments: A Taxonomy, Review and Future Directions. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2408.02938

work page doi:10.48550/arxiv.2408.02938

[14] [14]

ScheduleNet: Learn to solve multi-agent scheduling problems with reinforcement learning

Park J, Bakhtiyar S, Park J (2021b) ScheduleNet: Learn to solve multi- agent scheduling problems with reinforcement learning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2106.03051

work page doi:10.48550/arxiv.2106.03051

[15] [15]

In: arXiv.org

Aina KO, Ha S (2025b) Deep reinforcement learning for Multi-Agent coordination. In: arXiv.org. https://arxiv.org/abs/2510.03592v1

work page arXiv

[16] [16]

Li, Y., Zhong, W. & Wu, Y. Multi-objective flexible job-shop scheduling via graph attention network and reinforcement learning. J Supercomput 81, 293 (2025). https://doi.org/10.1007/s11227-024-06741-2

work page doi:10.1007/s11227-024-06741-2 2025

[17] [17]

In: arXiv.org

Gaon M, Brafman R I (2019b) Reinforcement Learning with Non-Markovian Rewards. In: arXiv.org. https://arxiv.org/abs/1912.02552

work page arXiv 1912

[18] [18]

Zhou, G., Tian, W., Buyya, R. et al. Deep reinforcement learning-based methods for resource scheduling in cloud computing: a review and future directions. Artif Intell Rev 57, 124 (2024). https://doi.org/10.1007/s10462-024-10756-9

work page doi:10.1007/s10462-024-10756-9 2024

[19] [19]

& Javidi, M.M

Jalali Khalil Abadi, Z., Mansouri, N. & Javidi, M.M. Deep reinforcement learning-based scheduling in distributed systems: a critical review. Knowl Inf Syst 66, 5709–5782 (2024). https://doi.org/10.1007/s10115-024-02167-7

work page doi:10.1007/s10115-024-02167-7 2024

[20] [20]

Hardware-based Always-On Heap Memory Safety,

Kim YG, Wu C-J (2020) AutoScale: Energy Efficiency Optimization for Stochastic Edge Inference Using Reinforcement Learning. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) :1082–1096. https://doi.org/10.1109/micro50266.2020.00090

work page doi:10.1109/micro50266.2020.00090 2020

[21] [21]

2023 IEEE 31st International Conference on Network Protocols (ICNP) :1–6

Rothman J, Chamanara J (2023) An RL-Based Model for Optimized Kubernetes Scheduling. 2023 IEEE 31st International Conference on Network Protocols (ICNP) :1–6. https://doi.org/10.1109/icnp59255.2023.10355623

work page doi:10.1109/icnp59255.2023.10355623 2023

[22] [22]

MSc Thesis, National College of Ireland (2024)

Shukla, J.: Comparative study of RL algorithms for resource optimization schedul- ing in Kubernetes. MSc Thesis, National College of Ireland (2024). Available at: https://norma.ncirl.ie/8056/

work page 2024

[23] [23]

Anouar, H., Hatim, H., Zineb, E.A. (2025). Proposing a Theoretical Energy Aware Framework for Kubernetes Scheduling Using Reinforcement Learning. In: Ezziyyani, M., Kacprzyk, J., Balas, V.E. (eds) International Conference on Advanced Intelligent Sys- tems for Sustainable Developent (AI2SD 2024). AI2SD 2024. Lecture Notes in Networks and Systems, vol 1403....

work page doi:10.1007/978-3-031-91337-2-75 2025

[24] [24]

Electronics 14(4):820

Liang J, Miao H, Li K, Tan J, Wang X, Luo R, Jiang Y (2025) A review of Multi-Agent Reinforcement Learning Algorithms. Electronics 14(4):820. https://doi.org/10.3390/electronics14040820

work page doi:10.3390/electronics14040820 2025

[25] [25]

Electronics 12(12):2614

Danino T, Ben-Shimol Y, Greenberg S (2023) Container allocation in cloud envi- ronment using Multi-Agent deep reinforcement learning. Electronics 12(12):2614. https://doi.org/10.3390/electronics12122614

work page doi:10.3390/electronics12122614 2023

[26] [26]

In: arXiv.org

Soul´ e J, Jamont J-P, Occello M, Traonouez L-M, Th´ eron P (2025) Streamlining Resilient Kubernetes Autoscaling with Multi-Agent Systems via an Automated Online Design Framework. In: arXiv.org. https://arxiv.org/abs/2505.21559v1

work page arXiv 2025

[27] [27]

An introduction to centralized training for decentralized execution in cooperative multi-agent reinforcement learning.arXiv preprint arXiv:2409.03052, 2024

Amato C (2024) An introduction to centralized training for decentralized execution in cooperative Multi-Agent Reinforcement learning. In: arXiv.org. https://arxiv.org/abs/2409.03052v1

work page arXiv 2024

[28] [28]

Xu, L., Chen, W., Liu, X., Chen, YY. (2023). MADDPG: Multi-agent Deep Deterministic Policy Gradient Algorithm for Formation Elliptical Encirclement and Collision Avoid- ance. In: Ren, Z., Wang, M., Hua, Y. (eds) Proceedings of 2021 5th Chinese Conference on Swarm Intelligence and Cooperative Control. Lecture Notes in Electrical Engineering, vol 934. Sprin...

work page doi:10.1007/978-981-19-3998-3-24 2023

[29] [29]

Wang, Y., Wu, F. (2021). Policy Adaptive Multi-agent Deep Deterministic Policy Gra- dient. In: Uchiya, T., Bai, Q., Mars´ a Maestre, I. (eds) PRIMA 2020: Principles and Practice of Multi-Agent Systems. PRIMA 2020. Lecture Notes in Computer Science(), vol 12568. Springer, Cham. https://doi.org/10.1007/978-3-030-69322-0-11

work page doi:10.1007/978-3-030-69322-0-11 2021

[30] [30]

Informatics 10(3):64

Lahande P, Kaveri P, Saini J (2023) Reinforcement learning for reducing the inter- ruptions and increasing fault tolerance in the cloud environment. Informatics 10(3):64. https://doi.org/10.3390/informatics10030064

work page doi:10.3390/informatics10030064 2023

[31] [31]

In: arXiv.org

Xu Z, Gong Y, Zhou Y, Bao Q, Qian W (2024) Enhancing Kubernetes Automated Scheduling with Deep Learning and Reinforcement Techniques for Large-Scale Cloud Computing Optimization. In: arXiv.org. https://arxiv.org/abs/2403.07905v1

work page arXiv 2024

[32] [32]

Proceedings of the 14th International Conference on Agents and Artificial Intelligence :231–242

Kallel A, Rekik M, Khemakhem M (2024) DRL4HFC: Deep Reinforcement Learn- ing for Container-Based Scheduling in Hybrid FOG/Cloud System. Proceedings of the 14th International Conference on Agents and Artificial Intelligence :231–242. https://doi.org/10.5220/0012356800003636

work page doi:10.5220/0012356800003636 2024

[33] [33]

Reliability and performance of resource efficiency in dynamic optimization scheduling using multi-agent microservice cloud-fog on IoT applications

Krishnan, R., Durairaj, S. Reliability and performance of resource efficiency in dynamic optimization scheduling using multi-agent microservice cloud-fog on IoT applications. Computing 106, 3837–3878 (2024). https://doi.org/10.1007/s00607-024-01301-1

work page doi:10.1007/s00607-024-01301-1 2024

[34] [34]

In: arXiv.org

Yang Y, Ren F, Zhang M (2024) A decentralized Multiagent-Based task schedul- ing framework for handling uncertain events in fog computing. In: arXiv.org. https://arxiv.org/abs/2401.02219v1

work page arXiv 2024

[35] [35]

Gasior, J., Seredy´ nski, F. (2015). A Decentralized Multi-agent Approach to Job Scheduling in Cloud Environment. In: Angelov, P., et al. Intelligent Sys- tems’2014. Advances in Intelligent Systems and Computing, vol 322. Springer, Cham. https://doi.org/10.1007/978-3-319-11313-5-36

work page doi:10.1007/978-3-319-11313-5-36 2015

[36] [36]

IEEE Transactions on Parallel and Distributed Systems 32(3):692–707

Gao X, Liu R, Kaushik A (2020) Hierarchical Multi-Agent optimization for resource allocation in cloud computing. IEEE Transactions on Parallel and Distributed Systems 32(3):692–707. https://doi.org/10.1109/tpds.2020.3030920

work page doi:10.1109/tpds.2020.3030920 2020

[37] [37]

Expert Systems With Applications 255:124845

Pan J, Wei Y (2024) A deep reinforcement learning-based scheduling framework for real- time workflows in the cloud environment. Expert Systems With Applications 255:124845. https://doi.org/10.1016/j.eswa.2024.124845

work page doi:10.1016/j.eswa.2024.124845 2024

[38] [38]

Multi-objective application placement in fog computing using graph neural network-based reinforcement learning

Lera, I., Guerrero, C. Multi-objective application placement in fog computing using graph neural network-based reinforcement learning. J Supercomput 80, 27073–27094 (2024). https://doi.org/10.1007/s11227-024-06439-5 35

work page doi:10.1007/s11227-024-06439-5 2024

[39] [39]

A review of cooperative multi-agent deep reinforcement learning

Oroojlooy, A., Hajinezhad, D. A review of cooperative multi-agent deep reinforcement learning. Appl Intell 53, 13677–13722 (2023). https://doi.org/10.1007/s10489-022-04105- y

work page doi:10.1007/s10489-022-04105- 2023

[40] [40]

& Banerjee, B

He, K., Doshi, P. & Banerjee, B. Modeling and reinforcement learning in par- tially observable many-agent systems. Auton Agent Multi-Agent Syst 38, 12 (2024). https://doi.org/10.1007/s10458-024-09640-1

work page doi:10.1007/s10458-024-09640-1 2024

[41] [41]

In: arXiv.org

Lee D, Lim H-D, Kim DW (2023) Continuous-Time distributed dynamic pro- gramming for networked Multi-Agent Markov decision processes. In: arXiv.org. https://arxiv.org/abs/2307.16706v7

work page arXiv 2023

[42] [42]

Applied Sciences 14(10):3960

Chen H-C, Li S-A, Chang T-H, Feng H-M, Chen Y-C (2024) Hybrid centralized train- ing and decentralized execution reinforcement learning in Multi-Agent Path-Finding simulations. Applied Sciences 14(10):3960. https://doi.org/10.3390/app14103960

work page doi:10.3390/app14103960 2024

[43] [43]

Wireless Communications and Mobile Computing 2022:1–18

Liang F, Qian C, Yu W, Griffith D, Golmie N (2022) Survey of Graph Neural Net- works and Applications. Wireless Communications and Mobile Computing 2022:1–18. https://doi.org/10.1155/2022/9261537

work page doi:10.1155/2022/9261537 2022

[44] [44]

IEEE Transactions on Emerging Topics in Computational Intelligence

Bonjour, T., Haliem, M., Alsalem, A., Thomas, S., Li, H., Aggarwal, V., Kejriwal, M., Bhargava, B.: Decision making in monopoly using a hybrid deep reinforcement learn- ing approach. IEEE Transactions on Emerging Topics in Computational Intelligence. 6, 1335–1344 (2022). https://doi.org/10.1109/tetci.2022.3166555

work page doi:10.1109/tetci.2022.3166555 2022

[45] [45]

In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-2022)

Sun Y, Ma L, Liu Y, Wang S, Zhang J, Zheng Y, Yun H, Lei L, Kang Y, Ye L (2022) Lexicographic Multi-Objective Reinforcement learning. Proceed- ings of the Thirty-First International Joint Conference on Artificial Intelligence. https://doi.org/10.24963/ijcai.2022/476

work page doi:10.24963/ijcai.2022/476 2022

[46] [46]

Electronics 14(12):2361

Yang B, Gao L, Zhou F, Yao H, Fu Y, Sun Z, Tian F, Ren H (2025) A coordination optimization framework for Multi-Agent reinforcement learning based on reward redistribution and experience reutilization. Electronics 14(12):2361. https://doi.org/10.3390/electronics14122361

work page doi:10.3390/electronics14122361 2025

[47] [47]

Applied Intelligence 53(12):14819–14837

Sun Q, Yao Y, Yi P, Hu Y, Yang Z, Yang G, Zhou X (2022a) Learning controlled and targeted communication with the centralized critic for the multi-agent system. Applied Intelligence 53(12):14819–14837. https://doi.org/10.1007/s10489-022-04225-5

work page doi:10.1007/s10489-022-04225-5

[48] [48]

Entropy 27(1):4

Li T, Shi D, Jin S, Wang Z, Yang H, Chen Y (2024a) Multi-Agent hier- archical graph Attention Actor–Critic reinforcement learning. Entropy 27(1):4. https://doi.org/10.3390/e27010004

work page doi:10.3390/e27010004

[49] [49]

Applied Intelligence 55(2)

Xiong F, Zhang Y, Kuang X, He L, Han X (2024) Multi-agent dual actor- critic framework for reinforcement learning navigation. Applied Intelligence 55(2). https://doi.org/10.1007/s10489-024-05933-w

work page doi:10.1007/s10489-024-05933-w 2024

[50] [50]

Symmetry 17(5):638

Kim C (2025) Classification-Based Q-Value estimation for continuous Actor-Critic reinforcement learning. Symmetry 17(5):638. https://doi.org/10.3390/sym17050638

work page doi:10.3390/sym17050638 2025

[51] [51]

Neural Information Processing Systems 30:1024–1034

Hamilton WL, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Neural Information Processing Systems 30:1024–1034

work page 2017

[52] [52]

& Charafeddine, J

Dornaika, F., Bi, J. & Charafeddine, J. Leveraging Graph Convolutional Networks for Semi-supervised Learning in Multi-view Non-graph Data. Cogn Comput 17, 73 (2025). https://doi.org/10.1007/s12559-025-10428-y

work page doi:10.1007/s12559-025-10428-y 2025

[53] [53]

Konda VR, Tsitsiklis JN (2002) Actor-critic algorithms

work page 2002

[54] [54]

Lu, C., Bao, Q., Xia, S. et al. Centralized reinforcement learning for multi-agent coop- erative environments. Evol. Intel. 17, 267–273 (2024). https://doi.org/10.1007/s12065- 022-00703-4

work page doi:10.1007/s12065- 2024

[55] [55]

European Journal of Operational Research 290(2):469–478

Letsios D, Mistry M, Misener R (2020) Exact lexicographic scheduling and approx- imate rescheduling. European Journal of Operational Research 290(2):469–478. https://doi.org/10.1016/j.ejor.2020.08.032 36

work page doi:10.1016/j.ejor.2020.08.032 2020

[56] [56]

Decisions in Economics and Finance 44(1):411–457

Bubboloni D, Gori M (2020) Breaking ties in collective decision-making. Decisions in Economics and Finance 44(1):411–457. https://doi.org/10.1007/s10203-020-00294-8

work page doi:10.1007/s10203-020-00294-8 2020

[57] [57]

Journal of Systems Architecture 134:102780

Xu X, Liu K, Dai P, Jin F, Ren H, Zhan C, Guo S (2022b) Joint task offloading and resource optimization in NOMA-based vehicular edge computing: A game-theoretic DRL approach. Journal of Systems Architecture 134:102780. https://doi.org/10.1016/j.sysarc.2022.102780

work page doi:10.1016/j.sysarc.2022.102780 2022

[58] [58]

arXiv (Cornell University) 30:6379–6390

Lowe R, Wu Y, Tamar A, Harb J, Abbeel OP, Mordatch I (2017b) Multi-Agent Actor- Critic for mixed Cooperative-Competitive environments. arXiv (Cornell University) 30:6379–6390

work page

[59] [59]

Human-level control through deep reinforcement learning

Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Ried- miller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533. https://doi.org/10.1038/nature14236

work page doi:10.1038/nature14236 2015

[60] [60]

David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. 2014. Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32 (ICML’14). JMLR.org, I–387–I–395

work page 2014

[61] [61]

A Deeper Look at Experience Replay

Zhang S, Sutton RS (2017) A deeper look at experience replay. In: arXiv.org. https://arxiv.org/abs/1712.01275v3

work page internal anchor Pith review Pith/arXiv arXiv 2017

[62] [62]

Deep Reinforcement Learning using Genetic Algorithm for Parameter Optimization

Sehgal A, La HM, Louis SJ, Nguyen H (2019) Deep Reinforcement Learn- ing using Genetic Algorithm for Parameter Optimization. In: arXiv.org. https://arxiv.org/abs/1905.04100v1

work page internal anchor Pith review Pith/arXiv arXiv 2019

[63] [63]

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Gu S, Lillicrap T, Ghahramani Z, Turner RE, Sch¨ olkopf B, Levine S (2017) Interpo- lated Policy gradient: Merging On-Policy and Off-Policy gradient estimation for deep reinforcement learning. In: arXiv.org. https://arxiv.org/abs/1706.00387v1

work page internal anchor Pith review Pith/arXiv arXiv 2017

[64] [64]

Available: https://cloud.google.com/kubernetes-engine

Google Cloud, ”Google Kubernetes Engine (GKE),” Google Cloud, [Online]. Available: https://cloud.google.com/kubernetes-engine. 37

work page