AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling
Pith reviewed 2026-05-15 11:53 UTC · model grok-4.3
The pith
AGMARL-DKS treats each Kubernetes node as an RL agent that uses graph-derived global context and a stress-aware lexicographical policy to schedule workloads more effectively than the default scheduler.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AGMARL-DKS constructs a scalable scheduler by assigning an RL agent to every cluster node, supplying each agent with a global state representation extracted by a graph neural network, and guiding decisions with a stress-aware lexicographical ordering of objectives; this combination produces scheduling actions that improve fault tolerance, resource utilization, and cost relative to the default Kubernetes scheduler when tested on Google Kubernetes Engine, particularly for batch and mission-critical workloads.
What carries the argument
AGMARL-DKS, the multi-agent RL system in which each node is an agent that receives GNN-derived global cluster context and applies a stress-aware lexicographical ordering policy to multi-objective rewards.
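The idea of giving each node-agent a view beyond its own observations can be illustrated with a neighbour-averaging step, the basic operation behind many GNN layers. The sketch below is a toy under stated assumptions: the node names, the one-dimensional load feature, the fixed 0.5 mixing weight, and the function names `aggregate` and `propagate` are all hypothetical, and a real GNN would use learned weights rather than a fixed average. Nothing here is taken from the paper's implementation.

```python
# Toy illustration of neighbour aggregation on a cluster graph.
# All names, features and the fixed mixing weight are assumptions,
# not the model described in the paper.

def aggregate(features, adjacency, mix=0.5):
    """One round: blend each node's features with the mean of its neighbours'."""
    updated = {}
    for node, own in features.items():
        neighbours = adjacency.get(node, [])
        if not neighbours:
            # Isolated node: nothing to aggregate, keep its own features.
            updated[node] = list(own)
            continue
        mean = [
            sum(features[n][i] for n in neighbours) / len(neighbours)
            for i in range(len(own))
        ]
        updated[node] = [(1 - mix) * o + mix * m for o, m in zip(own, mean)]
    return updated


def propagate(features, adjacency, rounds):
    """Apply several aggregation rounds; k rounds reach k-hop neighbours."""
    for _ in range(rounds):
        features = aggregate(features, adjacency)
    return features


# Hypothetical three-node chain: node-a <-> node-b <-> node-c.
adjacency = {
    "node-a": ["node-b"],
    "node-b": ["node-a", "node-c"],
    "node-c": ["node-b"],
}
# Hypothetical per-node load in [0, 1].
load = {"node-a": [0.9], "node-b": [0.5], "node-c": [0.1]}

context = propagate(load, adjacency, rounds=2)
print(context)
```

After two rounds the value held at `node-a` already depends on `node-c`, which is two hops away. This is the sense in which stacked aggregation rounds can spread cluster-wide context to each agent, and also why every extra round implies another exchange of embeddings between nodes.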
Load-bearing premise
That assigning an RL agent to every node, feeding it GNN global context, and using a lexicographical policy will scale to large heterogeneous clusters without excessive communication or training overhead.
What would settle it
A controlled run on a large heterogeneous cluster in which AGMARL-DKS shows no improvement or a decline in fault-tolerance or utilization metrics compared with the default scheduler under identical workload traces.
Original abstract
State-of-the-art cloud-native applications require intelligent schedulers that can effectively balance system stability, resource utilisation, and associated costs. While Kubernetes provides feasibility-based placement by default, recent research efforts have explored the use of reinforcement learning (RL) for more intelligent scheduling decisions. However, current RL-based schedulers have three major limitations. First, most of these schedulers use monolithic centralised agents, which are non-scalable for large heterogeneous clusters. Second, the ones that use multi-objective reward functions assume simple, static, linear combinations of the objectives. Third, no previous work has produced a stress-aware scheduler that can react adaptively to dynamic conditions. To address these gaps in current research, we propose the Adaptive Graph-enhanced Multi-Agent Reinforcement Learning Dynamic Kubernetes Scheduler (AGMARL-DKS). AGMARL-DKS addresses these gaps by introducing three major innovations. First, we construct a scalable solution by treating the scheduling challenge as a cooperative multi-agent problem, where every cluster node operates as an agent, employing centralised training methods before decentralised execution. Second, to be context-aware and yet decentralised, we use a Graph Neural Network (GNN) to build a state representation of the global cluster context at each agent. This represents an improvement over methods that rely solely on local observations. Finally, to make trade-offs between these objectives, we use a stress-aware lexicographical ordering policy instead of a simple, static linear weighting of these objectives. The evaluations in Google Kubernetes Engine (GKE) reveal that AGMARL-DKS significantly outperforms the default scheduler in terms of fault tolerance, utilisation, and cost, especially in scheduling batch and mission-critical workloads.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes AGMARL-DKS, a cooperative multi-agent RL scheduler for Kubernetes in which each cluster node acts as an independent agent. It uses centralised training with decentralised execution, GNNs to supply each agent with global cluster state representations, and a stress-aware lexicographical ordering policy for adaptive multi-objective trade-offs. The central claim is that this design yields significant gains over the default Kubernetes scheduler in fault tolerance, resource utilisation and cost on GKE, especially for batch and mission-critical workloads.
Significance. If the empirical results and scalability claims hold after detailed verification, the work would offer a concrete path toward decentralised yet globally aware RL scheduling that avoids both monolithic central agents and static linear reward weighting, potentially influencing production cloud-native orchestration systems.
major comments (2)
- [Abstract] Abstract and Evaluation section: the claim of significant outperformance in fault tolerance, utilisation and cost supplies no quantitative metrics, baseline schedulers, statistical tests, cluster sizes, workload traces or experimental protocol, rendering the central empirical claim unverifiable from the provided description.
- [Method] Method description: the decentralised execution model relies on each agent receiving GNN-derived global context, yet no analysis or measurement is given for communication volume, message size, update frequency or bandwidth consumption per scheduling decision. This directly affects the load-bearing scalability assertion for large heterogeneous clusters.
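To make the second major comment concrete, a back-of-envelope estimate of embedding traffic can be written down in a few lines. This is a rough sketch, not a measurement: it assumes every node receives one embedding from each neighbour per aggregation round and per scheduling decision, and the function name `bytes_per_second`, the 64-dimensional float32 embedding, the two rounds, and the cluster sizes are all hypothetical values chosen for illustration.

```python
# Rough traffic model for exchanging GNN embeddings between node-agents.
# Every parameter value below is an assumption for illustration only.

def bytes_per_second(num_nodes, avg_degree, embedding_dim,
                     decisions_per_second, bytes_per_value=4, rounds=2):
    """Cluster-wide embedding traffic if each node receives one embedding
    from each neighbour, per aggregation round, per scheduling decision."""
    messages_per_round = num_nodes * avg_degree
    bytes_per_message = embedding_dim * bytes_per_value
    per_decision = messages_per_round * bytes_per_message * rounds
    return per_decision * decisions_per_second


# Sparse topology: each node talks to 8 neighbours.
sparse = bytes_per_second(num_nodes=1000, avg_degree=8,
                          embedding_dim=64, decisions_per_second=10)

# Full mesh: each node talks to every other node.
mesh = bytes_per_second(num_nodes=1000, avg_degree=999,
                        embedding_dim=64, decisions_per_second=10)

print(f"sparse: {sparse / 1e6:.2f} MB/s, full mesh: {mesh / 1e6:.2f} MB/s")
```

Under these assumptions traffic grows linearly with node count when the graph is sparse and quadratically when it is fully connected, which is why the graph construction and update frequency matter for the scalability claim.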
minor comments (1)
- Clarify the precise definition and implementation of the stress-aware lexicographical ordering policy, including how stress levels are computed and how ties are broken.
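One possible reading of a stress-aware lexicographical policy is sketched below, purely to show what a precise definition would need to pin down. The scalar stress level, the 0.8 threshold, the two priority orders, the objective names, and the tie-break by node name are hypothetical choices, not details reported in the paper.

```python
# Hypothetical sketch of stress-aware lexicographic selection.
# Thresholds, objective names and tie-breaking are assumptions.

def priority_order(stress, threshold=0.8):
    """Pick the objective ranking from a scalar stress level in [0, 1]."""
    if stress >= threshold:
        return ("fault_tolerance", "utilisation", "cost")
    return ("cost", "utilisation", "fault_tolerance")


def lexicographic_choice(candidates, order, tolerance=0.0):
    """Filter candidates objective by objective (higher score is better).

    candidates maps node name -> {objective: score}. Nodes within
    `tolerance` of the best score survive to the next objective.
    Remaining ties are broken by node name for determinism.
    """
    remaining = sorted(candidates)
    for objective in order:
        best = max(candidates[n][objective] for n in remaining)
        remaining = [n for n in remaining
                     if candidates[n][objective] >= best - tolerance]
        if len(remaining) == 1:
            break
    return remaining[0]


# Hypothetical per-node scores for one pending pod.
scores = {
    "node-a": {"fault_tolerance": 0.9, "utilisation": 0.4, "cost": 0.2},
    "node-b": {"fault_tolerance": 0.9, "utilisation": 0.7, "cost": 0.5},
    "node-c": {"fault_tolerance": 0.6, "utilisation": 0.8, "cost": 0.9},
}

under_stress = lexicographic_choice(scores, priority_order(stress=0.95))
relaxed = lexicographic_choice(scores, priority_order(stress=0.30))
print(under_stress, relaxed)
```

In this toy setting the same candidate scores yield `node-b` under high stress and `node-c` otherwise, because the ranking of objectives changes rather than their weights. The sketch also makes the open questions visible: how stress is measured, where the switch happens, and what tolerance counts as a tie.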
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address each major comment below and will incorporate revisions to improve verifiability and strengthen the scalability discussion.
Point-by-point responses
- Referee: [Abstract] Abstract and Evaluation section: the claim of significant outperformance in fault tolerance, utilisation and cost supplies no quantitative metrics, baseline schedulers, statistical tests, cluster sizes, workload traces or experimental protocol, rendering the central empirical claim unverifiable from the provided description.
  Authors: We agree that the abstract should include key quantitative results for immediate verifiability. The evaluation section reports comparisons against the default Kubernetes scheduler on GKE using specific cluster sizes, standard workload traces for batch and mission-critical jobs, and metrics on fault tolerance, utilization, and cost with statistical tests. We will revise the abstract to summarize representative quantitative improvements and briefly note the experimental protocol.
  Revision: yes
- Referee: [Method] Method description: the decentralised execution model relies on each agent receiving GNN-derived global context, yet no analysis or measurement is given for communication volume, message size, update frequency or bandwidth consumption per scheduling decision. This directly affects the load-bearing scalability assertion for large heterogeneous clusters.
  Authors: We acknowledge this gap in the scalability analysis. The current manuscript focuses on algorithmic design and end-to-end performance but lacks explicit communication overhead measurements. We will add a dedicated analysis (theoretical bounds plus empirical measurements from GKE runs) covering message sizes for GNN embeddings, update frequencies, and bandwidth consumption across cluster scales.
  Revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper proposes AGMARL-DKS as a new architecture with three explicit innovations (cooperative MARL treating nodes as agents, GNN-derived global context per agent, stress-aware lexicographical policy) and reports empirical outperformance on external GKE infrastructure. No equations, fitted parameters, or self-referential definitions appear in the abstract or method description. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked. The central claims rest on external evaluation rather than any reduction of predictions to inputs by construction. This is the expected non-finding for a proposal-style systems paper.