pith. machine review for the scientific record.

arxiv: 2602.13211 · v2 · submitted 2026-01-17 · 💻 cs.NI · cs.AI

An Overlay Multicast Routing Method Based on Network Situational Awareness and Hierarchical Multi-Agent Reinforcement Learning

Pith reviewed 2026-05-16 13:10 UTC · model grok-4.3

classification 💻 cs.NI cs.AI
keywords overlay multicast · hierarchical reinforcement learning · multi-agent systems · software-defined networking · network routing · traffic awareness · dynamic adaptation

The pith

MA-DHRL-OM decomposes overlay multicast tree construction into two hierarchical stages using multi-agent reinforcement learning to handle dynamic traffic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MA-DHRL-OM, a multi-agent deep hierarchical reinforcement learning method for overlay multicast routing that uses SDN's global network view to create a traffic-aware model. It addresses the limits of traditional overlay multicast and standard reinforcement learning by splitting the multi-objective tree-building task into two coordinated stages, which shrinks the action space and stabilizes learning. Experiments indicate the approach yields lower delay, better bandwidth use, and reduced packet loss compared with prior methods while maintaining more consistent convergence.

Core claim

The central claim is that overlay multicast routing can be made adaptive to dynamic traffic by modeling it as a hierarchical multi-agent reinforcement learning problem: SDN supplies global situational awareness to build a traffic model, and two layers of collaborating agents decompose the construction of the multicast tree into sequential stages that reduce action-space size, balance competing objectives such as delay and bandwidth, and produce stable, scalable routes.

What carries the argument

The MA-DHRL-OM framework, which decomposes overlay multicast tree construction into two hierarchical stages executed by collaborating agents to shrink the joint action space and stabilize multi-objective optimization.
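
To make the action-space argument concrete, the sketch below compares a flat agent that must score every joint pair of decisions with a two-stage arrangement in which each agent scores only its own decision. The function names and the sizes (21 candidate nodes, 40 candidate links) are illustrative assumptions, not the paper's actual upper- and lower-layer action definitions.

```python
# Illustrative only: these action definitions are assumptions,
# not the paper's actual upper- and lower-layer action spaces.

def flat_action_count(n_upper_choices: int, n_lower_choices: int) -> int:
    """A flat agent picks both decisions at once, so it scores every pair."""
    return n_upper_choices * n_lower_choices


def staged_action_count(n_upper_choices: int, n_lower_choices: int) -> int:
    """Two staged agents each score only their own decision."""
    return n_upper_choices + n_lower_choices


# Hypothetical sizes: 21 candidate nodes in stage one, 40 candidate links in stage two.
flat = flat_action_count(21, 40)
staged = staged_action_count(21, 40)
print(flat, staged)  # 840 61
```

Under these assumed sizes the joint space grows as a product while the staged spaces grow as a sum, which is the general reason a staged decomposition can shrink what each policy has to score.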

If this is right

  • The method produces lower end-to-end delay than existing overlay multicast schemes under varying loads.
  • Bandwidth utilization improves because the hierarchical agents jointly optimize resource allocation.
  • Packet loss drops due to more stable route selection in changing conditions.
  • Convergence remains consistent across different network scales because the action space is decomposed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same staged decomposition could be tested on other multi-objective routing tasks such as service chaining or data-center flow scheduling.
  • If coordination overhead remains low at larger agent counts, the approach might scale to cross-domain overlays without central bottlenecks.
  • A natural next measurement would be how performance changes when the traffic model is learned online rather than supplied by SDN.
  • The two-stage split might also be applied to non-multicast problems that combine path selection with resource reservation.

Load-bearing premise

That splitting tree construction into two stages via multi-agent collaboration will reduce action space and stabilize convergence in real dynamic traffic without adding prohibitive coordination overhead.

What would settle it

A side-by-side run of MA-DHRL-OM versus a flat multi-agent baseline on a live network trace with sudden traffic spikes, measuring whether the hierarchical version actually converges faster and incurs lower total overhead.

Figures

Figures reproduced from arXiv: 2602.13211 by Cheng Zhu, Feng Ding, Gai Huang, Miao Ye, Qiuxiang Jiang, Yanye Chen, Yong Wang.

Figure 1
Figure 1: Model architecture. From Section 4.2 (Control plane): The control plane occupies a central position within the SDN architecture, responsible for managing the global network state, defining forwarding policies, and dynamically scheduling resources. The controller interacts with data plane switches via the southbound OpenFlow protocol to discover network topology, collect network state information, and construct a global n…
Figure 2
Figure 2: State space representation of the upper-layer reinforcement learning algorithm. The source-destination distance matrix is a diagonal matrix whose diagonal values are the normalized shortest hop-count distances from the OM source node to each destination node in the current underlying network topology. The node selection matrix is also a diagonal matrix; its diagonal values indicate wheth…
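
The source-destination distance matrix described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, assuming a connected, unweighted topology, normalization by the largest hop count among the destinations, and zeros on the diagonal for non-destination nodes; none of these choices is confirmed by the caption, and the helper names are hypothetical.

```python
from collections import deque


def hop_counts(adjacency, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def distance_matrix(adjacency, source, destinations):
    """Diagonal matrix of normalized hop counts from source to each destination."""
    nodes = sorted(adjacency)
    dist = hop_counts(adjacency, source)
    # Assumed normalization: divide by the farthest destination's hop count.
    longest = max(dist[d] for d in destinations) or 1
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for i, node in enumerate(nodes):
        if node in destinations:
            matrix[i][i] = dist[node] / longest
    return matrix


# Hypothetical 5-node topology with source 0 and destinations 2, 3, 4.
adjacency = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
upper_state = distance_matrix(adjacency, source=0, destinations={2, 3, 4})
for row in upper_state:
    print(row)
```
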
Figure 3
Figure 3: State space representation of the lower-layer reinforcement learning algorithm. TM is constructed by normalizing network state information collected from the data plane. It comprises the normalized residual bandwidth matrix, the normalized link latency matrix, and the normalized packet loss rate matrix. All three are symmetric adjacency matrices of size n × n, where n is the number of nodes i…
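
The construction of TM described in the Figure 3 caption can be illustrated with a short sketch. It assumes undirected links, integer node indices, and scaling of each metric by its maximum observed value; the caption does not state the normalization actually used, and `normalized_matrix`, `build_tm`, and the sample values are hypothetical.

```python
def normalized_matrix(n, link_values):
    """Symmetric n x n matrix from {(u, v): value}, scaled by the largest value."""
    hi = max(link_values.values())
    matrix = [[0.0] * n for _ in range(n)]
    for (u, v), value in link_values.items():
        scaled = value / hi if hi else 0.0
        # Undirected link: fill both (u, v) and (v, u) so the matrix is symmetric.
        matrix[u][v] = scaled
        matrix[v][u] = scaled
    return matrix


def build_tm(n, bandwidth, latency, loss):
    """Stack the three normalized link-state matrices into one state tensor."""
    return [normalized_matrix(n, metric) for metric in (bandwidth, latency, loss)]


# Hypothetical 3-node line topology 0 - 1 - 2.
bandwidth = {(0, 1): 50.0, (1, 2): 100.0}   # residual bandwidth
latency = {(0, 1): 2.0, (1, 2): 8.0}        # link latency
loss = {(0, 1): 0.0, (1, 2): 0.0}           # packet loss rate
tm = build_tm(3, bandwidth, latency, loss)
print(tm[0])
```
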
Figure 4
Figure 4: Traffic generation using Iperf. To simulate realistic network traffic, the Iperf traffic generation tool [43] is used to transmit UDP (User Datagram Protocol) packets between nodes. Following the methodology in [30], we emulate the 24-hour variation of network traffic over a full day.
Figure 5
Figure 5: Three network topologies tested in the experiments: (a) 10NodeNet; (b) 14NodeNet; and (c) 21NodeNet. Following the methodology in [45], three network topologies are adopted for performance evaluation, consisting of 10, 14, and 21 nodes, respectively, and named 10NodeNet, 14NodeNet, and 21NodeNet. The corresponding topologies are shown in Figure 5a–c. The link parameters in each topology are randomly genera…
Figure 6
Figure 6: Critic learning rate comparison. Experimental results indicate that raising the learning rate speeds up convergence of the reward curve; one setting converges but at a slower pace, and one converges most rapidly of all the settings compared. Values appearing with the figure: 0.00005; actor_lr = 0.00001, 0.00005, 0.0001.
Figure 7
Figure 7: Actor learning rate comparison.
Figure 8
Figure 8: Batch size comparison. Now, with all learning rates fixed, we adjust the batch size parameter for comparison.
Figure 9
Figure 9: Bottleneck bandwidth experimental results for three network topologies: (a) 10NodeNet; (b) 14NodeNet; and (c) 21NodeNet.
Figure 10
Figure 10: Average total link delay experimental results for three network topologies: (a) 10NodeNet; (b) 14NodeNet; and (c) 21NodeNet.
Figure 11
Figure 11: Average packet loss rate experimental results for three network topologies: (a) 10NodeNet; (b) 14NodeNet; and (c) 21NodeNet.
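
The three evaluation metrics named in Figures 9 to 11 can be computed for a single multicast tree roughly as follows. This is a minimal sketch, assuming that bottleneck bandwidth is the minimum residual bandwidth over the tree's links, total link delay is the sum of per-link delays, and packet loss is averaged over the tree's links; the paper's exact definitions are not given on this page, and `tree_metrics` and the sample values are hypothetical.

```python
def tree_metrics(tree_links, bandwidth, delay, loss):
    """Return (bottleneck bandwidth, total link delay, mean packet loss rate)."""
    bottleneck = min(bandwidth[link] for link in tree_links)
    total_delay = sum(delay[link] for link in tree_links)
    mean_loss = sum(loss[link] for link in tree_links) / len(tree_links)
    return bottleneck, total_delay, mean_loss


# Hypothetical tree rooted at node 0 with branches 1 -> 2 and 1 -> 3.
tree_links = [(0, 1), (1, 2), (1, 3)]
bandwidth = {(0, 1): 80.0, (1, 2): 40.0, (1, 3): 60.0}
delay = {(0, 1): 5.0, (1, 2): 7.0, (1, 3): 3.0}
loss = {(0, 1): 0.01, (1, 2): 0.03, (1, 3): 0.02}

bottleneck, total_delay, mean_loss = tree_metrics(tree_links, bandwidth, delay, loss)
print(bottleneck, total_delay, round(mean_loss, 4))
```
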
Original abstract

Compared with IP multicast, Overlay Multicast (OM) offers better compatibility and flexible deployment in heterogeneous, cross-domain networks. However, traditional OM struggles to adapt to dynamic traffic due to unawareness of physical resource states, and existing reinforcement learning methods fail to decouple OM's tightly coupled multi-objective nature, leading to high complexity, slow convergence, and instability. To address this, we propose MA-DHRL-OM, a multi-agent deep hierarchical reinforcement learning approach. Using SDN's global view, it builds a traffic-aware model for OM path planning. The method decomposes OM tree construction into two stages via hierarchical agents, reducing action space and improving convergence stability. Multi-agent collaboration balances multi-objective optimization while enhancing scalability and adaptability. Experiments show MA-DHRL-OM outperforms existing methods in delay, bandwidth utilization, and packet loss, with more stable convergence and flexible routing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes MA-DHRL-OM, a multi-agent deep hierarchical reinforcement learning method for overlay multicast (OM) routing. Leveraging SDN's global view for network situational awareness, it constructs a traffic-aware model and decomposes OM tree construction into two hierarchical stages with multi-agent collaboration. This is claimed to reduce action space, balance multi-objective optimization (delay, bandwidth utilization, packet loss), improve convergence stability, and enhance scalability and adaptability in dynamic traffic. Experiments are reported to demonstrate outperformance over existing methods in the key metrics along with more stable convergence and flexible routing.

Significance. If the performance claims and the assumption that hierarchical decomposition reliably reduces action space without offsetting coordination overhead hold under detailed validation, the work could advance adaptive OM routing in heterogeneous, cross-domain networks by addressing limitations of traditional methods and flat RL approaches. The integration of SDN situational awareness with hierarchical MARL offers a structured way to handle multi-objective trade-offs, which may prove useful for scalable deployment in volatile traffic environments.

major comments (2)
  1. [Abstract / Experiments] Abstract and Experiments section: The central claims of outperformance in delay, bandwidth utilization, packet loss, and convergence stability rest on experiments whose setup (traffic models, network topologies, baseline algorithms, hyperparameter details, and statistical reporting such as error bars or number of runs) is unspecified. Without these, it is impossible to assess whether gains derive from the hierarchical structure itself or from other factors, undermining the load-bearing assertion that the method improves adaptability.
  2. [Method] Method / Hierarchical agent description: The claim that decomposing OM tree construction into two stages via multi-agent collaboration reduces action space and yields stable convergence without new coordination overhead is load-bearing for the scalability argument. No equations or protocol details are given for inter-agent messages, synchronization, or reward structures between upper- and lower-level agents, and no metrics (e.g., message volume or wall-clock overhead versus flat MARL baselines) are reported to test this assumption in dynamic traffic.
minor comments (1)
  1. [Throughout] Ensure all acronyms (OM, SDN, MA-DHRL-OM, etc.) are defined at first use and used consistently.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below and have prepared revisions to strengthen the presentation of our experimental setup and methodological details.

Point-by-point responses
  1. Referee: [Abstract / Experiments] Abstract and Experiments section: The central claims of outperformance in delay, bandwidth utilization, packet loss, and convergence stability rest on experiments whose setup (traffic models, network topologies, baseline algorithms, hyperparameter details, and statistical reporting such as error bars or number of runs) is unspecified. Without these, it is impossible to assess whether gains derive from the hierarchical structure itself or from other factors, undermining the load-bearing assertion that the method improves adaptability.

    Authors: We agree that the experimental setup details were insufficiently specified. In the revised manuscript, we will expand the Experiments section with full descriptions of the traffic models (including arrival processes and parameters), network topologies (e.g., generation method and sizes), all baseline algorithms with their configurations, complete hyperparameter values for the RL components, and statistical reporting (means and standard deviations from 10 independent runs, with error bars on all performance figures). These additions will allow readers to evaluate the source of the reported improvements. revision: yes

  2. Referee: [Method] Method / Hierarchical agent description: The claim that decomposing OM tree construction into two stages via multi-agent collaboration reduces action space and yields stable convergence without new coordination overhead is load-bearing for the scalability argument. No equations or protocol details are given for inter-agent messages, synchronization, or reward structures between upper- and lower-level agents, and no metrics (e.g., message volume or wall-clock overhead versus flat MARL baselines) are reported to test this assumption in dynamic traffic.

    Authors: We acknowledge the need for greater rigor in describing the hierarchical interactions. The revised Method section will include the explicit equations for the upper- and lower-level policies, the per-level reward functions, and the inter-agent communication protocol (message formats, synchronization, and coordination). We will also add new results quantifying coordination overhead (message volume and wall-clock time) relative to flat MARL baselines under dynamic traffic to support the scalability claims. revision: yes
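
The statistical reporting promised in response 1 (means and standard deviations over independent runs) amounts to a small computation per metric. The sketch below uses Python's standard `statistics` module; the function name and the sample delay values are hypothetical placeholders, not results from the paper.

```python
import statistics


def summarize_runs(values):
    """Return (mean, sample standard deviation) over independent runs."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical per-run delay measurements; not data from the paper.
run_delays = [10, 12, 14]
mean_delay, std_delay = summarize_runs(run_delays)
print(mean_delay, std_delay)
```

The pair returned here is what an error bar on a performance figure would plot: the mean as the bar height and the standard deviation as the whisker length.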

Circularity Check

0 steps flagged

No significant circularity in MA-DHRL-OM derivation

full rationale

The paper proposes MA-DHRL-OM as a novel algorithmic method that uses SDN global view to build a traffic-aware model and decomposes OM tree construction into two hierarchical stages via multi-agent collaboration. This decomposition is presented as a design choice to reduce action space, with performance validated through experiments on delay, bandwidth, and packet loss. No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations appear in the abstract or description. The central claims rest on the proposed algorithm and external experimental comparisons rather than reducing to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard RL assumptions plus domain claims about SDN visibility and hierarchical decomposition benefits; no explicit free parameters or invented entities are named in the abstract.

axioms (2)
  • domain assumption: SDN provides an accurate global view of physical resource states for traffic-aware OM planning.
    Invoked in the abstract to justify the traffic-aware model.
  • domain assumption: Hierarchical decomposition reduces action space and improves convergence stability for multi-objective OM.
    Central premise for the two-stage agent design.

pith-pipeline@v0.9.0 · 5465 in / 1232 out tokens · 32486 ms · 2026-05-16T13:10:00.256089+00:00 · methodology

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

  1. H. Marques, H. Silva, E. Logota, J. Rodriguez, S. Vahid, R. Tafazolli, Multiview real-time media distribution for next generation networks, Comput. Networks, 118 (2017), 96–124. https://doi.org/10.1016/j.comnet.2017.03.002
  2. M. L. Hu, M. Xiao, Y. Hu, C. Cai, T. P. Deng, K. Peng, Software defined multicast using segment routing in LEO satellite networks, IEEE Trans. Mob. Comput., 23 (2024), 835–849. https://doi.org/10.1109/TMC.2022.3215976
  3. Y. H. Chu, S. G. Rao, S. Seshan, H. Zhang, A case for end system multicast, IEEE J. Sel. Areas Commun., 20 (2002), 1456–1471. https://doi.org/10.1109/JSAC.2002.803066
  4. M. Hosseini, D. T. Ahmed, S. Shirmohammadi, N. D. Georganas, A survey of application-layer multicast protocols, IEEE Commun. Surv. Tutorials, 9 (2007), 58–74. https://doi.org/10.1109/COMST.2007.4317616
  5. T. Ruso, C. Chellappan, P. Sivasankar, Ppssm: push/pull smooth video streaming multicast protocol design and implementation for an overlay network, Multimedia Tools Appl., 75 (2016), 17097–17119. https://doi.org/10.1007/s11042-015-2979-5
  6. A. Sampaio, P. Sousa, An adaptable and ISP-friendly multicast overlay network, Peer-to-Peer Networking Appl., 12 (2019), 809–829. https://doi.org/10.1007/s12083-018-0680-y
  7. Y. Zhu, B. Li, J. Guo, Multicast with network coding in application-layer overlay networks, IEEE J. Sel. Areas Commun., 22 (2004), 107–120. https://doi.org/10.1109/JSAC.2003.818801
  8. J. Zhang, L. Liu, L. Ramaswamy, C. Pu, Peercast: churn-resilient end system multicast on heterogeneous overlay networks, J. Network Comput. Appl., 31 (2008), 821–850. https://doi.org/10.1016/j.jnca.2007.05.001
  9. J. Su, J. Cao, B. Zhang, A survey of the research on ALM stability enhancement, Chin. J. Comput., 32 (2009), 576–590
  10. X. C. Zhang, Z. Wang, W. M. Luo, B. P. Yan, Topology-aware application layer multicast scheme, J. Software, 21 (2010), 2010–2022. https://doi.org/10.3724/SP.J.1001.2010.03594
  11. Y. Zhang, X. Nie, J. Jiang, W. Wang, K. Xu, Y. Zhao, et al., BDS+: an inter-datacenter data replication system with dynamic bandwidth separation, IEEE/ACM Trans. Networking, 29 (2021), 918–934. https://doi.org/10.1109/TNET.2021.3054924
  12. C. Kim, Y. Kim, J. H. Yang, I. Yeom, Analysis of bandwidth efficiency in overlay multicasting, Comput. Networks, 52 (2008), 384–398. https://doi.org/10.1016/j.comnet.2007.09.020
  13. H. C. Lin, H. M. Yang, An approximation algorithm for constructing degree-dependent node-weighted multicast trees, IEEE Trans. Parallel Distrib. Syst., 25 (2014), 1976–1985. https://doi.org/10.1109/TPDS.2013.108
  14. J. Ruckert, J. Blendin, R. Hark, D. Hausheer, Flexible, efficient, and scalable software-defined over-the-top multicast for ISP environments with DynSdm, IEEE Trans. Network Serv. Manage., 13 (2016), 754–767. https://doi.org/10.1109/TNSM.2016.2607281
  15. F. Coras, J. Domingo-Pascual, F. Maino, D. Farinacci, A. Cabellos-Aparicio, Lcast: software-defined inter-domain multicast, Comput. Networks, 59 (2014), 153–170. https://doi.org/10.1016/j.bjp.2013.10.010
  16. H. Zhong, F. Wu, Y. Xu, J. Cui, QoS-aware multicast for scalable video streaming in software-defined networks, IEEE Trans. Multimedia, 23 (2021), 982–994. https://doi.org/10.1109/TMM.2020.2991539
  17. Y. Gong, W. Huang, W. Wang, Y. Lei, A survey on software defined networking and its applications, Front. Comput. Sci., 9 (2015), 827–845. https://doi.org/10.1007/s11704-015-3448-z
  18. H. W. Da Silva, F. R. Barbalho, A. V. Neto, Cross-layer multiuser session control for optimized communications on SDN-based cloud platforms, Future Gener. Comput. Syst., 92 (2019), 1116–1130. https://doi.org/10.1016/j.future.2017.11.016
  19. Y. Shi, J. Wong, H. A. Jacobsen, Y. Zhang, J. Chen, Topic-oriented bucket-based fast multicast routing in SDN-like publish/subscribe middleware, IEEE Access, 8 (2020), 89741–89756. https://doi.org/10.1109/ACCESS.2020.2994268
  20. J. Cao, A minimum delay spanning tree algorithm for the application-layer multicast, J. Software, 16 (2005), 1766–1773. https://doi.org/10.1360/jos161766
  21. Y. Zhu, B. Li, K. Q. Pu, Dynamic multicast in overlay networks with linear capacity constraints, IEEE Trans. Parallel Distrib. Syst., 20 (2009), 925–939. https://doi.org/10.1109/tpds.2008.155
  22. Q. Liu, R. Tang, H. Ren, Y. Pei, Optimizing multicast routing tree on application layer via an encoding-free non-dominated sorting genetic algorithm, Appl. Intell., 50 (2020), 759–777. https://doi.org/10.1007/s10489-019-01547-9
  23. S. Y. Tseng, C. C. Lin, Y. M. Huang, Ant colony-based algorithm for constructing broadcasting tree with degree and delay constraints, Expert Syst. Appl., 35 (2008), 1473–1481. https://doi.org/10.1016/j.eswa.2007.08.018
  24. X. Wang, S. Wang, X. Liang, D. Zhao, J. Huang, X. Xu, et al., Deep reinforcement learning: a survey, IEEE Trans. Neural Networks Learn. Syst., 35 (2024), 5064–5078. https://doi.org/10.1109/TNNLS.2022.3207346
  25. F. Zhao, F. Yin, L. Wang, Y. Yu, A co-evolution algorithm with dueling reinforcement learning mechanism for the energy-aware distributed heterogeneous flexible flow-shop scheduling problem, IEEE Trans. Syst. Man Cybern. Syst., 55 (2025), 1794–1809. https://doi.org/10.1109/TSMC.2024.3510384
  26. Z. Pan, D. Lei, L. Wang, A knowledge-based two-population optimization algorithm for distributed energy-efficient parallel machines scheduling, IEEE Trans. Cybern., 52 (2022), 5051–5063. https://doi.org/10.1109/TCYB.2020.3026571
  27. H. Wang, B. R. Sarker, J. Li, J. Li, Adaptive scheduling for assembly job shop with uncertain assembly times based on dual Q-learning, Int. J. Prod. Res., 59 (2021), 5867–5883. https://doi.org/10.1080/00207543.2020.1794075
  28. X. Li, J. Tian, C. Wang, Y. Jiang, X. Wang, J. Wang, Multi-objective multicast optimization with deep reinforcement learning, Cluster Comput., 28 (2025), 222. https://doi.org/10.1007/s10586-024-04906-5
  29. X. Li, Y. Wang, TABDeep: a two-level action branch architecture-based deep reinforcement learning for distributed sub-tree scheduling of online multicast sessions in EON, Comput. Networks, 243 (2024), 110288. https://doi.org/10.1016/j.comnet.2024.110288
  30. M. Ye, C. Zhao, P. Wen, Y. Wang, X. Wang, H. Qiu, DHRL-FNMR: an intelligent multicast routing approach based on deep hierarchical reinforcement learning in SDN, IEEE Trans. Network Serv. Manage., 21 (2024), 5733–5755. https://doi.org/10.1109/TNSM.2024.3402275
  31. Y. Li, Q. Zhang, H. Yao, R. Gao, X. Xin, F. R. Yu, Stigmergy and hierarchical learning for routing optimization in multi-domain collaborative satellite networks, IEEE J. Sel. Areas Commun., 42 (2024), 1188–1203. https://doi.org/10.1109/JSAC.2024.3365878
  32. K. Hu, M. Li, Z. Song, K. Xu, Q. Xia, N. Sun, et al., A review of research on reinforcement learning algorithms for multi-agents, Neurocomputing, 599 (2024), 128068. https://doi.org/10.1016/j.neucom.2024.128068
  33. P. Wen, M. Ye, Y. Wang, Q. He, H. Qiu, A multi-agent graph reinforcement learning method for many-to-many communication routing in SDWN, Acta Electron. Sin., 53 (2025), 1885–1905
  34. J. H. Wang, J. Cai, J. Lu, K. Yin, J. Yang, Solving multicast problem in cloud networks using overlay routing, Comput. Commun., 70 (2015), 1–14. https://doi.org/10.1016/j.comcom.2015.05.016
  35. S. Y. Tseng, Y. M. Huang, C. C. Lin, Genetic algorithm for delay- and degree-constrained multimedia broadcasting on overlay networks, Comput. Commun., 29 (2006), 3625–3632. https://doi.org/10.1016/j.comcom.2006.06.003
  36. L. Lin, J. Zhou, L. Zhang, Z. Ye, Overlay multicast routing algorithm with minimum overlay cost, J. Comput. Appl., 10 (2008), 2569–2576. https://doi.org/10.3724/SP.J.1087.2008.02569
  37. Q. Liu, Y. Wang, X. Li, H. Li, Gene-pool based genetic algorithm for optimizing application layer multicast, Comput. Eng. Appl., 55 (2019), 142–150. https://doi.org/10.3778/j.issn.1002-8331.1903-0444
  38. Y. Li, N. Wang, W. Zhang, Q. Liu, F. Liu, Discrete artificial fish swarm algorithm-based one-off optimization method for multiple co-existing application layer multicast routing trees, Electronics, 13 (2024), 894. https://doi.org/10.3390/electronics13050894
  39. J. Chae, N. Kim, Multicast tree generation using meta reinforcement learning in SDN-based smart network platforms, KSII Trans. Internet Inf. Syst., 15 (2021), 3138–3150. https://doi.org/10.3837/tiis.2021.09.003
  40. M. Ye, H. W. Hu, Y. Wang, Q. He, X. L. Wang, P. Wen, et al., MA-CDMR: an intelligent cross domain multicast routing method based on multi-agent deep reinforcement learning in SDWN multi controller domain, Chin. J. Comput., 48 (2025), 1417–1442. https://doi.org/10.11897/SP.J.1016.2025.01417
  41. M. Kim, H. Choo, M. W. Mutka, H. J. Lim, K. Park, On QoS multicast routing algorithms using k-minimum Steiner trees, Inf. Sci., 238 (2013), 190–204. https://doi.org/10.1016/j.ins.2013.03.006
  42. Mininet-WIFI. Available from: https://mininet-wifi.github.io/ (accessed Mar. 16, 2023)
  43. iPerf. Available from: https://iperf.fr (accessed Mar. 16, 2023)
  44. Ryu. Available from: https://ryu-sdn.org/ (accessed Mar. 16, 2023)
  45. Y. R. Chen, A. Rezapour, W. G. Tzeng, S. C. Tsai, RL-routing: An SDN routing algorithm based on deep reinforcement learning, IEEE Trans. Network Sci. Eng., 7 (2020), 3185–3199. https://doi.org/10.1109/TNSE.2020.3017751