arxiv: 2602.13211 · v2 · submitted 2026-01-17 · 💻 cs.NI · cs.AI

An Overlay Multicast Routing Method Based on Network Situational Awareness and Hierarchical Multi-Agent Reinforcement Learning

Miao Ye, Yanye Chen, Yong Wang, Cheng Zhu, Qiuxiang Jiang, Gai Huang, Feng Ding

Pith reviewed 2026-05-16 13:10 UTC · model grok-4.3

keywords overlay multicast · hierarchical reinforcement learning · multi-agent systems · software-defined networking · network routing · traffic awareness · dynamic adaptation

The pith

MA-DHRL-OM decomposes overlay multicast tree construction into two hierarchical stages using multi-agent reinforcement learning to handle dynamic traffic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MA-DHRL-OM, a multi-agent deep hierarchical reinforcement learning method for overlay multicast routing that uses SDN's global network view to create a traffic-aware model. It addresses the limits of traditional overlay multicast and standard reinforcement learning by splitting the multi-objective tree-building task into two coordinated stages, which shrinks the action space and stabilizes learning. Experiments indicate the approach yields lower delay, better bandwidth use, and reduced packet loss compared with prior methods while maintaining more consistent convergence.

Core claim

The central claim is that overlay multicast routing can be made adaptive to dynamic traffic by modeling it as a hierarchical multi-agent reinforcement learning problem: SDN supplies global situational awareness to build a traffic model, and two layers of collaborating agents decompose the construction of the multicast tree into sequential stages that reduce action-space size, balance competing objectives such as delay and bandwidth, and produce stable, scalable routes.

What carries the argument

The MA-DHRL-OM framework, which decomposes overlay multicast tree construction into two hierarchical stages executed by collaborating agents to shrink the joint action space and stabilize multi-objective optimization.
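
As a rough illustration of why a staged split can shrink the joint action space, the sketch below counts actions under two simplified models. Both models are assumptions made for this example, not the paper's formulation: the flat agent fixes a next hop for every destination in one joint action, while the staged agents pick one destination (upper level) and then one next hop (lower level). The function names and sizes are hypothetical.

```python
def flat_joint_actions(num_destinations: int, max_degree: int) -> int:
    """Joint actions when one agent fixes a next hop for every destination at once."""
    return max_degree ** num_destinations


def staged_actions(num_destinations: int, max_degree: int) -> int:
    """Output units needed when an upper agent picks a destination and a
    lower agent then picks one next hop (two separate, smaller heads)."""
    return num_destinations + max_degree


if __name__ == "__main__":
    # Hypothetical sizes, not taken from the paper's experiments.
    for num_destinations in (3, 5, 8):
        flat = flat_joint_actions(num_destinations, max_degree=4)
        staged = staged_actions(num_destinations, max_degree=4)
        print(f"{num_destinations} destinations: flat={flat}, staged={staged}")
```

Under these assumptions the flat count grows exponentially with the number of destinations while the staged count grows linearly; the count says nothing about the coordination cost between the two levels.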

If this is right

The method produces lower end-to-end delay than existing overlay multicast schemes under varying loads.
Bandwidth utilization improves because the hierarchical agents jointly optimize resource allocation.
Packet loss drops due to more stable route selection in changing conditions.
Convergence remains consistent across different network scales because the action space is decomposed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same staged decomposition could be tested on other multi-objective routing tasks such as service chaining or data-center flow scheduling.
If coordination overhead remains low at larger agent counts, the approach might scale to cross-domain overlays without central bottlenecks.
A natural next measurement would be how performance changes when the traffic model is learned online rather than supplied by SDN.
The two-stage split might also be applied to non-multicast problems that combine path selection with resource reservation.

Load-bearing premise

That splitting tree construction into two stages via multi-agent collaboration will reduce action space and stabilize convergence in real dynamic traffic without adding prohibitive coordination overhead.

What would settle it

A side-by-side run of MA-DHRL-OM versus a flat multi-agent baseline on a live network trace with sudden traffic spikes, measuring whether the hierarchical version actually converges faster and incurs lower total overhead.
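
One way to make "converges faster" measurable in such a comparison is to define a convergence episode on each reward curve. The sketch below is an assumed definition, not one taken from the paper: it reports the first episode after which the moving-average reward stays within a relative tolerance of its final value. The function name, window, tolerance, and both reward curves are placeholders.

```python
def convergence_episode(rewards, window=2, tolerance=0.05):
    """First episode index after which the moving-average reward stays
    within `tolerance` (relative) of its final value."""
    if len(rewards) < window:
        raise ValueError("need at least `window` rewards")
    averages = [
        sum(rewards[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(rewards))
    ]
    final = averages[-1]
    band = tolerance * max(abs(final), 1e-12)
    first = len(averages) - 1
    # Walk backwards until the curve leaves the tolerance band.
    for idx in range(len(averages) - 1, -1, -1):
        if abs(averages[idx] - final) > band:
            break
        first = idx
    # Convert the moving-average index back to an episode index.
    return first + window - 1


# Placeholder reward curves, invented for illustration only.
hierarchical_rewards = [0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
flat_rewards = [0.0, 0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 1.0]
print(convergence_episode(hierarchical_rewards), convergence_episode(flat_rewards))
```

A real comparison would also need wall-clock time and message counts per episode, since a curve that converges in fewer episodes can still cost more in total.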

Figures

Figures reproduced from arXiv: 2602.13211 by Cheng Zhu, Feng Ding, Gai Huang, Miao Ye, Qiuxiang Jiang, Yanye Chen, Yong Wang.

**Figure 1.** Model architecture. From Section 4.2 (Control plane): The control plane occupies a central position within the SDN architecture, responsible for managing the global network state, defining forwarding policies, and dynamically scheduling resources. The controller interacts with data plane switches via the southbound OpenFlow protocol to discover network topology, collect network state information, and construct a global n…

**Figure 2.** State space representation of the upper-layer reinforcement learning algorithm. The source-destination distance matrix is a diagonal matrix whose diagonal values are the normalized shortest hop-count distances from the OM source node to each destination node in the current underlying network topology. The node selection matrix is also a diagonal matrix; its diagonal values indicate wheth…
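
The two diagonal matrices in the Figure 2 caption can be sketched in plain Python. This is a minimal sketch under stated assumptions: hop counts are normalized by the largest hop count among the destinations (the caption does not give the normalization), and the node selection matrix is taken to mark selected nodes with 1 (the caption is truncated at that point). All function names and the example topology are hypothetical.

```python
from collections import deque


def hop_counts(adjacency, source):
    """Breadth-first search hop counts from source; -1 marks unreachable nodes."""
    distances = [-1] * len(adjacency)
    distances[source] = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor, linked in enumerate(adjacency[node]):
            if linked and distances[neighbor] < 0:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def diagonal(values):
    """Return a square matrix (list of lists) with `values` on the diagonal."""
    size = len(values)
    return [[values[i] if i == j else 0.0 for j in range(size)] for i in range(size)]


def distance_matrix(adjacency, source, destinations):
    """Diagonal matrix of hop counts to each destination, scaled to (0, 1].
    Assumption: scale by the largest hop count among the destinations."""
    hops = hop_counts(adjacency, source)
    longest = max((hops[d] for d in destinations if hops[d] > 0), default=1)
    entries = [0.0] * len(adjacency)
    for d in destinations:
        if hops[d] > 0:
            entries[d] = hops[d] / longest
    return diagonal(entries)


def selection_matrix(num_nodes, selected):
    """Diagonal 0/1 matrix marking the nodes in `selected` (assumed meaning)."""
    return diagonal([1.0 if n in selected else 0.0 for n in range(num_nodes)])


# Hypothetical 5-node line topology 0-1-2-3-4 with the OM source at node 0.
line = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
distance = distance_matrix(line, source=0, destinations=[2, 4])
selected = selection_matrix(5, {0, 2})
```

Here `distance[2][2]` is 0.5 and `distance[4][4]` is 1.0, because the destinations sit 2 and 4 hops from the source.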

**Figure 3.** State space representation of the lower-layer reinforcement learning algorithm. TM is constructed by normalizing network state information collected from the data plane. It comprises the normalized residual bandwidth matrix, the normalized link latency matrix, and the normalized packet loss rate matrix. All three are symmetric adjacency matrices whose size is set by the number of nodes i…
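
The TM construction in the Figure 3 caption amounts to normalizing three per-link measurements and arranging each as a symmetric matrix. The sketch below assumes min-max normalization over existing links, which the caption does not specify; the function names, units, and the three-node example are hypothetical.

```python
def normalize_links(link_values):
    """Min-max scale {(u, v): value} measurements to [0, 1] (assumed scheme)."""
    if not link_values:
        return {}
    low, high = min(link_values.values()), max(link_values.values())
    span = high - low
    return {
        link: ((value - low) / span if span else 0.0)
        for link, value in link_values.items()
    }


def symmetric_matrix(num_nodes, link_values):
    """Place each link value at both [u][v] and [v][u]; non-links stay 0."""
    matrix = [[0.0] * num_nodes for _ in range(num_nodes)]
    for (u, v), value in link_values.items():
        matrix[u][v] = matrix[v][u] = value
    return matrix


def build_traffic_matrix(num_nodes, bandwidth, delay, loss):
    """Return [bandwidth, delay, loss] as three normalized symmetric matrices."""
    return [
        symmetric_matrix(num_nodes, normalize_links(metric))
        for metric in (bandwidth, delay, loss)
    ]


# Hypothetical triangle topology: residual Mbit/s, latency in ms, loss fraction.
bandwidth = {(0, 1): 10.0, (1, 2): 20.0, (0, 2): 30.0}
delay = {(0, 1): 2.0, (1, 2): 4.0, (0, 2): 6.0}
loss = {(0, 1): 0.0, (1, 2): 0.01, (0, 2): 0.02}
tm = build_traffic_matrix(3, bandwidth, delay, loss)
```

One caveat of this particular scheme: the link with the minimum value normalizes to 0 and becomes indistinguishable from a missing link, so a real implementation would likely carry a separate adjacency mask.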

**Figure 4.** Traffic generation using Iperf. To simulate realistic network traffic, the Iperf traffic generation tool [43] is used to transmit UDP (User Datagram Protocol) packets between nodes. Following the methodology in [30], we emulate the 24-hour variation of network traffic over a full day, as shown in …
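
The traffic-generation step described for Figure 4 can be sketched as a script that turns an hourly rate profile into Iperf UDP client commands. The cosine-shaped daily profile, the rates, the peak hour, and the address `10.0.0.2` are placeholders invented for this sketch; the paper's actual 24-hour profile follows [30] and is not reproduced here. The flags `-c`, `-u`, `-b`, and `-t` are standard Iperf client options. The script only builds the commands and does not run them.

```python
import math


def diurnal_rate_mbps(hour, base=2.0, peak=10.0, peak_hour=20):
    """Placeholder daily profile: a cosine that tops out at `peak_hour`."""
    phase = 2 * math.pi * (hour - peak_hour) / 24
    return base + (peak - base) * (1 + math.cos(phase)) / 2


def iperf_udp_command(dst_ip, rate_mbps, seconds):
    """Argument list for an Iperf UDP client sending at a target rate."""
    return ["iperf", "-c", dst_ip, "-u", "-b", f"{rate_mbps:.2f}M", "-t", str(seconds)]


if __name__ == "__main__":
    # One command per emulated hour; 10.0.0.2 is a hypothetical receiver.
    for hour in range(24):
        command = iperf_udp_command("10.0.0.2", diurnal_rate_mbps(hour), seconds=60)
        print(hour, " ".join(command))
```

Returning an argument list rather than a shell string keeps the command safe to pass to `subprocess.run` without shell quoting.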

**Figure 5.** Three network topologies tested in the experiments: (a) 10NodeNet; (b) 14NodeNet; and (c) 21NodeNet. Following the methodology in [45], three network topologies are adopted for performance evaluation, consisting of 10, 14, and 21 nodes, respectively, and named 10NodeNet, 14NodeNet, and 21NodeNet. The corresponding topologies are shown in Figure 5a–c. The link parameters in each topology are randomly genera…

**Figure 6.** Critic learning rate comparison. Experimental results indicate that increasing the learning rate accelerates reward-curve convergence; one setting converges but at a slower pace, and another converges most rapidly among the settings compared. Values recovered from the figure residue: 0.00001, 0.00005, 0.0001 (labelled actor_lr).

**Figure 7.** Actor learning rate comparison.

**Figure 8.** Batch size comparison. Now, with all learning rates fixed, we adjust the batch size parameter for comparison. The experimental results are shown in …

**Figure 9.** Bottleneck bandwidth experimental results for three network topologies: (a) 10NodeNet; (b) 14NodeNet; and (c) 21NodeNet.

**Figure 10.** Average total link delay experimental results for three network topologies: (a) 10NodeNet; (b) 14NodeNet; and (c) 21NodeNet.

**Figure 11.** Average packet loss rate experimental results for three network topologies: (a) 10NodeNet; (b) 14NodeNet; and (c) 21NodeNet.
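
The three quantities plotted in Figures 9 to 11 can be computed for a single multicast tree roughly as follows. The definitions here are assumptions for illustration (the captions name the metrics but do not define them): bottleneck bandwidth is the minimum residual bandwidth over the tree's links, total link delay is the sum of link delays, and packet loss rate is the mean over the tree's links. The function names and link values are hypothetical.

```python
def link_key(u, v):
    """Undirected link key, so (2, 1) and (1, 2) refer to the same link."""
    return (u, v) if u <= v else (v, u)


def tree_metrics(tree_edges, bandwidth, delay, loss):
    """Per-tree metrics under the assumed definitions described above."""
    keys = [link_key(u, v) for u, v in tree_edges]
    if not keys:
        raise ValueError("tree has no links")
    return {
        "bottleneck_bandwidth": min(bandwidth[k] for k in keys),
        "total_delay": sum(delay[k] for k in keys),
        "mean_loss": sum(loss[k] for k in keys) / len(keys),
    }


# Hypothetical link measurements: Mbit/s, ms, loss fraction.
bandwidth = {(0, 1): 30.0, (1, 2): 10.0, (1, 3): 20.0}
delay = {(0, 1): 2.0, (1, 2): 5.0, (1, 3): 3.0}
loss = {(0, 1): 0.01, (1, 2): 0.03, (1, 3): 0.02}

# A tree rooted at node 0 that reaches destinations 2 and 3 through node 1.
metrics = tree_metrics([(0, 1), (2, 1), (1, 3)], bandwidth, delay, loss)
print(metrics)
```

Averaging these per-tree values over multicast sessions or runs would then give figures of the kind the captions describe.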

Original abstract

Compared with IP multicast, Overlay Multicast (OM) offers better compatibility and flexible deployment in heterogeneous, cross-domain networks. However, traditional OM struggles to adapt to dynamic traffic due to unawareness of physical resource states, and existing reinforcement learning methods fail to decouple OM's tightly coupled multi-objective nature, leading to high complexity, slow convergence, and instability. To address this, we propose MA-DHRL-OM, a multi-agent deep hierarchical reinforcement learning approach. Using SDN's global view, it builds a traffic-aware model for OM path planning. The method decomposes OM tree construction into two stages via hierarchical agents, reducing action space and improving convergence stability. Multi-agent collaboration balances multi-objective optimization while enhancing scalability and adaptability. Experiments show MA-DHRL-OM outperforms existing methods in delay, bandwidth utilization, and packet loss, with more stable convergence and flexible routing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's new angle is a two-stage hierarchical multi-agent RL split for overlay multicast that uses SDN awareness to shrink action space, but the performance claims rest on experiments whose details and overhead checks are missing from the abstract.

The main takeaway is that MA-DHRL-OM decomposes overlay multicast tree construction into two hierarchical stages with collaborating agents, aiming to handle dynamic traffic better than flat RL or traditional methods while leveraging SDN's global view. This framing is the clearest novelty: it explicitly targets the multi-objective coupling problem by separating concerns across agent levels and adding collaboration for balance across delay, bandwidth, and loss. The abstract does a clean job laying out why standard OM struggles with adaptation and why existing RL approaches converge slowly, then positions the hierarchy as the fix for action-space size and stability. That part reads as a reasonable engineering step forward for this subfield.

The experiments are reported to beat baselines on the usual metrics with more stable convergence, which is the kind of result that could matter for practical deployment in heterogeneous networks. The soft spot is exactly the one the stress-test flags: nothing is shown on inter-agent communication volume, synchronization cost, or wall-clock overhead versus a flat multi-agent baseline. If those costs grow with network size or traffic churn, the claimed stability gain could disappear even if per-agent spaces shrink. The abstract also gives no traffic models, error bars, or baseline descriptions, so it is hard to tell whether the gains are robust or tied to specific hyperparameter choices.

This work is aimed at networking researchers who already use RL for routing problems and want to see a concrete hierarchical decomposition tried on multicast. It shows honest engagement with the limitations of prior OM and RL work, so it is worth a serious referee even though the current evidence is preliminary. I would send it to peer review to get the experimental section and overhead measurements properly checked.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes MA-DHRL-OM, a multi-agent deep hierarchical reinforcement learning method for overlay multicast (OM) routing. Leveraging SDN's global view for network situational awareness, it constructs a traffic-aware model and decomposes OM tree construction into two hierarchical stages with multi-agent collaboration. This is claimed to reduce action space, balance multi-objective optimization (delay, bandwidth utilization, packet loss), improve convergence stability, and enhance scalability and adaptability in dynamic traffic. Experiments are reported to demonstrate outperformance over existing methods in the key metrics along with more stable convergence and flexible routing.

Significance. If the performance claims and the assumption that hierarchical decomposition reliably reduces action space without offsetting coordination overhead hold under detailed validation, the work could advance adaptive OM routing in heterogeneous, cross-domain networks by addressing limitations of traditional methods and flat RL approaches. The integration of SDN situational awareness with hierarchical MARL offers a structured way to handle multi-objective trade-offs, which may prove useful for scalable deployment in volatile traffic environments.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: The central claims of outperformance in delay, bandwidth utilization, packet loss, and convergence stability rest on experiments whose setup (traffic models, network topologies, baseline algorithms, hyperparameter details, and statistical reporting such as error bars or number of runs) is unspecified. Without these, it is impossible to assess whether gains derive from the hierarchical structure itself or from other factors, undermining the load-bearing assertion that the method improves adaptability.
[Method] Method / Hierarchical agent description: The claim that decomposing OM tree construction into two stages via multi-agent collaboration reduces action space and yields stable convergence without new coordination overhead is load-bearing for the scalability argument. No equations or protocol details are given for inter-agent messages, synchronization, or reward structures between upper- and lower-level agents, and no metrics (e.g., message volume or wall-clock overhead versus flat MARL baselines) are reported to test this assumption in dynamic traffic.

minor comments (1)

[Throughout] Ensure all acronyms (OM, SDN, MA-DHRL-OM, etc.) are defined at first use and used consistently.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below and have prepared revisions to strengthen the presentation of our experimental setup and methodological details.

Referee: [Abstract / Experiments] Abstract and Experiments section: The central claims of outperformance in delay, bandwidth utilization, packet loss, and convergence stability rest on experiments whose setup (traffic models, network topologies, baseline algorithms, hyperparameter details, and statistical reporting such as error bars or number of runs) is unspecified. Without these, it is impossible to assess whether gains derive from the hierarchical structure itself or from other factors, undermining the load-bearing assertion that the method improves adaptability.

Authors: We agree that the experimental setup details were insufficiently specified. In the revised manuscript, we will expand the Experiments section with full descriptions of the traffic models (including arrival processes and parameters), network topologies (e.g., generation method and sizes), all baseline algorithms with their configurations, complete hyperparameter values for the RL components, and statistical reporting (means and standard deviations from 10 independent runs, with error bars on all performance figures). These additions will allow readers to evaluate the source of the reported improvements. revision: yes
Referee: [Method] Method / Hierarchical agent description: The claim that decomposing OM tree construction into two stages via multi-agent collaboration reduces action space and yields stable convergence without new coordination overhead is load-bearing for the scalability argument. No equations or protocol details are given for inter-agent messages, synchronization, or reward structures between upper- and lower-level agents, and no metrics (e.g., message volume or wall-clock overhead versus flat MARL baselines) are reported to test this assumption in dynamic traffic.

Authors: We acknowledge the need for greater rigor in describing the hierarchical interactions. The revised Method section will include the explicit equations for the upper- and lower-level policies, the per-level reward functions, and the inter-agent communication protocol (message formats, synchronization, and coordination). We will also add new results quantifying coordination overhead (message volume and wall-clock time) relative to flat MARL baselines under dynamic traffic to support the scalability claims. revision: yes
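
The statistical reporting promised in the first response (means and standard deviations over 10 independent runs) is simple to compute. The sketch below uses Python's standard `statistics` module; the function name is hypothetical and the ten values are placeholders invented for the example, not results from the paper.

```python
import statistics


def summarize_runs(values):
    """Return (mean, sample standard deviation) for one metric across runs."""
    if len(values) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    return statistics.mean(values), statistics.stdev(values)


# Placeholder per-run delays in ms; NOT measurements from the paper.
placeholder_delays_ms = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
mean_ms, std_ms = summarize_runs(placeholder_delays_ms)
print(f"delay: {mean_ms:.2f} ± {std_ms:.2f} ms over {len(placeholder_delays_ms)} runs")
```

`statistics.stdev` uses the sample (n − 1) form, which is the usual choice for error bars over a small number of independent runs.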

Circularity Check

0 steps flagged

No significant circularity in MA-DHRL-OM derivation

The paper proposes MA-DHRL-OM as a novel algorithmic method that uses SDN global view to build a traffic-aware model and decomposes OM tree construction into two hierarchical stages via multi-agent collaboration. This decomposition is presented as a design choice to reduce action space, with performance validated through experiments on delay, bandwidth, and packet loss. No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations appear in the abstract or description. The central claims rest on the proposed algorithm and external experimental comparisons rather than reducing to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard RL assumptions plus domain claims about SDN visibility and hierarchical decomposition benefits; no explicit free parameters or invented entities are named in the abstract.

axioms (2)

domain assumption: SDN provides an accurate global view of physical resource states for traffic-aware OM planning.
Invoked in the abstract to justify the traffic-aware model.
domain assumption: Hierarchical decomposition reduces action space and improves convergence stability for multi-objective OM.
Central premise for the two-stage agent design.

pith-pipeline@v0.9.0 · 5465 in / 1232 out tokens · 32486 ms · 2026-05-16T13:10:00.256089+00:00 · methodology

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

[1] H. Marques, H. Silva, E. Logota, J. Rodriguez, S. Vahid, R. Tafazolli, Multiview real-time media distribution for next generation networks, Comput. Networks, 118 (2017), 96–124. https://doi.org/10.1016/j.comnet.2017.03.002
[2] M. L. Hu, M. Xiao, Y. Hu, C. Cai, T. P. Deng, K. Peng, Software defined multicast using segment routing in LEO satellite networks, IEEE Trans. Mob. Comput., 23 (2024), 835–849. https://doi.org/10.1109/TMC.2022.3215976
[3] Y. H. Chu, S. G. Rao, S. Seshan, H. Zhang, A case for end system multicast, IEEE J. Sel. Areas Commun., 20 (2002), 1456–1471. https://doi.org/10.1109/JSAC.2002.803066
[4] M. Hosseini, D. T. Ahmed, S. Shirmohammadi, N. D. Georganas, A survey of application-layer multicast protocols, IEEE Commun. Surv. Tutorials, 9 (2007), 58–74. https://doi.org/10.1109/COMST.2007.4317616
[5] T. Ruso, C. Chellappan, P. Sivasankar, Ppssm: push/pull smooth video streaming multicast protocol design and implementation for an overlay network, Multimedia Tools Appl., 75 (2016), 17097–17119. https://doi.org/10.1007/s11042-015-2979-5
[6] A. Sampaio, P. Sousa, An adaptable and ISP-friendly multicast overlay network, Peer-to-Peer Networking Appl., 12 (2019), 809–829. https://doi.org/10.1007/s12083-018-0680-y
[7] Y. Zhu, B. Li, J. Guo, Multicast with network coding in application-layer overlay networks, IEEE J. Sel. Areas Commun., 22 (2004), 107–120. https://doi.org/10.1109/JSAC.2003.818801
[8] J. Zhang, L. Liu, L. Ramaswamy, C. Pu, Peercast: churn-resilient end system multicast on heterogeneous overlay networks, J. Network Comput. Appl., 31 (2008), 821–850. https://doi.org/10.1016/j.jnca.2007.05.001
[9] J. Su, J. Cao, B. Zhang, A survey of the research on ALM stability enhancement, Chin. J. Comput., 32 (2009), 576–590
[10] X. C. Zhang, Z. Wang, W. M. Luo, B. P. Yan, Topology-aware application layer multicast scheme, J. Software, 21 (2010), 2010–2022. https://doi.org/10.3724/SP.J.1001.2010.03594
[11] Y. Zhang, X. Nie, J. Jiang, W. Wang, K. Xu, Y. Zhao, et al., BDS+: an inter-datacenter data replication system with dynamic bandwidth separation, IEEE/ACM Trans. Networking, 29 (2021), 918–934. https://doi.org/10.1109/TNET.2021.3054924
[12] C. Kim, Y. Kim, J. H. Yang, I. Yeom, Analysis of bandwidth efficiency in overlay multicasting, Comput. Networks, 52 (2008), 384–398. https://doi.org/10.1016/j.comnet.2007.09.020
[13] H. C. Lin, H. M. Yang, An approximation algorithm for constructing degree-dependent node-weighted multicast trees, IEEE Trans. Parallel Distrib. Syst., 25 (2014), 1976–1985. https://doi.org/10.1109/TPDS.2013.108
[14] J. Ruckert, J. Blendin, R. Hark, D. Hausheer, Flexible, efficient, and scalable software-defined over-the-top multicast for ISP environments with DynSdm, IEEE Trans. Network Serv. Manage., 13 (2016), 754–767. https://doi.org/10.1109/TNSM.2016.2607281
[15] F. Coras, J. Domingo-Pascual, F. Maino, D. Farinacci, A. Cabellos-Aparicio, Lcast: software-defined inter-domain multicast, Comput. Networks, 59 (2014), 153–170. https://doi.org/10.1016/j.bjp.2013.10.010
[16] H. Zhong, F. Wu, Y. Xu, J. Cui, QoS-aware multicast for scalable video streaming in software-defined networks, IEEE Trans. Multimedia, 23 (2021), 982–994. https://doi.org/10.1109/TMM.2020.2991539
[17] Y. Gong, W. Huang, W. Wang, Y. Lei, A survey on software defined networking and its applications, Front. Comput. Sci., 9 (2015), 827–845. https://doi.org/10.1007/s11704-015-3448-z
[18] H. W. Da Silva, F. R. Barbalho, A. V. Neto, Cross-layer multiuser session control for optimized communications on SDN-based cloud platforms, Future Gener. Comput. Syst., 92 (2019), 1116–1130. https://doi.org/10.1016/j.future.2017.11.016
[19] Y. Shi, J. Wong, H. A. Jacobsen, Y. Zhang, J. Chen, Topic-oriented bucket-based fast multicast routing in SDN-like publish/subscribe middleware, IEEE Access, 8 (2020), 89741–89756. https://doi.org/10.1109/ACCESS.2020.2994268
[20] J. Cao, A minimum delay spanning tree algorithm for the application-layer multicast, J. Software, 16 (2005), 1766–1773. https://doi.org/10.1360/jos161766
[21] Y. Zhu, B. Li, K. Q. Pu, Dynamic multicast in overlay networks with linear capacity constraints, IEEE Trans. Parallel Distrib. Syst., 20 (2009), 925–939. https://doi.org/10.1109/tpds.2008.155
[22] Q. Liu, R. Tang, H. Ren, Y. Pei, Optimizing multicast routing tree on application layer via an encoding-free non-dominated sorting genetic algorithm, Appl. Intell., 50 (2020), 759–777. https://doi.org/10.1007/s10489-019-01547-9
[23] S. Y. Tseng, C. C. Lin, Y. M. Huang, Ant colony-based algorithm for constructing broadcasting tree with degree and delay constraints, Expert Syst. Appl., 35 (2008), 1473–1481. https://doi.org/10.1016/j.eswa.2007.08.018
[24] X. Wang, S. Wang, X. Liang, D. Zhao, J. Huang, X. Xu, et al., Deep reinforcement learning: a survey, IEEE Trans. Neural Networks Learn. Syst., 35 (2024), 5064–5078. https://doi.org/10.1109/TNNLS.2022.3207346
[25] F. Zhao, F. Yin, L. Wang, Y. Yu, A co-evolution algorithm with dueling reinforcement learning mechanism for the energy-aware distributed heterogeneous flexible flow-shop scheduling problem, IEEE Trans. Syst. Man Cybern. Syst., 55 (2025), 1794–1809. https://doi.org/10.1109/TSMC.2024.3510384
[26] Z. Pan, D. Lei, L. Wang, A knowledge-based two-population optimization algorithm for distributed energy-efficient parallel machines scheduling, IEEE Trans. Cybern., 52 (2022), 5051–5063. https://doi.org/10.1109/TCYB.2020.3026571
[27] H. Wang, B. R. Sarker, J. Li, J. Li, Adaptive scheduling for assembly job shop with uncertain assembly times based on dual Q-learning, Int. J. Prod. Res., 59 (2021), 5867–5883. https://doi.org/10.1080/00207543.2020.1794075
[28] X. Li, J. Tian, C. Wang, Y. Jiang, X. Wang, J. Wang, Multi-objective multicast optimization with deep reinforcement learning, Cluster Comput., 28 (2025), 222. https://doi.org/10.1007/s10586-024-04906-5
[29] X. Li, Y. Wang, TABDeep: a two-level action branch architecture-based deep reinforcement learning for distributed sub-tree scheduling of online multicast sessions in EON, Comput. Networks, 243 (2024), 110288. https://doi.org/10.1016/j.comnet.2024.110288
[30] M. Ye, C. Zhao, P. Wen, Y. Wang, X. Wang, H. Qiu, DHRL-FNMR: an intelligent multicast routing approach based on deep hierarchical reinforcement learning in SDN, IEEE Trans. Network Serv. Manage., 21 (2024), 5733–5755. https://doi.org/10.1109/TNSM.2024.3402275
[31] Y. Li, Q. Zhang, H. Yao, R. Gao, X. Xin, F. R. Yu, Stigmergy and hierarchical learning for routing optimization in multi-domain collaborative satellite networks, IEEE J. Sel. Areas Commun., 42 (2024), 1188–1203. https://doi.org/10.1109/JSAC.2024.3365878
[32] K. Hu, M. Li, Z. Song, K. Xu, Q. Xia, N. Sun, et al., A review of research on reinforcement learning algorithms for multi-agents, Neurocomputing, 599 (2024), 128068. https://doi.org/10.1016/j.neucom.2024.128068
[33] P. Wen, M. Ye, Y. Wang, Q. He, H. Qiu, A multi-agent graph reinforcement learning method for many-to-many communication routing in SDWN, Acta Electron. Sin., 53 (2025), 1885–1905
[34] J. H. Wang, J. Cai, J. Lu, K. Yin, J. Yang, Solving multicast problem in cloud networks using overlay routing, Comput. Commun., 70 (2015), 1–14. https://doi.org/10.1016/j.comcom.2015.05.016
[35] S. Y. Tseng, Y. M. Huang, C. C. Lin, Genetic algorithm for delay- and degree-constrained multimedia broadcasting on overlay networks, Comput. Commun., 29 (2006), 3625–3632. https://doi.org/10.1016/j.comcom.2006.06.003
[36] L. Lin, J. Zhou, L. Zhang, Z. Ye, Overlay multicast routing algorithm with minimum overlay cost, J. Comput. Appl., 10 (2008), 2569–2576. https://doi.org/10.3724/SP.J.1087.2008.02569
[37] Q. Liu, Y. Wang, X. Li, H. Li, Gene-pool based genetic algorithm for optimizing application layer multicast, Comput. Eng. Appl., 55 (2019), 142–150. https://doi.org/10.3778/j.issn.1002-8331.1903-0444
[38] Y. Li, N. Wang, W. Zhang, Q. Liu, F. Liu, Discrete artificial fish swarm algorithm-based one-off optimization method for multiple co-existing application layer multicast routing trees, Electronics, 13 (2024), 894. https://doi.org/10.3390/electronics13050894
[39] J. Chae, N. Kim, Multicast tree generation using meta reinforcement learning in SDN-based smart network platforms, KSII Trans. Internet Inf. Syst., 15 (2021), 3138–3150. https://doi.org/10.3837/tiis.2021.09.003
[40] M. Ye, H. W. Hu, Y. Wang, Q. He, X. L. Wang, P. Wen, et al., MA-CDMR: an intelligent cross domain multicast routing method based on multi-agent deep reinforcement learning in SDWN multi controller domain, Chin. J. Comput., 48 (2025), 1417–1442. https://doi.org/10.11897/SP.J.1016.2025.01417
[41] M. Kim, H. Choo, M. W. Mutka, H. J. Lim, K. Park, On QoS multicast routing algorithms using k-minimum Steiner trees, Inf. Sci., 238 (2013), 190–204. https://doi.org/10.1016/j.ins.2013.03.006
[42] Mininet-WIFI. Available from: https://mininet-wifi.github.io/ (accessed Mar. 16, 2023)
[43] iPerf. Available from: https://iperf.fr (accessed Mar. 16, 2023)
[44] Ryu. Available from: https://ryu-sdn.org/ (accessed Mar. 16, 2023)
[45] Y. R. Chen, A. Rezapour, W. G. Tzeng, S. C. Tsai, RL-routing: An SDN routing algorithm based on deep reinforcement learning, IEEE Trans. Network Sci. Eng., 7 (2020), 3185–3199. https://doi.org/10.1109/TNSE.2020.3017751