Transformer-Empowered Actor-Critic Reinforcement Learning for Sequence-Aware Service Function Chain Partitioning

Anestis Dalgkitsis; Chrysa Papagianni; Cyril Shih-Huan Hsu; Paola Grosso

arXiv: 2504.18902 · v2 · submitted 2025-04-26 · 💻 cs.NI · cs.AI · cs.LG · cs.NE

Pith reviewed 2026-05-22 17:54 UTC · model grok-4.3

classification 💻 cs.NI · cs.AI · cs.LG · cs.NE

keywords service function chain partitioning · transformer · actor-critic · reinforcement learning · virtual network functions · 6G networks · self-attention · network resource management

The pith

A Transformer-empowered actor-critic reinforcement learning approach improves sequence-aware partitioning of service function chains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a reinforcement learning framework that uses Transformer self-attention to partition service function chains across network domains. The goal is to handle the difficulties posed by varying domain features, quality of service rules, and partial network views that limit other methods. Self-attention helps track how virtual network functions relate to each other in sequence, supporting better joint decisions. Additional strategies for exploration and return normalization aid stable learning. Success here would mean networks can accept more service requests while using resources effectively and scaling to larger setups.

Core claim

The authors claim that integrating a Transformer into an actor-critic reinforcement learning agent for sequence-aware service function chain partitioning allows effective modeling of VNF inter-dependencies via self-attention, leading to superior long-term service acceptance rates, resource utilization, scalability, and fast inference compared to existing solutions, supported by the ε-LoPe exploration and Asymptotic Return Normalization techniques.

What carries the argument

Transformer self-attention mechanism within the actor-critic framework that processes the sequence of VNFs to capture inter-dependencies for partitioning decisions.
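
To make the mechanism concrete, here is a minimal sketch of single-head scaled dot-product self-attention over a sequence of VNF feature vectors, written in plain Python. It is a generic illustration of the technique, not the paper's architecture: the feature values, the dimensions `d_model` and `d_head`, the random projection matrices, and all function names are assumptions made for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over VNF tokens.

    tokens: list of feature vectors, one per VNF, in chain order.
    Returns (outputs, weights); weights[i][j] is how much VNF i attends to VNF j.
    """
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted mix of every VNF's value vector.
        outputs.append([sum(w * v[d] for w, v in zip(row, values))
                        for d in range(len(values[0]))])
    return outputs, weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Hypothetical 4-VNF chain with 3 made-up features per VNF
# (for example CPU, memory, and bandwidth demand).
vnf_tokens = [[0.2, 0.5, 0.1], [0.7, 0.1, 0.3], [0.4, 0.4, 0.9], [0.1, 0.8, 0.6]]
d_model, d_head = 3, 2
w_q, w_k, w_v = (random_matrix(d_head, d_model, rng) for _ in range(3))
attended, attn = self_attention(vnf_tokens, w_q, w_k, w_v)
```

Each row `attn[i]` sums to 1 and indicates how strongly VNF `i` draws on every VNF in the chain when its representation is built, which is the sense in which attention can expose inter-dependencies.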

If this is right

The framework achieves higher long-term acceptance rates for network services.
It improves overall resource utilization across domains.
The approach scales effectively to complex and large SFCs.
Inference remains fast, suitable for real-time network management.
Training converges more stably with the proposed exploration and normalization methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying this to other sequential network tasks like dynamic routing could yield similar gains in handling dependencies.
Reduced reliance on complete network state visibility might enable more decentralized network control.
Extending the model to incorporate real-time QoS feedback could further enhance performance in dynamic environments.
Validation through deployment in testbed networks would confirm the simulation advantages.

Load-bearing premise

Self-attention can model the inter-dependencies among VNFs effectively despite differences in domain characteristics and incomplete network state information.

What would settle it

Demonstrating that a simpler actor-critic model without the Transformer achieves comparable or better acceptance rates and scalability in heterogeneous network simulations would challenge the central claim.

Figures

Figures reproduced from arXiv: 2504.18902 by Anestis Dalgkitsis, Chrysa Papagianni, Cyril Shih-Huan Hsu, Paola Grosso.

**Figure 2.** Architecture and flow of the NFVO with the proposed

**Figure 3.** The SDAC framework. improves the policy

**Figure 4.** Post-Norm (left) and Pre-Norm (right) Transformer

**Figure 5.** Self-attention Mechanism on VNF tokens

**Figure 6.** Acceptance and arrival rate over time

**Figure 7.** Average acceptance rate

**Figure 8.** Proportion of rejections caused by resource capacity

**Figure 9.** Average reward of DRL-based approaches over time.

**Figure 10.** Inference time per request

**Figure 11.** Average CPU utilization over peak hours.

**Figure 12.** Perplexity of load distribution over peak hours.

**Figure 13.** Effect of various normalization techniques on SDAC.
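
Figure 12 reports the perplexity of the load distribution. The page does not define the metric, so the sketch below assumes the usual information-theoretic definition: the exponential of the Shannon entropy of the normalized per-domain load. Under that assumption the value equals the number of domains when load is spread evenly and approaches 1 when a single domain carries almost everything. The function name and the sample loads are hypothetical.

```python
import math


def load_perplexity(loads):
    """exp(Shannon entropy) of the normalized load vector (assumed definition)."""
    total = sum(loads)
    if total <= 0:
        raise ValueError("total load must be positive")
    # Zero-load domains contribute nothing to the entropy.
    probs = [x / total for x in loads if x > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


# Made-up CPU loads for four domains.
balanced = load_perplexity([25.0, 25.0, 25.0, 25.0])
skewed = load_perplexity([97.0, 1.0, 1.0, 1.0])
```

Read this way, a higher perplexity means traffic is balanced across more domains, while a lower one signals concentration on a few.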

Original abstract

In the forthcoming era of 6G networks, characterized by unprecedented data rates, ultra-low latency, and ubiquitous connectivity, effective management of Virtualized Network Functions (VNFs) is essential. VNFs are software-based counterparts of traditional hardware devices that facilitate flexible and scalable service provisioning. Service Function Chains (SFCs), structured as ordered sequences of VNFs, are pivotal in delivering complex network services. Nevertheless, splitting an SFC into multiple segments that are deployed across different network domains or infrastructure locations presents substantial challenges due to the potential heterogeneity of domain characteristics along with quality of service (QoS) constraints and limited visibility of network state. Conventional optimization methods have limited scalability, while existing data-driven approaches struggle to balance efficiency with capturing VNF inter-dependencies in SFCs. To overcome these limitations, we introduce a Transformer-empowered actor-critic framework specifically designed for sequence-aware SFC partitioning. By utilizing the self-attention mechanism, our approach effectively models complex inter-dependencies between VNFs, facilitating coordinated and parallel decision-making processes. Furthermore, to improve training stability and convergence, we introduce an $\epsilon$-LoPe exploration strategy as well as Asymptotic Return Normalization. Comprehensive simulation results demonstrate that the proposed methodology outperforms existing state-of-the-art solutions in terms of long-term service acceptance rates, resource utilization, and scalability while achieving fast inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds self-attention to actor-critic RL for SFC partitioning plus two training tweaks, but the abstract gives no numbers or ablations so the gains are hard to judge.

The main thing to know is that this work takes a standard actor-critic setup and layers a Transformer on top to handle sequences of virtual network functions when splitting service function chains across domains. They also add an epsilon-LoPe exploration method and asymptotic return normalization to steady the training. These pieces target the practical issues of heterogeneity, QoS limits, and partial network visibility in 6G-style settings.

The framing of why self-attention might help with coordinated decisions across the chain is straightforward and fits the sequence-aware nature of the task. The motivation section does a clear job contrasting this against older optimization approaches that do not scale and earlier data-driven methods that miss inter-VNF dependencies. That part reads as honest engineering rather than overclaim.

The soft spots sit in the evaluation. The abstract states that simulations show gains in acceptance rates, utilization, and scalability with fast inference, yet it supplies none of the actual numbers, baseline descriptions, or variance measures that would let a reader check the size of the improvement. The stress-test note about missing ablations also lands: without a direct comparison that swaps the Transformer for an RNN or MLP while holding the input mask and other additions fixed, it is not possible to tell whether attention is the driver or whether the new exploration and normalization steps are doing most of the work. If the full manuscript contains those controls and they are reported cleanly, the gap closes; from the given description it remains open.

This paper is aimed at the networking crowd that works on NFV and RL-based resource allocation for future wireless systems. Someone already running actor-critic agents on similar partitioning problems could borrow the architecture sketch and the two training tricks.

I would send it to peer review so the experiments can be examined in full, but it will need tighter reporting and at least one ablation before it is ready for a wider audience.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a Transformer-empowered actor-critic reinforcement learning framework for sequence-aware Service Function Chain (SFC) partitioning in 6G networks. It uses self-attention to model inter-dependencies among Virtual Network Functions (VNFs) under domain heterogeneity and limited network-state visibility, augmented by an ε-LoPe exploration strategy and Asymptotic Return Normalization for training stability. Comprehensive simulations are claimed to demonstrate outperformance over state-of-the-art methods in long-term service acceptance rates, resource utilization, scalability, and inference speed.

Significance. If the empirical claims hold after addressing isolation of components, the work could advance RL applications in network function virtualization by showing how attention mechanisms handle sequential partitioning decisions with partial observability, potentially offering practical gains in 6G SFC management where conventional optimization scales poorly.

major comments (2)

[Experimental evaluation / results section] The central claim that the Transformer-empowered actor-critic outperforms SOTA rests on the self-attention mechanism successfully modeling VNF inter-dependencies under heterogeneity and limited network-state visibility. The architecture description likely feeds a sequence of VNF features into the encoder, yet the paper does not appear to include an ablation that replaces the Transformer with a simpler RNN or MLP while keeping the same partial-observability input mask. Without that isolation, performance gains could be driven by the RL formulation, the ε-LoPe strategy, or Asymptotic Return Normalization rather than the attention component itself.
[Abstract and results presentation] The abstract asserts comprehensive simulation results demonstrating outperformance, yet supplies no quantitative metrics, baseline details, error bars, or description of data exclusion rules, leaving major gaps that prevent verification that the data supports the central claim of improved acceptance rates and scalability.

minor comments (1)

[Abstract] The abstract could briefly name the specific SOTA baselines used in comparisons to make the outperformance claim more concrete.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each major comment below and will revise the paper to strengthen the presentation of results and isolate component contributions where feasible.

Referee: [Experimental evaluation / results section] The central claim that the Transformer-empowered actor-critic outperforms SOTA rests on the self-attention mechanism successfully modeling VNF inter-dependencies under heterogeneity and limited network-state visibility. The architecture description likely feeds a sequence of VNF features into the encoder, yet the paper does not appear to include an ablation that replaces the Transformer with a simpler RNN or MLP while keeping the same partial-observability input mask. Without that isolation, performance gains could be driven by the RL formulation, the ε-LoPe strategy, or Asymptotic Return Normalization rather than the attention component itself.

Authors: We agree that an ablation isolating the self-attention mechanism would better substantiate the central claim. The current experiments compare the full framework against external SOTA baselines but do not internally replace the Transformer encoder with RNN or MLP variants under identical partial-observability masking. In the revised manuscript we will add this ablation study in the experimental evaluation section, reporting performance differences while holding the ε-LoPe exploration and Asymptotic Return Normalization fixed. This will clarify the specific contribution of attention to modeling VNF inter-dependencies.

Revision: yes

Referee: [Abstract and results presentation] The abstract asserts comprehensive simulation results demonstrating outperformance, yet supplies no quantitative metrics, baseline details, error bars, or description of data exclusion rules, leaving major gaps that prevent verification that the data supports the central claim of improved acceptance rates and scalability.

Authors: We acknowledge that the abstract currently provides only qualitative statements. The full results section already contains quantitative comparisons, error bars, and baseline descriptions, but these are not summarized in the abstract. We will revise the abstract to include specific metrics (e.g., relative gains in long-term acceptance rate and resource utilization), name the primary baselines, reference the use of error bars for statistical reporting, and briefly note data handling procedures. This will make the abstract self-contained while remaining within length limits.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation extends standard RL and Transformer components independently

The paper describes a Transformer-empowered actor-critic framework for SFC partitioning, motivated by the need to model VNF inter-dependencies under heterogeneity and partial observability. It adds an ε-LoPe exploration strategy and Asymptotic Return Normalization explicitly for training stability and convergence. These are presented as independent engineering choices rather than quantities fitted to or defined by the target performance metrics. No equations or claims reduce a prediction to a fitted input by construction, no self-citation is invoked as a uniqueness theorem, and no ansatz is smuggled via prior work. The central results are empirical comparisons on simulated acceptance rates and resource utilization, which remain falsifiable against external benchmarks. The derivation chain is therefore self-contained against the listed circularity patterns.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; full paper may contain additional fitted hyperparameters such as learning rates or attention head counts not visible here.

free parameters (2)

epsilon schedule in LoPe exploration
Hyperparameter controlling the shift from random to guided exploration during training; introduced to improve stability.
normalization constants in Asymptotic Return Normalization
Scaling factors chosen to keep returns bounded; directly affect convergence behavior.
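
The page does not describe how ε-LoPe or Asymptotic Return Normalization work internally, so the sketch below deliberately substitutes two generic stand-ins to show where such free parameters enter a training loop: a linearly annealed epsilon-greedy rule and a running mean/variance return standardizer (Welford's algorithm). Neither is the authors' method. All names, default values, and sample returns are assumptions for illustration.

```python
import math
import random


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps."""
    fraction = min(max(step / decay_steps, 0.0), 1.0)
    return eps_start + fraction * (eps_end - eps_start)


def choose_domain(policy_probs, step, rng):
    """Epsilon-greedy: a random domain with probability epsilon, else the argmax."""
    if rng.random() < epsilon_at(step):
        return rng.randrange(len(policy_probs))
    return max(range(len(policy_probs)), key=policy_probs.__getitem__)


class RunningReturnNormalizer:
    """Standardize returns using a running mean and variance (Welford)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def normalize(self, value):
        # Fall back to unit variance until at least two samples are seen.
        variance = self.m2 / self.count if self.count > 1 else 1.0
        return (value - self.mean) / math.sqrt(variance + self.eps)


rng = random.Random(0)
normalizer = RunningReturnNormalizer()
for episode_return in [10.0, 12.0, 8.0, 11.0, 9.0]:  # made-up returns
    normalizer.update(episode_return)
late_choice = choose_domain([0.1, 0.7, 0.2], step=5000, rng=rng)
```

The schedule endpoints (`eps_start`, `eps_end`, `decay_steps`) and the stabilizer `eps` are exactly the kind of hand-set constants the ledger counts as free parameters: changing them alters convergence behavior without changing the model.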

axioms (1)

Domain assumption: self-attention mechanism captures VNF inter-dependencies under partial network visibility
Invoked as the core justification for employing Transformer layers in the actor and critic networks.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "By utilizing the self-attention mechanism, our approach effectively models complex inter-dependencies between VNFs, facilitating coordinated and parallel decision-making processes."

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat_equivNat · unclear

Paper passage: "We introduce an innovative critic network, inspired by sentence encoders in Large Language Models (LLMs), that evaluates the entire sequence of VNF decisions holistically."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Service function chain orchestra- tion across multiple domains: A full mesh aggregation approach,

G. Sun, Y . Li, D. Liao, and V . Chang, “Service function chain orchestra- tion across multiple domains: A full mesh aggregation approach,” IEEE Transactions on Network and Service Management , vol. 15, no. 3, pp. 1175–1191, 2018

work page 2018

[2] [2]

A survey on service function chaining,

D. Bhamare, R. Jain, M. Samaka, and A. Erbad, “A survey on service function chaining,” Journal of Network and Computer Applications , vol. 75, pp. 138–155, 2016

work page 2016

[3] [3]

Deep learning,

Y . LeCun, Y . Bengio, and G. Hinton, “Deep learning,”Nature, vol. 521, no. 7553, pp. 436–444, 2015

work page 2015

[4] [4]

Human-level control through deep reinforcement learning,

V . Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al. , “Human-level control through deep reinforcement learning,” nature, vol. 518, no. 7540, pp. 529–533, 2015

work page 2015

[5] [5]

Mastering the game of Go without human knowledge,

D. Silver et al., “Mastering the game of Go without human knowledge,” Nature, vol. 550, pp. 354–359, 10 2017

work page 2017

[6] [6]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems , vol. 30, 2017, pp. 5998–6008

work page 2017

[7] [7]

Single and multi-domain adaptive allocation algorithms for VNF for- warding graph embedding,

P. T. A. Quang, A. Bradai, K. D. Singh, G. Picard, and R. Riggio, “Single and multi-domain adaptive allocation algorithms for VNF for- warding graph embedding,” IEEE Transactions on Network and Service Management, vol. 16, no. 1, pp. 98–112, 2019

work page 2019

[8] [8]

Network service chaining with optimized network function embedding supporting service decompositions,

S. Sahhaf, W. Tavernier, M. Rost, S. Schmid, D. Colle, M. Pickavet, and P. Demeester, “Network service chaining with optimized network function embedding supporting service decompositions,” Computer Net- works, vol. 93, pp. 492–505, 2015, cloud Networking and Communica- tions II

work page 2015

[9] [9]

Multi-provider service chain embedding with Nestor,

D. Dietrich, A. Abujoda, A. Rizk, and P. Papadimitriou, “Multi-provider service chain embedding with Nestor,” IEEE Transactions on Network and Service Management , vol. 14, no. 1, pp. 91–105, 2017

work page 2017

[10] [10]

A graph partitioning game theoretical approach for the VNF service chaining problem,

A. Leivadeas, G. Kesidis, M. Falkner, and I. Lambadaris, “A graph partitioning game theoretical approach for the VNF service chaining problem,” IEEE Transactions on Network and Service Management , vol. 14, no. 4, pp. 890–903, 2017

work page 2017

[11] [11]

A reinforcement learning-based approach for availability-aware service function chain placement in large-scale networks,

G. L. Santos, P. T. Endo, T. G. Lynn, D. H. J. Sadok, and J. Kelner, “A reinforcement learning-based approach for availability-aware service function chain placement in large-scale networks,” 2022

work page 2022

[12] A. Pentelas, D. De Vleeschauwer, C.-Y. Chang, K. De Schepper, and P. Papadimitriou, “Deep multi-agent reinforcement learning with minimal cross-agent communication for SFC partitioning,” IEEE Access, vol. 11, pp. 40 384–40 398, 2023.

[13] D. Heo, S. Lange, H.-G. Kim, and H. Choi, “Graph neural network based service function chaining for automatic network control,” in 2020 21st Asia-Pacific Network Operations and Management Symposium (APNOMS), 2020, pp. 7–12.

[14] S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, and M. Shah, “Transformers in vision: A survey,” ACM Computing Surveys (CSUR), vol. 54, no. 10s, pp. 1–41, 2022.

[15] A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, et al., “RT-1: Robotics transformer for real-world control at scale,” arXiv preprint arXiv:2212.06817, 2022.

[16] S. Islam, H. Elmekki, A. Elsebai, J. Bentahar, N. Drawel, G. Rjoub, and W. Pedrycz, “A comprehensive survey on applications of transformers for deep learning tasks,” Expert Systems with Applications, vol. 241, p. 122666, 2024.

[17] Y. Wang, Z. Gao, D. Zheng, S. Chen, D. Gündüz, and H. V. Poor, “Transformer-empowered 6G intelligent networks: From massive MIMO processing to semantic communication,” IEEE Wireless Communications, vol. 30, no. 6, pp. 127–135, 2022.

[18] J. Cai, F. Gan, X. Cao, and W. Liu, “Signal modulation classification based on the transformer network,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 3, pp. 1348–1357, 2022.

[19] A. Hameed, J. Violos, A. Leivadeas, N. Santi, R. Grünblatt, and N. Mitton, “Toward QoS prediction based on temporal transformers for IoT applications,” IEEE Transactions on Network and Service Management, vol. 19, no. 4, pp. 4010–4027, 2022.

[20] A. D. Raha, K. Kim, A. Adhikary, M. Gain, Z. Han, and C. S. Hong, “Advancing ultra-reliable 6G: Transformer and semantic localization empowered robust beamforming in millimeter-wave communications,” arXiv preprint arXiv:2406.02000, 2024.

[21] M. Rost and S. Schmid, “On the hardness and inapproximability of virtual network embeddings,” IEEE/ACM Transactions on Networking, vol. 28, no. 2, pp. 791–803, 2020.

[22] S. V. Albrecht, F. Christianos, and L. Schäfer, Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press, 2024.

[23] R. Xiong, Y. Yang, D. He, K. Zheng, S. Zheng, C. Xing, H. Zhang, Y. Lan, L. Wang, and T. Liu, “On layer normalization in the transformer architecture,” in International Conference on Machine Learning. PMLR, 2020, pp. 10 524–10 533.

[24] D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al., “Universal sentence encoder,” arXiv preprint arXiv:1803.11175, 2018.

[25] E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with Gumbel-Softmax,” in International Conference on Learning Representations, 2017.

[26] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.

[27] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.

[28] A. P. Steiner, A. Kolesnikov, X. Zhai, R. Wightman, J. Uszkoreit, and L. Beyer, “How to train your ViT? Data, augmentation, and regularization in vision transformers,” Transactions on Machine Learning Research, 2022.

[29] C. S.-H. Hsu, C. Papagianni, and P. Grosso, “RAILS: Risk-aware iterated local search for joint SLA decomposition and service provider management in multi-domain networks,” in 2025 IEEE 26th International Conference on High Performance Switching and Routing (HPSR), 2025, to be published.

[30] A. Leivadeas, C. Papagianni, and S. Papavassiliou, “Efficient resource mapping framework over networked clouds via iterated local search-based request partitioning,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 6, pp. 1077–1086, 2013.

[31] H. R. Lourenço, O. C. Martin, and T. Stützle, Iterated Local Search. Boston, MA: Springer US, 2003, pp. 320–353.

[32] L. Matignon, G. J. Laurent, and N. Le Fort-Piat, “Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems,” The Knowledge Engineering Review, vol. 27, no. 1, pp. 1–31, 2012.

[33] D. Hendrycks and K. Gimpel, “Bridging nonlinearities and stochastic regularizers with Gaussian error linear units,” 2017.

[34] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer, 2016, pp. 630–645.

[35] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in International Conference on Learning Representations, 2019.

Cyril Shih-Huan Hsu is a Ph.D. candidate at the Informatics Institute, University of Amsterdam (UvA), The Netherlands, where he has been pursuing his degree since 2021. He earned his B.Sc. and M.Sc. degrees from National ...