1 Pith paper cites this work. Polarity classification is still indexing.
Fields: math.OC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Empirical Evaluation of Policy-Based Reinforcement Learning for Dynamic Service Control in an M/M/1 Queue
REINFORCE, A2C, and PPO are compared for service rate control in an M/M/1 queue modeled as an SMDP, using queue length states and assessing convergence and regret.
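To make the setting in the summary concrete, the following is a minimal sketch of an M/M/1 queue with a controllable service rate, treated as an SMDP whose state is the queue length. Everything numeric here is an assumption made for illustration and is not taken from the paper: the arrival rate, the set of service rates, the linear holding and service costs, the queue truncation, and the hypothetical `threshold_policy`. The sketch evaluates a fixed policy by simulation only; it does not implement REINFORCE, A2C, or PPO.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
ARRIVAL_RATE = 1.0                # Poisson arrival rate (lambda)
SERVICE_RATES = (0.5, 1.5, 3.0)   # selectable service rates; an action is an index
HOLDING_COST = 1.0                # cost per customer in system per unit time
SERVICE_COST = 0.5                # cost per unit of service rate per unit time
MAX_QUEUE = 50                    # truncation so the state space stays finite


def step(queue_len, action, rng):
    """Simulate one SMDP transition.

    Returns (next_queue_len, sojourn_time, cost accrued during the sojourn).
    """
    # The server idles when the queue is empty; arrivals are blocked at MAX_QUEUE.
    mu = SERVICE_RATES[action] if queue_len > 0 else 0.0
    lam = ARRIVAL_RATE if queue_len < MAX_QUEUE else 0.0
    total_rate = lam + mu

    # Time to the next event is exponential with the total event rate.
    sojourn = rng.expovariate(total_rate)
    cost = (HOLDING_COST * queue_len + SERVICE_COST * mu) * sojourn

    # The next event is an arrival with probability lam / total_rate.
    if rng.random() < lam / total_rate:
        return queue_len + 1, sojourn, cost
    return queue_len - 1, sojourn, cost


def threshold_policy(queue_len):
    """Hypothetical baseline: serve faster as the queue grows."""
    if queue_len < 3:
        return 0
    if queue_len < 8:
        return 1
    return 2


def average_cost(policy, num_transitions, seed=0):
    """Estimate the long-run cost per unit time of a policy by simulation."""
    rng = random.Random(seed)
    queue_len, total_cost, total_time = 0, 0.0, 0.0
    for _ in range(num_transitions):
        queue_len, sojourn, cost = step(queue_len, policy(queue_len), rng)
        total_cost += cost
        total_time += sojourn
    return total_cost / total_time


if __name__ == "__main__":
    print(average_cost(threshold_policy, 100_000, seed=0))
```

A learned policy could be compared against such a baseline by passing it to `average_cost` in place of `threshold_policy`; the gap in average cost is one simple way to express regret, assuming the baseline is near-optimal for the chosen costs.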