Lagrangian index policy for restless bandits with average reward
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 3
verdicts
UNVERDICTED: 3
roles
background: 1
polarities
contest: 1
representative citing papers
Shifted empirical Gittins indices derived from discretized and right-shifted samples of a bounded job-size distribution yield an index policy that is asymptotically optimal for response-time minimization in M/G/1 queues.
Lagrange index heuristic for RMAB-SMDP scheduling minimizes weighted AoI under non-preemptive heterogeneous updates in wireless networks.
citing papers explorer
- Restless Bandits with Individual Penalty Constraints: Near-Optimal Indices and Deep Reinforcement Learning
  The POW index policy for restless multi-armed bandits with per-arm penalty constraints is asymptotically optimal, computable offline per user, and learnable via deep RL.
- Scheduling jobs with unknown size distribution in a M/G/1 queue: the shifted empirical Gittins
  Shifted empirical Gittins indices derived from discretized and right-shifted samples of a bounded job-size distribution yield an index policy that is asymptotically optimal for response-time minimization in M/G/1 queues.
- Lagrange Index based Scheduling for Minimizing Age of Updates from Heterogeneous Sources
  Lagrange index heuristic for RMAB-SMDP scheduling minimizes weighted AoI under non-preemptive heterogeneous updates in wireless networks.