The POW index policy for restless multi-armed bandits with per-arm penalty constraints is asymptotically optimal, computable offline per user, and learnable via deep RL.
Lagrangian index policy for restless bandits with average reward
3 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: contest (1)
representative citing papers
Shifted empirical Gittins indices derived from discretized and right-shifted samples of a bounded job-size distribution yield an index policy that is asymptotically optimal for response-time minimization in M/G/1 queues.
Lagrange index heuristic for RMAB-SMDP scheduling minimizes weighted AoI under non-preemptive heterogeneous updates in wireless networks.
citing papers explorer
- Scheduling jobs with unknown size distribution in a M/G/1 queue: the shifted empirical Gittins