A unifying computations of Whittle's Index for Markovian bandits

Ina Maria Verloop; Manu K. Gupta; Urtzi Ayesta

arxiv: 1906.10946 · v1 · pith:4LO5HKYFnew · submitted 2019-06-26 · 🧮 math.OC

A unifying computations of Whittle's Index for Markovian bandits

Urtzi Ayesta , Manu K. Gupta , Ina Maria Verloop This is my paper

Pith reviewed 2026-05-25 15:42 UTC · model grok-4.3

classification 🧮 math.OC

keywords Whittle indexrestless banditsMarkovian banditsindex policiescontinuous-time Markov chainsthreshold policiesindexabilitymachine repairmen problem

0 comments

The pith

An analytical expression computes Whittle's index for any Markovian bandit with finite or infinite transition rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a single analytical expression for Whittle's index that works for any Markovian bandit modeled as a continuous-time Markov chain. The expression covers cases with finite rates, such as machine repair, and infinite rates, such as networks with instant rate changes on losses. It supplies sufficient conditions that make the relaxed single-bandit problem have a threshold-type solution and that make the bandit indexable. A sympathetic reader cares because the formula replaces numerical search or ad-hoc derivations with one expression that recovers earlier indices as special cases.

Core claim

For any Markovian bandit an analytical expression for Whittle's index is obtained by solving the relaxed problem, provided the optimal policy is of threshold type and the bandit satisfies indexability; the same expression holds uniformly for both finite and infinite transition rates.

What carries the argument

The analytical expression for Whittle's index obtained by setting the subsidy for passivity equal to the value at which the active and passive actions become indifferent in the single-arm relaxed problem.

If this is right

The expression retrieves known Whittle indices from the literature as particular cases.
It applies directly to the machine repairmen problem with finite rates.
It applies to communication networks in which transmission rates react instantaneously to packet losses.
Sufficient conditions ensure the optimal solution of the relaxed problem is of threshold type.
Conditions guarantee that the bandit is indexable so the index exists.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The single formula may let researchers write closed-form performance bounds for index policies in application areas that previously required separate derivations.
The subsidy-indifference step could be reused to obtain indices for restless bandits whose state spaces are not Markovian but admit similar value-function comparisons.
Numerical checks on standard examples from the literature would immediately confirm or refute whether the new expression matches previously computed values.

Load-bearing premise

The underlying processes are continuous-time Markov chains that satisfy the derived sufficient conditions for threshold-type solutions and indexability.

What would settle it

For any concrete finite-rate Markovian bandit, solve the relaxed problem numerically to obtain the index value and compare it directly to the value given by the analytical expression; a mismatch falsifies the claim.

Figures

Figures reproduced from arXiv: 1906.10946 by Ina Maria Verloop, Manu K. Gupta, Urtzi Ayesta.

**Figure 2.** Figure 2: A bottleneck router in TCP with multiple flows [ [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗

**Figure 3.** Figure 3: Transition diagram under the threshold policy ‘ [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗

**Figure 4.** Figure 4: Optimal clearing framework as single-armed restless bandit [PITH_FULL_IMAGE:figures/full_fig_p019_4.png] view at source ↗

**Figure 5.** Figure 5: Transition diagram under the threshold policy [PITH_FULL_IMAGE:figures/full_fig_p019_5.png] view at source ↗

read the original abstract

The multi-armed restless bandit framework allows to model a wide variety of decision-making problems in areas as diverse as industrial engineering, computer communication, operations research, financial engineering, communication networks etc. In a seminal work, Whittle developed a methodology to derive well-performing (Whittle's) index policies that are obtained by solving a relaxed version of the original problem. However, the computation of Whittle's index itself is a difficult problem and hence researchers focused on calculating Whittle's index numerically or with a problem dependent approach. In our main contribution we derive an analytical expression for Whittle's index for any Markovian bandit with both finite and infinite transition rates. We derive sufficient conditions for the optimal solution of the relaxed problem to be of threshold type, and obtain conditions for the bandit to be indexable, a property assuring the existence of Whittle's index. Our solution approach provides a unifying expression for Whittle's index, which we highlight by retrieving known indices from literature as particular cases. The applicability of finite rates is illustrated with the machine repairmen problem, and that of infinite rates by an example of communication networks where transmission rates react instantaneously to packet losses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a closed-form Whittle index formula that recovers known special cases for Markovian bandits under finite or infinite rates, but the claim of applying to any such bandit rests on sufficient conditions whose breadth is unclear.

read the letter

The main takeaway is that Ayesta et al. derive one analytical expression for the Whittle index that covers both finite-rate and infinite-rate continuous-time Markov chains and reproduces several existing indices from the literature as special cases. They also supply sufficient conditions for the relaxed problem to have a threshold solution and for the bandit to be indexable. The machine-repair example and the instantaneous-rate communication-network example show the formula in use. That unification is the concrete advance; prior work was mostly numerical or tailored to one model. The derivation appears to stand on its own without circular fitting. The limitation is that the expression is guaranteed only when those sufficient conditions hold. The abstract does not make clear how restrictive the conditions are or whether there are natural Markovian bandits that fall outside them, so the “for any” phrasing in the strongest claim may over-reach. No machine-checked proof or exhaustive counter-example search is mentioned. Readers who already work on restless-bandit scheduling in operations research or networks will find the closed form and the recovered cases useful. The paper is coherent on its own terms and engages the literature directly, so it merits a full referee process even if the conditions later need tightening.

Referee Report

2 major / 2 minor

Summary. The paper derives an analytical expression for Whittle's index applicable to any Markovian bandit (continuous-time Markov chain) with finite or infinite transition rates. It also derives sufficient conditions for the relaxed problem to admit a threshold-type optimal policy and for the bandit to be indexable (ensuring the index exists). The expression is presented as unifying, with known indices recovered as special cases; applicability is illustrated via the machine repairmen problem (finite rates) and a communication-network example with instantaneous rate reactions (infinite rates).

Significance. If the derivation is correct under the stated conditions, the result would supply a closed-form route to Whittle indices for a broad subclass of restless bandits, replacing case-by-case numerical or problem-specific calculations. Recovery of prior indices as instances and the handling of infinite-rate models are concrete strengths that could aid applications in networks and operations research.

major comments (2)

[Abstract] Abstract: the central claim of an analytical expression 'for any Markovian bandit' is not supported by the manuscript, because the derivation is explicitly conditioned on sufficient (not necessary) conditions for threshold structure and indexability; the paper does not establish that these conditions hold for every continuous-time Markov chain, so the unifying expression applies only inside the subclass satisfying the conditions.
[Conditions for indexability] The manuscript states that sufficient conditions for indexability are obtained, yet provides no general, checkable criterion that would allow a reader to verify indexability for an arbitrary CTMC without re-deriving the conditions case by case; this limits the practical scope of the claimed unification.

minor comments (2)

[Machine repairmen example] The machine-repairmen illustration would be strengthened by an explicit side-by-side numerical comparison of the new closed-form index against the previously known index for the same parameters.
[Infinite-rate case] Notation for the infinite-rate limit (e.g., how the transition-rate matrix is regularized) should be introduced earlier and used consistently in the derivation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and for identifying points that can improve the clarity of our claims. We respond to each major comment below and indicate where revisions will be made.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim of an analytical expression 'for any Markovian bandit' is not supported by the manuscript, because the derivation is explicitly conditioned on sufficient (not necessary) conditions for threshold structure and indexability; the paper does not establish that these conditions hold for every continuous-time Markov chain, so the unifying expression applies only inside the subclass satisfying the conditions.

Authors: We agree that the abstract phrasing is imprecise. The derivation of the closed-form expression and the recovery of known indices as special cases are explicitly under the sufficient conditions for threshold optimality and indexability that we establish. These conditions are not asserted to be necessary or to hold for every CTMC. We will revise the abstract to state that the unifying expression applies to Markovian bandits satisfying the derived conditions (covering both finite- and infinite-rate cases) and will add a clarifying sentence in the introduction. revision: yes
Referee: [Conditions for indexability] The manuscript states that sufficient conditions for indexability are obtained, yet provides no general, checkable criterion that would allow a reader to verify indexability for an arbitrary CTMC without re-deriving the conditions case by case; this limits the practical scope of the claimed unification.

Authors: The sufficient conditions are stated in general form in terms of the CTMC transition rates, holding costs, and subsidy, so that for any concrete model a reader can verify them directly from the rate matrix without a full re-derivation. This is the standard role of sufficient conditions; a universal necessary-and-sufficient criterion for arbitrary CTMCs is not provided because it would require solving the indexability question anew for each chain. The unification remains useful precisely when the conditions hold, as shown by the recovery of prior indices and the two illustrative examples. revision: no

Circularity Check

0 steps flagged

No circularity: analytical derivation of Whittle index stands independently of inputs.

full rationale

The paper states it derives an analytical expression for Whittle's index along with sufficient conditions for threshold optimality and indexability in continuous-time Markovian bandits. It recovers known indices as special cases but presents this as verification of unification rather than a reduction to fitted quantities or self-referential definitions. No load-bearing step reduces by construction to a prior fit, self-citation chain, or ansatz smuggled via citation; the central claim is a direct derivation under stated assumptions. This is the common honest outcome for a self-contained mathematical derivation paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Central claim rests on the Markov property of the bandits and on the existence of threshold solutions and indexability under conditions that the paper states it derives; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption The decision processes are continuous-time Markov chains with finite or infinite transition rates.
Explicitly stated in the abstract as the setting for the bandits.

pith-pipeline@v0.9.0 · 5738 in / 1062 out tokens · 22726 ms · 2026-05-25T15:42:38.862865+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 1 internal anchor

[1]

Group maintenance: A restless bandits approach

Abderrahmane Abbou and Viliam Makis. Group maintenance: A restless bandits approach. INFORMS Journal on Computing, 2019. doi:10.1287/ijoc.2018.0863

work page doi:10.1287/ijoc.2018.0863 2019
[2]

Generalized -fair resource allocation in wireless networks

Eitan Altman, Konstantin Avrachenkov, and Andrey Garnaev. Generalized -fair resource allocation in wireless networks. In 2008 47th IEEE Conference on Decision and Control, pages 2414--2419. IEEE, 2008

work page 2008
[3]

Whittle's index policy for a multi-class queueing system with convex holding costs

PS Ansell, Kevin D Glazebrook, Jos \'e Ni \ n o-Mora, and M O'Keeffe. Whittle's index policy for a multi-class queueing system with convex holding costs. Mathematical Methods of Operations Research, 57 0 (1): 0 21--39, 2003

work page 2003
[4]

Dynamic routing of customers with general delay costs in a multiserver queuing system

Nilay Tanik Argon, Li Ding, Kevin D Glazebrook, and Serhan Ziya. Dynamic routing of customers with general delay costs in a multiserver queuing system. Probability in the Engineering and Informational Sciences, 23 0 (2): 0 175--203, 2009

work page 2009
[5]

Congestion control of TCP flows in internet routers by means of index policy

Konstantin Avrachenkov, Urtzi Ayesta, Josu Doncel, and Peter Jacko. Congestion control of TCP flows in internet routers by means of index policy. Computer Networks, 57 0 (17): 0 3463--3478, 2013

work page 2013
[6]

Prioritizing H epatitis C treatment in US prisons

Turgay Ayer, Can Zhang, Anthony Bonifonte, Anne C Spaulding, and Jagpreet Chhatwal. Prioritizing H epatitis C treatment in US prisons. Operations Research, 2019

work page 2019
[7]

Whittle indexability in egalitarian processor sharing systems

Vivek S Borkar and Sarath Pattathil. Whittle indexability in egalitarian processor sharing systems. Annals of Operations Research, pages 1--21, 2017

work page 2017
[8]

Opportunistic scheduling as restless bandits

Vivek S Borkar, Gaurav S Kasbekar, Sarath Pattathil, and Priyesh Shetty. Opportunistic scheduling as restless bandits. IEEE Transactions on Control of Network Systems, 2017 a

work page 2017
[9]

An index policy for dynamic pricing in cloud computing under price commitments

Vivek S Borkar, K Ravikumar, and Krishnakant Saboo. An index policy for dynamic pricing in cloud computing under price commitments. Applicationes Mathematicae, 44: 0 215--245, 2017 b

work page 2017
[10]

Multi-armed bandit allocation indices

John Gittins, Kevin Glazebrook, and Richard Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011

work page 2011
[11]

Index policies for the maintenance of a collection of machines by a set of repairmen

Kevin D Glazebrook, HM Mitchell, and PS Ansell. Index policies for the maintenance of a collection of machines by a set of repairmen. European Journal of Operational Research, 165 0 (1): 0 267--284, 2005

work page 2005
[12]

Index policies for the admission control and routing of impatient customers to heterogeneous service stations

Kevin D Glazebrook, Christopher Kirkbride, and Jamal Ouenniche. Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Operations Research, 57 0 (4): 0 975--989, 2009

work page 2009
[13]

Generalized restless bandits and the knapsack problem for perishable inventories

Darina Graczov \'a and Peter Jacko. Generalized restless bandits and the knapsack problem for perishable inventories. Operations Research, 62 0 (3): 0 696--711, 2014

work page 2014
[14]

Dynamic priority allocation in restless bandit models, 2010

Peter Jacko. Dynamic priority allocation in restless bandit models, 2010

work page 2010
[15]

Resource capacity allocation to stochastic dynamic competitors: knapsack problem for perishable items and index-knapsack heuristic

Peter Jacko. Resource capacity allocation to stochastic dynamic competitors: knapsack problem for perishable items and index-knapsack heuristic. Annals of Operations Research, 241 0 (1-2): 0 83--107, 2016

work page 2016
[16]

Developing effective service policies for multiclass queues with abandonment: asymptotic optimality and approximate policy improvement

Terry James, Kevin Glazebrook, and Kyle Lin. Developing effective service policies for multiclass queues with abandonment: asymptotic optimality and approximate policy improvement. INFORMS Journal on Computing, 28 0 (2): 0 251--264, 2016

work page 2016
[17]

Efficient content delivery in the presence of impatient jobs

Maialen Larra \ n aga, Onno J Boxma, Rudesindo N \'u \ n ez-Queija, and Mark S Squillante. Efficient content delivery in the presence of impatient jobs. In Teletraffic Congress (ITC 27), 2015 27th International, pages 73--81. IEEE, 2015

work page 2015
[18]

Dynamic control of birth-and-death restless bandits: application to resource-allocation problems

Maialen Larra \ n aga, Urtzi Ayesta, and Ina Maria Verloop. Dynamic control of birth-and-death restless bandits: application to resource-allocation problems. IEEE/ACM Transactions on Networking, 24 0 (6): 0 3812--3825, 2016

work page 2016
[19]

Fair end-to-end window-based congestion control

Jeonghoon Mo and Jean Walrand. Fair end-to-end window-based congestion control. IEEE/ACM Transactions on networking, 0 (5): 0 556--567, 2000

work page 2000
[20]

Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach

Jos \'e Nino-Mora. Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach. Mathematical Programming, 93 0 (3): 0 361--413, 2002

work page 2002
[21]

Restless bandit marginal productivity indices, diminishing returns, and optimal control of make-to-order/make-to-stock M / G /1 queues

Jos \'e Ni \ n o-Mora. Restless bandit marginal productivity indices, diminishing returns, and optimal control of make-to-order/make-to-stock M / G /1 queues. Mathematics of Operations Research, 31 0 (1): 0 50--84, 2006

work page 2006
[22]

Dynamic priority allocation via restless bandit marginal productivity indices

Jos \'e Ni \ n o-Mora. Dynamic priority allocation via restless bandit marginal productivity indices. Top, 15 0 (2): 0 161--198, 2007

work page 2007
[23]

Outsourcing warranty repairs: Dynamic allocation

Michelle Opp, Kevin Glazebrook, and Vidyadhar G Kulkarni. Outsourcing warranty repairs: Dynamic allocation. Naval Research Logistics (NRL), 52 0 (5): 0 381--398, 2005

work page 2005
[24]

The complexity of optimal queuing network control

Christos H Papadimitriou and John N Tsitsiklis. The complexity of optimal queuing network control. Mathematics of Operations Research, 24 0 (2): 0 293--305, 1999

work page 1999
[25]

Distributed Server Allocation for Content Delivery Networks

Sarath Pattathil, Vivek S Borkar, and Gaurav S Kasbekar. Distributed server allocation for content delivery networks. arXiv preprint arXiv:1710.11471, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[26]

Markov decision processes: discrete stochastic dynamic programming

Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014

work page 2014
[27]

Indexable restless bandits, 2008

D Ruiz-Hernandez. Indexable restless bandits, 2008

work page 2008
[28]

Asymptotically optimal priority policies for indexable and nonindexable restless bandits

Ina Maria Verloop. Asymptotically optimal priority policies for indexable and nonindexable restless bandits. The Annals of Applied Probability, 26 0 (4): 0 1947--1995, 2016

work page 1947
[29]

On an index policy for restless bandits

Richard R Weber and Gideon Weiss. On an index policy for restless bandits. Journal of Applied Probability, 27 0 (3): 0 637--648, 1990

work page 1990
[30]

Restless bandits: Activity allocation in a changing world

Peter Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25 0 (A): 0 287--298, 1988

work page 1988

[1] [1]

Group maintenance: A restless bandits approach

Abderrahmane Abbou and Viliam Makis. Group maintenance: A restless bandits approach. INFORMS Journal on Computing, 2019. doi:10.1287/ijoc.2018.0863

work page doi:10.1287/ijoc.2018.0863 2019

[2] [2]

Generalized -fair resource allocation in wireless networks

Eitan Altman, Konstantin Avrachenkov, and Andrey Garnaev. Generalized -fair resource allocation in wireless networks. In 2008 47th IEEE Conference on Decision and Control, pages 2414--2419. IEEE, 2008

work page 2008

[3] [3]

Whittle's index policy for a multi-class queueing system with convex holding costs

PS Ansell, Kevin D Glazebrook, Jos \'e Ni \ n o-Mora, and M O'Keeffe. Whittle's index policy for a multi-class queueing system with convex holding costs. Mathematical Methods of Operations Research, 57 0 (1): 0 21--39, 2003

work page 2003

[4] [4]

Dynamic routing of customers with general delay costs in a multiserver queuing system

Nilay Tanik Argon, Li Ding, Kevin D Glazebrook, and Serhan Ziya. Dynamic routing of customers with general delay costs in a multiserver queuing system. Probability in the Engineering and Informational Sciences, 23 0 (2): 0 175--203, 2009

work page 2009

[5] [5]

Congestion control of TCP flows in internet routers by means of index policy

Konstantin Avrachenkov, Urtzi Ayesta, Josu Doncel, and Peter Jacko. Congestion control of TCP flows in internet routers by means of index policy. Computer Networks, 57 0 (17): 0 3463--3478, 2013

work page 2013

[6] [6]

Prioritizing H epatitis C treatment in US prisons

Turgay Ayer, Can Zhang, Anthony Bonifonte, Anne C Spaulding, and Jagpreet Chhatwal. Prioritizing H epatitis C treatment in US prisons. Operations Research, 2019

work page 2019

[7] [7]

Whittle indexability in egalitarian processor sharing systems

Vivek S Borkar and Sarath Pattathil. Whittle indexability in egalitarian processor sharing systems. Annals of Operations Research, pages 1--21, 2017

work page 2017

[8] [8]

Opportunistic scheduling as restless bandits

Vivek S Borkar, Gaurav S Kasbekar, Sarath Pattathil, and Priyesh Shetty. Opportunistic scheduling as restless bandits. IEEE Transactions on Control of Network Systems, 2017 a

work page 2017

[9] [9]

An index policy for dynamic pricing in cloud computing under price commitments

Vivek S Borkar, K Ravikumar, and Krishnakant Saboo. An index policy for dynamic pricing in cloud computing under price commitments. Applicationes Mathematicae, 44: 0 215--245, 2017 b

work page 2017

[10] [10]

Multi-armed bandit allocation indices

John Gittins, Kevin Glazebrook, and Richard Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011

work page 2011

[11] [11]

Index policies for the maintenance of a collection of machines by a set of repairmen

Kevin D Glazebrook, HM Mitchell, and PS Ansell. Index policies for the maintenance of a collection of machines by a set of repairmen. European Journal of Operational Research, 165 0 (1): 0 267--284, 2005

work page 2005

[12] [12]

Index policies for the admission control and routing of impatient customers to heterogeneous service stations

Kevin D Glazebrook, Christopher Kirkbride, and Jamal Ouenniche. Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Operations Research, 57 0 (4): 0 975--989, 2009

work page 2009

[13] [13]

Generalized restless bandits and the knapsack problem for perishable inventories

Darina Graczov \'a and Peter Jacko. Generalized restless bandits and the knapsack problem for perishable inventories. Operations Research, 62 0 (3): 0 696--711, 2014

work page 2014

[14] [14]

Dynamic priority allocation in restless bandit models, 2010

Peter Jacko. Dynamic priority allocation in restless bandit models, 2010

work page 2010

[15] [15]

Resource capacity allocation to stochastic dynamic competitors: knapsack problem for perishable items and index-knapsack heuristic

Peter Jacko. Resource capacity allocation to stochastic dynamic competitors: knapsack problem for perishable items and index-knapsack heuristic. Annals of Operations Research, 241 0 (1-2): 0 83--107, 2016

work page 2016

[16] [16]

Developing effective service policies for multiclass queues with abandonment: asymptotic optimality and approximate policy improvement

Terry James, Kevin Glazebrook, and Kyle Lin. Developing effective service policies for multiclass queues with abandonment: asymptotic optimality and approximate policy improvement. INFORMS Journal on Computing, 28 0 (2): 0 251--264, 2016

work page 2016

[17] [17]

Efficient content delivery in the presence of impatient jobs

Maialen Larra \ n aga, Onno J Boxma, Rudesindo N \'u \ n ez-Queija, and Mark S Squillante. Efficient content delivery in the presence of impatient jobs. In Teletraffic Congress (ITC 27), 2015 27th International, pages 73--81. IEEE, 2015

work page 2015

[18] [18]

Dynamic control of birth-and-death restless bandits: application to resource-allocation problems

Maialen Larra \ n aga, Urtzi Ayesta, and Ina Maria Verloop. Dynamic control of birth-and-death restless bandits: application to resource-allocation problems. IEEE/ACM Transactions on Networking, 24 0 (6): 0 3812--3825, 2016

work page 2016

[19] [19]

Fair end-to-end window-based congestion control

Jeonghoon Mo and Jean Walrand. Fair end-to-end window-based congestion control. IEEE/ACM Transactions on networking, 0 (5): 0 556--567, 2000

work page 2000

[20] [20]

Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach

Jos \'e Nino-Mora. Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach. Mathematical Programming, 93 0 (3): 0 361--413, 2002

work page 2002

[21] [21]

Restless bandit marginal productivity indices, diminishing returns, and optimal control of make-to-order/make-to-stock M / G /1 queues

Jos \'e Ni \ n o-Mora. Restless bandit marginal productivity indices, diminishing returns, and optimal control of make-to-order/make-to-stock M / G /1 queues. Mathematics of Operations Research, 31 0 (1): 0 50--84, 2006

work page 2006

[22] [22]

Dynamic priority allocation via restless bandit marginal productivity indices

Jos \'e Ni \ n o-Mora. Dynamic priority allocation via restless bandit marginal productivity indices. Top, 15 0 (2): 0 161--198, 2007

work page 2007

[23] [23]

Outsourcing warranty repairs: Dynamic allocation

Michelle Opp, Kevin Glazebrook, and Vidyadhar G Kulkarni. Outsourcing warranty repairs: Dynamic allocation. Naval Research Logistics (NRL), 52 0 (5): 0 381--398, 2005

work page 2005

[24] [24]

The complexity of optimal queuing network control

Christos H Papadimitriou and John N Tsitsiklis. The complexity of optimal queuing network control. Mathematics of Operations Research, 24 0 (2): 0 293--305, 1999

work page 1999

[25] [25]

Distributed Server Allocation for Content Delivery Networks

Sarath Pattathil, Vivek S Borkar, and Gaurav S Kasbekar. Distributed server allocation for content delivery networks. arXiv preprint arXiv:1710.11471, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[26] [26]

Markov decision processes: discrete stochastic dynamic programming

Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014

work page 2014

[27] [27]

Indexable restless bandits, 2008

D Ruiz-Hernandez. Indexable restless bandits, 2008

work page 2008

[28] [28]

Asymptotically optimal priority policies for indexable and nonindexable restless bandits

Ina Maria Verloop. Asymptotically optimal priority policies for indexable and nonindexable restless bandits. The Annals of Applied Probability, 26 0 (4): 0 1947--1995, 2016

work page 1947

[29] [29]

On an index policy for restless bandits

Richard R Weber and Gideon Weiss. On an index policy for restless bandits. Journal of Applied Probability, 27 0 (3): 0 637--648, 1990

work page 1990

[30] [30]

Restless bandits: Activity allocation in a changing world

Peter Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25 0 (A): 0 287--298, 1988

work page 1988