A unifying computations of Whittle's Index for Markovian bandits
Pith reviewed 2026-05-25 15:42 UTC · model grok-4.3
The pith
An analytical expression computes Whittle's index for any Markovian bandit with finite or infinite transition rates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For any Markovian bandit an analytical expression for Whittle's index is obtained by solving the relaxed problem, provided the optimal policy is of threshold type and the bandit satisfies indexability; the same expression holds uniformly for both finite and infinite transition rates.
What carries the argument
The analytical expression for Whittle's index obtained by setting the subsidy for passivity equal to the value at which the active and passive actions become indifferent in the single-arm relaxed problem.
If this is right
- The expression retrieves known Whittle indices from the literature as particular cases.
- It applies directly to the machine repairmen problem with finite rates.
- It applies to communication networks in which transmission rates react instantaneously to packet losses.
- Sufficient conditions ensure the optimal solution of the relaxed problem is of threshold type.
- Conditions guarantee that the bandit is indexable so the index exists.
Where Pith is reading between the lines
- The single formula may let researchers write closed-form performance bounds for index policies in application areas that previously required separate derivations.
- The subsidy-indifference step could be reused to obtain indices for restless bandits whose state spaces are not Markovian but admit similar value-function comparisons.
- Numerical checks on standard examples from the literature would immediately confirm or refute whether the new expression matches previously computed values.
Load-bearing premise
The underlying processes are continuous-time Markov chains that satisfy the derived sufficient conditions for threshold-type solutions and indexability.
What would settle it
For any concrete finite-rate Markovian bandit, solve the relaxed problem numerically to obtain the index value and compare it directly to the value given by the analytical expression; a mismatch falsifies the claim.
Figures
read the original abstract
The multi-armed restless bandit framework allows to model a wide variety of decision-making problems in areas as diverse as industrial engineering, computer communication, operations research, financial engineering, communication networks etc. In a seminal work, Whittle developed a methodology to derive well-performing (Whittle's) index policies that are obtained by solving a relaxed version of the original problem. However, the computation of Whittle's index itself is a difficult problem and hence researchers focused on calculating Whittle's index numerically or with a problem dependent approach. In our main contribution we derive an analytical expression for Whittle's index for any Markovian bandit with both finite and infinite transition rates. We derive sufficient conditions for the optimal solution of the relaxed problem to be of threshold type, and obtain conditions for the bandit to be indexable, a property assuring the existence of Whittle's index. Our solution approach provides a unifying expression for Whittle's index, which we highlight by retrieving known indices from literature as particular cases. The applicability of finite rates is illustrated with the machine repairmen problem, and that of infinite rates by an example of communication networks where transmission rates react instantaneously to packet losses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper derives an analytical expression for Whittle's index applicable to any Markovian bandit (continuous-time Markov chain) with finite or infinite transition rates. It also derives sufficient conditions for the relaxed problem to admit a threshold-type optimal policy and for the bandit to be indexable (ensuring the index exists). The expression is presented as unifying, with known indices recovered as special cases; applicability is illustrated via the machine repairmen problem (finite rates) and a communication-network example with instantaneous rate reactions (infinite rates).
Significance. If the derivation is correct under the stated conditions, the result would supply a closed-form route to Whittle indices for a broad subclass of restless bandits, replacing case-by-case numerical or problem-specific calculations. Recovery of prior indices as instances and the handling of infinite-rate models are concrete strengths that could aid applications in networks and operations research.
major comments (2)
- [Abstract] Abstract: the central claim of an analytical expression 'for any Markovian bandit' is not supported by the manuscript, because the derivation is explicitly conditioned on sufficient (not necessary) conditions for threshold structure and indexability; the paper does not establish that these conditions hold for every continuous-time Markov chain, so the unifying expression applies only inside the subclass satisfying the conditions.
- [Conditions for indexability] The manuscript states that sufficient conditions for indexability are obtained, yet provides no general, checkable criterion that would allow a reader to verify indexability for an arbitrary CTMC without re-deriving the conditions case by case; this limits the practical scope of the claimed unification.
minor comments (2)
- [Machine repairmen example] The machine-repairmen illustration would be strengthened by an explicit side-by-side numerical comparison of the new closed-form index against the previously known index for the same parameters.
- [Infinite-rate case] Notation for the infinite-rate limit (e.g., how the transition-rate matrix is regularized) should be introduced earlier and used consistently in the derivation.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and for identifying points that can improve the clarity of our claims. We respond to each major comment below and indicate where revisions will be made.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim of an analytical expression 'for any Markovian bandit' is not supported by the manuscript, because the derivation is explicitly conditioned on sufficient (not necessary) conditions for threshold structure and indexability; the paper does not establish that these conditions hold for every continuous-time Markov chain, so the unifying expression applies only inside the subclass satisfying the conditions.
Authors: We agree that the abstract phrasing is imprecise. The derivation of the closed-form expression and the recovery of known indices as special cases are explicitly under the sufficient conditions for threshold optimality and indexability that we establish. These conditions are not asserted to be necessary or to hold for every CTMC. We will revise the abstract to state that the unifying expression applies to Markovian bandits satisfying the derived conditions (covering both finite- and infinite-rate cases) and will add a clarifying sentence in the introduction. revision: yes
-
Referee: [Conditions for indexability] The manuscript states that sufficient conditions for indexability are obtained, yet provides no general, checkable criterion that would allow a reader to verify indexability for an arbitrary CTMC without re-deriving the conditions case by case; this limits the practical scope of the claimed unification.
Authors: The sufficient conditions are stated in general form in terms of the CTMC transition rates, holding costs, and subsidy, so that for any concrete model a reader can verify them directly from the rate matrix without a full re-derivation. This is the standard role of sufficient conditions; a universal necessary-and-sufficient criterion for arbitrary CTMCs is not provided because it would require solving the indexability question anew for each chain. The unification remains useful precisely when the conditions hold, as shown by the recovery of prior indices and the two illustrative examples. revision: no
Circularity Check
No circularity: analytical derivation of Whittle index stands independently of inputs.
full rationale
The paper states it derives an analytical expression for Whittle's index along with sufficient conditions for threshold optimality and indexability in continuous-time Markovian bandits. It recovers known indices as special cases but presents this as verification of unification rather than a reduction to fitted quantities or self-referential definitions. No load-bearing step reduces by construction to a prior fit, self-citation chain, or ansatz smuggled via citation; the central claim is a direct derivation under stated assumptions. This is the common honest outcome for a self-contained mathematical derivation paper.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The decision processes are continuous-time Markov chains with finite or infinite transition rates.
Reference graph
Works this paper leans on
-
[1]
Group maintenance: A restless bandits approach
Abderrahmane Abbou and Viliam Makis. Group maintenance: A restless bandits approach. INFORMS Journal on Computing, 2019. doi:10.1287/ijoc.2018.0863
-
[2]
Generalized -fair resource allocation in wireless networks
Eitan Altman, Konstantin Avrachenkov, and Andrey Garnaev. Generalized -fair resource allocation in wireless networks. In 2008 47th IEEE Conference on Decision and Control, pages 2414--2419. IEEE, 2008
work page 2008
-
[3]
Whittle's index policy for a multi-class queueing system with convex holding costs
PS Ansell, Kevin D Glazebrook, Jos \'e Ni \ n o-Mora, and M O'Keeffe. Whittle's index policy for a multi-class queueing system with convex holding costs. Mathematical Methods of Operations Research, 57 0 (1): 0 21--39, 2003
work page 2003
-
[4]
Dynamic routing of customers with general delay costs in a multiserver queuing system
Nilay Tanik Argon, Li Ding, Kevin D Glazebrook, and Serhan Ziya. Dynamic routing of customers with general delay costs in a multiserver queuing system. Probability in the Engineering and Informational Sciences, 23 0 (2): 0 175--203, 2009
work page 2009
-
[5]
Congestion control of TCP flows in internet routers by means of index policy
Konstantin Avrachenkov, Urtzi Ayesta, Josu Doncel, and Peter Jacko. Congestion control of TCP flows in internet routers by means of index policy. Computer Networks, 57 0 (17): 0 3463--3478, 2013
work page 2013
-
[6]
Prioritizing H epatitis C treatment in US prisons
Turgay Ayer, Can Zhang, Anthony Bonifonte, Anne C Spaulding, and Jagpreet Chhatwal. Prioritizing H epatitis C treatment in US prisons. Operations Research, 2019
work page 2019
-
[7]
Whittle indexability in egalitarian processor sharing systems
Vivek S Borkar and Sarath Pattathil. Whittle indexability in egalitarian processor sharing systems. Annals of Operations Research, pages 1--21, 2017
work page 2017
-
[8]
Opportunistic scheduling as restless bandits
Vivek S Borkar, Gaurav S Kasbekar, Sarath Pattathil, and Priyesh Shetty. Opportunistic scheduling as restless bandits. IEEE Transactions on Control of Network Systems, 2017 a
work page 2017
-
[9]
An index policy for dynamic pricing in cloud computing under price commitments
Vivek S Borkar, K Ravikumar, and Krishnakant Saboo. An index policy for dynamic pricing in cloud computing under price commitments. Applicationes Mathematicae, 44: 0 215--245, 2017 b
work page 2017
-
[10]
Multi-armed bandit allocation indices
John Gittins, Kevin Glazebrook, and Richard Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011
work page 2011
-
[11]
Index policies for the maintenance of a collection of machines by a set of repairmen
Kevin D Glazebrook, HM Mitchell, and PS Ansell. Index policies for the maintenance of a collection of machines by a set of repairmen. European Journal of Operational Research, 165 0 (1): 0 267--284, 2005
work page 2005
-
[12]
Kevin D Glazebrook, Christopher Kirkbride, and Jamal Ouenniche. Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Operations Research, 57 0 (4): 0 975--989, 2009
work page 2009
-
[13]
Generalized restless bandits and the knapsack problem for perishable inventories
Darina Graczov \'a and Peter Jacko. Generalized restless bandits and the knapsack problem for perishable inventories. Operations Research, 62 0 (3): 0 696--711, 2014
work page 2014
-
[14]
Dynamic priority allocation in restless bandit models, 2010
Peter Jacko. Dynamic priority allocation in restless bandit models, 2010
work page 2010
-
[15]
Peter Jacko. Resource capacity allocation to stochastic dynamic competitors: knapsack problem for perishable items and index-knapsack heuristic. Annals of Operations Research, 241 0 (1-2): 0 83--107, 2016
work page 2016
-
[16]
Terry James, Kevin Glazebrook, and Kyle Lin. Developing effective service policies for multiclass queues with abandonment: asymptotic optimality and approximate policy improvement. INFORMS Journal on Computing, 28 0 (2): 0 251--264, 2016
work page 2016
-
[17]
Efficient content delivery in the presence of impatient jobs
Maialen Larra \ n aga, Onno J Boxma, Rudesindo N \'u \ n ez-Queija, and Mark S Squillante. Efficient content delivery in the presence of impatient jobs. In Teletraffic Congress (ITC 27), 2015 27th International, pages 73--81. IEEE, 2015
work page 2015
-
[18]
Dynamic control of birth-and-death restless bandits: application to resource-allocation problems
Maialen Larra \ n aga, Urtzi Ayesta, and Ina Maria Verloop. Dynamic control of birth-and-death restless bandits: application to resource-allocation problems. IEEE/ACM Transactions on Networking, 24 0 (6): 0 3812--3825, 2016
work page 2016
-
[19]
Fair end-to-end window-based congestion control
Jeonghoon Mo and Jean Walrand. Fair end-to-end window-based congestion control. IEEE/ACM Transactions on networking, 0 (5): 0 556--567, 2000
work page 2000
-
[20]
Jos \'e Nino-Mora. Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach. Mathematical Programming, 93 0 (3): 0 361--413, 2002
work page 2002
-
[21]
Jos \'e Ni \ n o-Mora. Restless bandit marginal productivity indices, diminishing returns, and optimal control of make-to-order/make-to-stock M / G /1 queues. Mathematics of Operations Research, 31 0 (1): 0 50--84, 2006
work page 2006
-
[22]
Dynamic priority allocation via restless bandit marginal productivity indices
Jos \'e Ni \ n o-Mora. Dynamic priority allocation via restless bandit marginal productivity indices. Top, 15 0 (2): 0 161--198, 2007
work page 2007
-
[23]
Outsourcing warranty repairs: Dynamic allocation
Michelle Opp, Kevin Glazebrook, and Vidyadhar G Kulkarni. Outsourcing warranty repairs: Dynamic allocation. Naval Research Logistics (NRL), 52 0 (5): 0 381--398, 2005
work page 2005
-
[24]
The complexity of optimal queuing network control
Christos H Papadimitriou and John N Tsitsiklis. The complexity of optimal queuing network control. Mathematics of Operations Research, 24 0 (2): 0 293--305, 1999
work page 1999
-
[25]
Distributed Server Allocation for Content Delivery Networks
Sarath Pattathil, Vivek S Borkar, and Gaurav S Kasbekar. Distributed server allocation for content delivery networks. arXiv preprint arXiv:1710.11471, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[26]
Markov decision processes: discrete stochastic dynamic programming
Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014
work page 2014
- [27]
-
[28]
Asymptotically optimal priority policies for indexable and nonindexable restless bandits
Ina Maria Verloop. Asymptotically optimal priority policies for indexable and nonindexable restless bandits. The Annals of Applied Probability, 26 0 (4): 0 1947--1995, 2016
work page 1947
-
[29]
On an index policy for restless bandits
Richard R Weber and Gideon Weiss. On an index policy for restless bandits. Journal of Applied Probability, 27 0 (3): 0 637--648, 1990
work page 1990
-
[30]
Restless bandits: Activity allocation in a changing world
Peter Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25 0 (A): 0 287--298, 1988
work page 1988
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.