pith. sign in

arxiv: 1906.10946 · v1 · pith:4LO5HKYFnew · submitted 2019-06-26 · 🧮 math.OC

A unifying computations of Whittle's Index for Markovian bandits

Pith reviewed 2026-05-25 15:42 UTC · model grok-4.3

classification 🧮 math.OC
keywords Whittle indexrestless banditsMarkovian banditsindex policiescontinuous-time Markov chainsthreshold policiesindexabilitymachine repairmen problem
0
0 comments X

The pith

An analytical expression computes Whittle's index for any Markovian bandit with finite or infinite transition rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a single analytical expression for Whittle's index that works for any Markovian bandit modeled as a continuous-time Markov chain. The expression covers cases with finite rates, such as machine repair, and infinite rates, such as networks with instant rate changes on losses. It supplies sufficient conditions that make the relaxed single-bandit problem have a threshold-type solution and that make the bandit indexable. A sympathetic reader cares because the formula replaces numerical search or ad-hoc derivations with one expression that recovers earlier indices as special cases.

Core claim

For any Markovian bandit an analytical expression for Whittle's index is obtained by solving the relaxed problem, provided the optimal policy is of threshold type and the bandit satisfies indexability; the same expression holds uniformly for both finite and infinite transition rates.

What carries the argument

The analytical expression for Whittle's index obtained by setting the subsidy for passivity equal to the value at which the active and passive actions become indifferent in the single-arm relaxed problem.

If this is right

  • The expression retrieves known Whittle indices from the literature as particular cases.
  • It applies directly to the machine repairmen problem with finite rates.
  • It applies to communication networks in which transmission rates react instantaneously to packet losses.
  • Sufficient conditions ensure the optimal solution of the relaxed problem is of threshold type.
  • Conditions guarantee that the bandit is indexable so the index exists.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The single formula may let researchers write closed-form performance bounds for index policies in application areas that previously required separate derivations.
  • The subsidy-indifference step could be reused to obtain indices for restless bandits whose state spaces are not Markovian but admit similar value-function comparisons.
  • Numerical checks on standard examples from the literature would immediately confirm or refute whether the new expression matches previously computed values.

Load-bearing premise

The underlying processes are continuous-time Markov chains that satisfy the derived sufficient conditions for threshold-type solutions and indexability.

What would settle it

For any concrete finite-rate Markovian bandit, solve the relaxed problem numerically to obtain the index value and compare it directly to the value given by the analytical expression; a mismatch falsifies the claim.

Figures

Figures reproduced from arXiv: 1906.10946 by Ina Maria Verloop, Manu K. Gupta, Urtzi Ayesta.

Figure 1
Figure 1. Figure 1: Transition diagram under the threshold policy [PITH_FULL_IMAGE:figures/full_fig_p016_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: A bottleneck router in TCP with multiple flows [ [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Transition diagram under the threshold policy ‘ [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Optimal clearing framework as single-armed restless bandit [PITH_FULL_IMAGE:figures/full_fig_p019_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Transition diagram under the threshold policy [PITH_FULL_IMAGE:figures/full_fig_p019_5.png] view at source ↗
read the original abstract

The multi-armed restless bandit framework allows to model a wide variety of decision-making problems in areas as diverse as industrial engineering, computer communication, operations research, financial engineering, communication networks etc. In a seminal work, Whittle developed a methodology to derive well-performing (Whittle's) index policies that are obtained by solving a relaxed version of the original problem. However, the computation of Whittle's index itself is a difficult problem and hence researchers focused on calculating Whittle's index numerically or with a problem dependent approach. In our main contribution we derive an analytical expression for Whittle's index for any Markovian bandit with both finite and infinite transition rates. We derive sufficient conditions for the optimal solution of the relaxed problem to be of threshold type, and obtain conditions for the bandit to be indexable, a property assuring the existence of Whittle's index. Our solution approach provides a unifying expression for Whittle's index, which we highlight by retrieving known indices from literature as particular cases. The applicability of finite rates is illustrated with the machine repairmen problem, and that of infinite rates by an example of communication networks where transmission rates react instantaneously to packet losses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper derives an analytical expression for Whittle's index applicable to any Markovian bandit (continuous-time Markov chain) with finite or infinite transition rates. It also derives sufficient conditions for the relaxed problem to admit a threshold-type optimal policy and for the bandit to be indexable (ensuring the index exists). The expression is presented as unifying, with known indices recovered as special cases; applicability is illustrated via the machine repairmen problem (finite rates) and a communication-network example with instantaneous rate reactions (infinite rates).

Significance. If the derivation is correct under the stated conditions, the result would supply a closed-form route to Whittle indices for a broad subclass of restless bandits, replacing case-by-case numerical or problem-specific calculations. Recovery of prior indices as instances and the handling of infinite-rate models are concrete strengths that could aid applications in networks and operations research.

major comments (2)
  1. [Abstract] Abstract: the central claim of an analytical expression 'for any Markovian bandit' is not supported by the manuscript, because the derivation is explicitly conditioned on sufficient (not necessary) conditions for threshold structure and indexability; the paper does not establish that these conditions hold for every continuous-time Markov chain, so the unifying expression applies only inside the subclass satisfying the conditions.
  2. [Conditions for indexability] The manuscript states that sufficient conditions for indexability are obtained, yet provides no general, checkable criterion that would allow a reader to verify indexability for an arbitrary CTMC without re-deriving the conditions case by case; this limits the practical scope of the claimed unification.
minor comments (2)
  1. [Machine repairmen example] The machine-repairmen illustration would be strengthened by an explicit side-by-side numerical comparison of the new closed-form index against the previously known index for the same parameters.
  2. [Infinite-rate case] Notation for the infinite-rate limit (e.g., how the transition-rate matrix is regularized) should be introduced earlier and used consistently in the derivation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and for identifying points that can improve the clarity of our claims. We respond to each major comment below and indicate where revisions will be made.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of an analytical expression 'for any Markovian bandit' is not supported by the manuscript, because the derivation is explicitly conditioned on sufficient (not necessary) conditions for threshold structure and indexability; the paper does not establish that these conditions hold for every continuous-time Markov chain, so the unifying expression applies only inside the subclass satisfying the conditions.

    Authors: We agree that the abstract phrasing is imprecise. The derivation of the closed-form expression and the recovery of known indices as special cases are explicitly under the sufficient conditions for threshold optimality and indexability that we establish. These conditions are not asserted to be necessary or to hold for every CTMC. We will revise the abstract to state that the unifying expression applies to Markovian bandits satisfying the derived conditions (covering both finite- and infinite-rate cases) and will add a clarifying sentence in the introduction. revision: yes

  2. Referee: [Conditions for indexability] The manuscript states that sufficient conditions for indexability are obtained, yet provides no general, checkable criterion that would allow a reader to verify indexability for an arbitrary CTMC without re-deriving the conditions case by case; this limits the practical scope of the claimed unification.

    Authors: The sufficient conditions are stated in general form in terms of the CTMC transition rates, holding costs, and subsidy, so that for any concrete model a reader can verify them directly from the rate matrix without a full re-derivation. This is the standard role of sufficient conditions; a universal necessary-and-sufficient criterion for arbitrary CTMCs is not provided because it would require solving the indexability question anew for each chain. The unification remains useful precisely when the conditions hold, as shown by the recovery of prior indices and the two illustrative examples. revision: no

Circularity Check

0 steps flagged

No circularity: analytical derivation of Whittle index stands independently of inputs.

full rationale

The paper states it derives an analytical expression for Whittle's index along with sufficient conditions for threshold optimality and indexability in continuous-time Markovian bandits. It recovers known indices as special cases but presents this as verification of unification rather than a reduction to fitted quantities or self-referential definitions. No load-bearing step reduces by construction to a prior fit, self-citation chain, or ansatz smuggled via citation; the central claim is a direct derivation under stated assumptions. This is the common honest outcome for a self-contained mathematical derivation paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Central claim rests on the Markov property of the bandits and on the existence of threshold solutions and indexability under conditions that the paper states it derives; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption The decision processes are continuous-time Markov chains with finite or infinite transition rates.
    Explicitly stated in the abstract as the setting for the bandits.

pith-pipeline@v0.9.0 · 5738 in / 1062 out tokens · 22726 ms · 2026-05-25T15:42:38.862865+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 1 internal anchor

  1. [1]

    Group maintenance: A restless bandits approach

    Abderrahmane Abbou and Viliam Makis. Group maintenance: A restless bandits approach. INFORMS Journal on Computing, 2019. doi:10.1287/ijoc.2018.0863

  2. [2]

    Generalized -fair resource allocation in wireless networks

    Eitan Altman, Konstantin Avrachenkov, and Andrey Garnaev. Generalized -fair resource allocation in wireless networks. In 2008 47th IEEE Conference on Decision and Control, pages 2414--2419. IEEE, 2008

  3. [3]

    Whittle's index policy for a multi-class queueing system with convex holding costs

    PS Ansell, Kevin D Glazebrook, Jos \'e Ni \ n o-Mora, and M O'Keeffe. Whittle's index policy for a multi-class queueing system with convex holding costs. Mathematical Methods of Operations Research, 57 0 (1): 0 21--39, 2003

  4. [4]

    Dynamic routing of customers with general delay costs in a multiserver queuing system

    Nilay Tanik Argon, Li Ding, Kevin D Glazebrook, and Serhan Ziya. Dynamic routing of customers with general delay costs in a multiserver queuing system. Probability in the Engineering and Informational Sciences, 23 0 (2): 0 175--203, 2009

  5. [5]

    Congestion control of TCP flows in internet routers by means of index policy

    Konstantin Avrachenkov, Urtzi Ayesta, Josu Doncel, and Peter Jacko. Congestion control of TCP flows in internet routers by means of index policy. Computer Networks, 57 0 (17): 0 3463--3478, 2013

  6. [6]

    Prioritizing H epatitis C treatment in US prisons

    Turgay Ayer, Can Zhang, Anthony Bonifonte, Anne C Spaulding, and Jagpreet Chhatwal. Prioritizing H epatitis C treatment in US prisons. Operations Research, 2019

  7. [7]

    Whittle indexability in egalitarian processor sharing systems

    Vivek S Borkar and Sarath Pattathil. Whittle indexability in egalitarian processor sharing systems. Annals of Operations Research, pages 1--21, 2017

  8. [8]

    Opportunistic scheduling as restless bandits

    Vivek S Borkar, Gaurav S Kasbekar, Sarath Pattathil, and Priyesh Shetty. Opportunistic scheduling as restless bandits. IEEE Transactions on Control of Network Systems, 2017 a

  9. [9]

    An index policy for dynamic pricing in cloud computing under price commitments

    Vivek S Borkar, K Ravikumar, and Krishnakant Saboo. An index policy for dynamic pricing in cloud computing under price commitments. Applicationes Mathematicae, 44: 0 215--245, 2017 b

  10. [10]

    Multi-armed bandit allocation indices

    John Gittins, Kevin Glazebrook, and Richard Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011

  11. [11]

    Index policies for the maintenance of a collection of machines by a set of repairmen

    Kevin D Glazebrook, HM Mitchell, and PS Ansell. Index policies for the maintenance of a collection of machines by a set of repairmen. European Journal of Operational Research, 165 0 (1): 0 267--284, 2005

  12. [12]

    Index policies for the admission control and routing of impatient customers to heterogeneous service stations

    Kevin D Glazebrook, Christopher Kirkbride, and Jamal Ouenniche. Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Operations Research, 57 0 (4): 0 975--989, 2009

  13. [13]

    Generalized restless bandits and the knapsack problem for perishable inventories

    Darina Graczov \'a and Peter Jacko. Generalized restless bandits and the knapsack problem for perishable inventories. Operations Research, 62 0 (3): 0 696--711, 2014

  14. [14]

    Dynamic priority allocation in restless bandit models, 2010

    Peter Jacko. Dynamic priority allocation in restless bandit models, 2010

  15. [15]

    Resource capacity allocation to stochastic dynamic competitors: knapsack problem for perishable items and index-knapsack heuristic

    Peter Jacko. Resource capacity allocation to stochastic dynamic competitors: knapsack problem for perishable items and index-knapsack heuristic. Annals of Operations Research, 241 0 (1-2): 0 83--107, 2016

  16. [16]

    Developing effective service policies for multiclass queues with abandonment: asymptotic optimality and approximate policy improvement

    Terry James, Kevin Glazebrook, and Kyle Lin. Developing effective service policies for multiclass queues with abandonment: asymptotic optimality and approximate policy improvement. INFORMS Journal on Computing, 28 0 (2): 0 251--264, 2016

  17. [17]

    Efficient content delivery in the presence of impatient jobs

    Maialen Larra \ n aga, Onno J Boxma, Rudesindo N \'u \ n ez-Queija, and Mark S Squillante. Efficient content delivery in the presence of impatient jobs. In Teletraffic Congress (ITC 27), 2015 27th International, pages 73--81. IEEE, 2015

  18. [18]

    Dynamic control of birth-and-death restless bandits: application to resource-allocation problems

    Maialen Larra \ n aga, Urtzi Ayesta, and Ina Maria Verloop. Dynamic control of birth-and-death restless bandits: application to resource-allocation problems. IEEE/ACM Transactions on Networking, 24 0 (6): 0 3812--3825, 2016

  19. [19]

    Fair end-to-end window-based congestion control

    Jeonghoon Mo and Jean Walrand. Fair end-to-end window-based congestion control. IEEE/ACM Transactions on networking, 0 (5): 0 556--567, 2000

  20. [20]

    Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach

    Jos \'e Nino-Mora. Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach. Mathematical Programming, 93 0 (3): 0 361--413, 2002

  21. [21]

    Restless bandit marginal productivity indices, diminishing returns, and optimal control of make-to-order/make-to-stock M / G /1 queues

    Jos \'e Ni \ n o-Mora. Restless bandit marginal productivity indices, diminishing returns, and optimal control of make-to-order/make-to-stock M / G /1 queues. Mathematics of Operations Research, 31 0 (1): 0 50--84, 2006

  22. [22]

    Dynamic priority allocation via restless bandit marginal productivity indices

    Jos \'e Ni \ n o-Mora. Dynamic priority allocation via restless bandit marginal productivity indices. Top, 15 0 (2): 0 161--198, 2007

  23. [23]

    Outsourcing warranty repairs: Dynamic allocation

    Michelle Opp, Kevin Glazebrook, and Vidyadhar G Kulkarni. Outsourcing warranty repairs: Dynamic allocation. Naval Research Logistics (NRL), 52 0 (5): 0 381--398, 2005

  24. [24]

    The complexity of optimal queuing network control

    Christos H Papadimitriou and John N Tsitsiklis. The complexity of optimal queuing network control. Mathematics of Operations Research, 24 0 (2): 0 293--305, 1999

  25. [25]

    Distributed Server Allocation for Content Delivery Networks

    Sarath Pattathil, Vivek S Borkar, and Gaurav S Kasbekar. Distributed server allocation for content delivery networks. arXiv preprint arXiv:1710.11471, 2017

  26. [26]

    Markov decision processes: discrete stochastic dynamic programming

    Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014

  27. [27]

    Indexable restless bandits, 2008

    D Ruiz-Hernandez. Indexable restless bandits, 2008

  28. [28]

    Asymptotically optimal priority policies for indexable and nonindexable restless bandits

    Ina Maria Verloop. Asymptotically optimal priority policies for indexable and nonindexable restless bandits. The Annals of Applied Probability, 26 0 (4): 0 1947--1995, 2016

  29. [29]

    On an index policy for restless bandits

    Richard R Weber and Gideon Weiss. On an index policy for restless bandits. Journal of Applied Probability, 27 0 (3): 0 637--648, 1990

  30. [30]

    Restless bandits: Activity allocation in a changing world

    Peter Whittle. Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25 0 (A): 0 287--298, 1988