arxiv: 2605.08674 · v1 · submitted 2026-05-09 · 📡 eess.SP

Recognition: no theorem link

Fair and Efficient Scheduling for Sensor Networks via Online Whittle Index Policy

Anita Khadka, Saurav Staphit, Seong Ki Yoo, Sokipriala Jonah


Pith reviewed 2026-05-12 01:02 UTC · model grok-4.3

classification 📡 eess.SP

keywords: sensor networks · wake-up radio · age of incorrect information · whittle index · restless multi-armed bandit · online scheduling · energy efficiency


The pith

An online Whittle index policy using Age of Incorrect Information cuts sensor network transmissions by up to 70 percent compared to round-robin polling while keeping estimation errors within acceptable limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to reduce energy use and storage demands in wake-up radio sensor networks by polling only those nodes whose data meaningfully corrects the remote monitor's view of the monitored process. It casts the choice of which nodes to poll as a restless multi-armed bandit problem and replaces the usual requirement for known transition probabilities with an online state-estimation step that learns the necessary indices on the fly. This produces two policies, WAoII and its fair variant FWAoII, that adapt polling to actual information value rather than fixed rotation. Experiments on both real-world traces and synthetic data show the resulting schedule transmits far fewer packets than round-robin while the monitor's root-mean-square error stays inside application tolerances.
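To make the contrast with fixed rotation concrete, here is a minimal sketch of how an index policy turns per-node indices into polling decisions, next to a round-robin baseline. The function names, the per-slot budget, and the budget-free threshold variant are hypothetical illustrations; the paper's exact selection rule, and the fairness mechanism of FWAoII, are not described here.

```python
from typing import List, Sequence


def whittle_poll(indices: Sequence[float], budget: int) -> List[int]:
    """Index policy under a per-slot budget: poll the `budget` nodes with the
    largest index (ties broken by lower node id)."""
    ranked = sorted(range(len(indices)), key=lambda i: (-indices[i], i))
    return sorted(ranked[:budget])


def threshold_poll(indices: Sequence[float], poll_cost: float) -> List[int]:
    """Hypothetical budget-free variant: poll a node only when its index
    exceeds a per-poll cost, so uninformative nodes can stay asleep."""
    return [i for i, w in enumerate(indices) if w > poll_cost]


def round_robin_poll(slot: int, n_nodes: int, budget: int) -> List[int]:
    """Baseline: poll `budget` consecutive nodes in a fixed rotation,
    whether or not their data would change the monitor's view."""
    start = (slot * budget) % n_nodes
    return sorted((start + k) % n_nodes for k in range(budget))


indices = [0.0, 3.2, 0.5, 7.1]  # hypothetical per-node index values
print(whittle_poll(indices, 2))      # [1, 3]
print(threshold_poll(indices, 1.0))  # [1, 3]
print(round_robin_poll(0, 4, 2), round_robin_poll(1, 4, 2))  # [0, 1] [2, 3]
```

Round robin spends its polls on nodes 0 and 2 even though their indices say their data adds little; the index rules skip them.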

Core claim

The paper establishes that an online state-estimation procedure can compute Whittle indices for the Age of Incorrect Information metric without prior knowledge of transition dynamics, yielding WAoII and FWAoII policies that schedule node polling in wake-up radio networks. These policies reduce packet transmissions by up to 70 percent relative to round-robin polling while keeping root-mean-square error within acceptable application tolerances on both real and synthetic data sets.
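The RMSE side of the claim can be illustrated with a small sketch. It assumes the monitor holds the most recently polled value until the next poll (a zero-order hold, which is an assumption; the paper's reconstruction method may differ), and the readings and tolerance below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def zoh_reconstruct(polled: Dict[int, float], horizon: int,
                    initial: float = 0.0) -> List[float]:
    """Monitor estimate per slot: hold the most recently polled value."""
    estimate, current = [], initial
    for t in range(horizon):
        if t in polled:
            current = polled[t]
        estimate.append(current)
    return estimate


def rmse(truth: Sequence[float], estimate: Sequence[float]) -> float:
    """Root mean squared error between the true series and the monitor's view."""
    if len(truth) != len(estimate) or not truth:
        raise ValueError("need two non-empty series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(truth, estimate)) / len(truth))


truth = [20.0, 20.0, 20.5, 21.0]         # hypothetical sensor readings
polls = {0: 20.0, 2: 20.5}               # only 2 of 4 slots polled
est = zoh_reconstruct(polls, len(truth))  # [20.0, 20.0, 20.5, 20.5]
tolerance = 0.5                           # hypothetical application tolerance
print(rmse(truth, est), rmse(truth, est) <= tolerance)  # 0.25 True
```

Halving the polls here costs an RMSE of 0.25, which is acceptable only relative to whatever tolerance the application sets.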

What carries the argument

The online Whittle Index AoII (WAoII) policy, derived by estimating unknown transition dynamics from observed states and then applying the index policy of the resulting restless multi-armed bandit formulation of AoII minimization.
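For a single node (arm), the Whittle index of a state is the smallest per-poll charge λ at which leaving that node unpolled is optimal. The sketch below approximates it numerically, by bisection on λ with discounted value iteration, for a hypothetical toy arm: a binary symmetric Markov source that flips with probability p, with the arm state being the AoII capped at `a_max`. The model, the discount `beta`, the cap, and the assumption that the arm is indexable are all illustrative and not taken from the paper. In an online variant, p would be replaced by its current estimate.

```python
from typing import List, Tuple


def whittle_indices_aoii(p: float, a_max: int = 8, beta: float = 0.9,
                         tol: float = 1e-4) -> List[float]:
    """Approximate the Whittle index of every AoII state of one toy arm.

    Hypothetical model (not the paper's): a binary symmetric Markov source that
    flips with probability p per slot; arm state a = AoII, capped at a_max;
    per-slot cost a, plus a charge lam in every slot the node is polled.
    """

    def successors(a: int, active: bool) -> List[Tuple[int, float]]:
        if active or a == 0:
            # Estimate is (or is made) correct: AoII is 1 next slot only if the source flips.
            return [(1, p), (0, 1.0 - p)]
        # Unpolled and wrong: a flip back makes the binary estimate correct again,
        # otherwise AoII keeps growing (capped at a_max).
        return [(0, p), (min(a + 1, a_max), 1.0 - p)]

    def q_pair(a: int, lam: float, v: List[float]) -> Tuple[float, float]:
        passive = a + beta * sum(pr * v[n] for n, pr in successors(a, False))
        active = a + lam + beta * sum(pr * v[n] for n, pr in successors(a, True))
        return passive, active

    def passive_is_optimal(a: int, lam: float) -> bool:
        v = [0.0] * (a_max + 1)
        while True:  # discounted value iteration for a fixed polling charge lam
            new_v = [min(q_pair(s, lam, v)) for s in range(a_max + 1)]
            converged = max(abs(x - y) for x, y in zip(new_v, v)) < 1e-8
            v = new_v
            if converged:
                break
        passive, active = q_pair(a, lam, v)
        return passive <= active

    indices = []
    for a in range(a_max + 1):
        # Index = smallest charge at which staying passive is optimal in state a.
        # Bisection is valid only if the arm is indexable (assumed here).
        lo, hi = 0.0, a_max / (1.0 - beta) + 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if passive_is_optimal(a, mid):
                hi = mid
            else:
                lo = mid
        indices.append(hi)
    return indices


print([round(w, 2) for w in whittle_indices_aoii(p=0.1)])
```

In this toy model the index is about zero when the monitor is already correct and grows with the AoII, which is what lets an index policy leave accurate nodes asleep.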

Load-bearing premise

The online state-estimation step recovers enough information about the unknown transition dynamics to produce reliable Whittle indices that correctly rank which nodes to poll.

What would settle it

A controlled deployment in which the state estimator converges to inaccurate transition estimates and the resulting WAoII policy either transmits at least as many packets as round-robin or produces root-mean-square error above the stated application tolerance.

Figures

Figures reproduced from arXiv: 2605.08674 by Anita Khadka, Saurav Staphit, Seong Ki Yoo, Sokipriala Jonah.

**Figure 2.** Polling distribution for Scenario One: RR and AoI

**Figure 3.** Polling distribution for Scenario Two: The RR and AoI

**Figure 5.** Time series reconstruction from the synthetic data at

**Figure 6.** Comparison of rewards for various scheduling techniques under different values of

**Figure 7.** Example time series reconstruction from the tempera

**Figure 8.** Example time series reconstruction from the humidity

Original abstract

Wake-Up Radio (WUR) enables resource-constrained, battery-powered sensor nodes to remain in a low-power deep sleep state while continuously listening for a Wake-Up Signal (WUS). Sensor nodes only wake and transmit data after receiving the WUS, significantly reducing energy consumption. However, polling nodes whose transmitted data provides little or no meaningful update to the remote monitor can still result in unnecessary energy usage and increased storage overhead. To address this issue, this paper uses the Age of Incorrect Information (AoII) metric to prioritise the polling of nodes that provide informative updates to the remote monitor. Determining the optimal set of nodes to poll based on AoII can be formulated as a Restless Multi-Armed Bandit (RMAB) problem, which traditionally requires prior knowledge of the monitored process transition dynamics. Since such dynamics are often unknown in practical deployments, we propose an online learning framework based on state estimation to derive Whittle Index AoII (WAoII) and Fair Whittle Index AoII (FWAoII) policies without assuming known transition probabilities. The proposed policies efficiently schedule node polling while adapting to unknown process behaviour. Experimental evaluation using both real-world and synthetic datasets demonstrates that the proposed online WAoII policy can reduce packet transmissions by up to 70% compared to the widely used Round Robin (RR) polling strategy, while maintaining Root Mean Squared Error (RMSE) values within acceptable application error tolerances. These results demonstrate the effectiveness of WAoII and FWAoII as energy-efficient polling techniques for low-power WUR sensor networks.
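The difference between AoII and plain Age of Information (AoI) is what lets the scheduler skip uninformative polls. The sketch below uses the common linear form of AoII with an indicator penalty, which is an assumption (the paper may weight errors differently), and it starts counting at 0 in the first slot. The process and the monitor's estimates are hypothetical.

```python
from typing import Iterable, List, Sequence


def aoii_trace(source: Sequence[int], estimate: Sequence[int]) -> List[int]:
    """Linear AoII per slot: 0 while the monitor's estimate matches the source,
    otherwise the number of slots since it last matched."""
    if len(source) != len(estimate):
        raise ValueError("source and estimate must have the same length")
    trace, age = [], 0
    for x, x_hat in zip(source, estimate):
        age = 0 if x == x_hat else age + 1
        trace.append(age)
    return trace


def aoi_trace(update_slots: Iterable[int], horizon: int) -> List[int]:
    """Plain AoI for contrast: slots since the last received update."""
    updates, trace, age = set(update_slots), [], 0
    for t in range(horizon):
        age = 0 if t in updates else age + 1
        trace.append(age)
    return trace


source = [0, 0, 1, 1, 1, 0]    # hypothetical binary process
estimate = [0, 0, 0, 0, 1, 1]  # monitor's view; node polled at slots 0 and 4
print(aoii_trace(source, estimate))  # [0, 0, 1, 2, 0, 1]
print(aoi_trace([0, 4], 6))          # [0, 1, 2, 3, 0, 1]
```

At slot 1, AoI already grows although the monitor is still correct; AoII stays at 0, so an AoII-driven scheduler sees no reason to wake that node.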

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical online Whittle-index scheduler for AoII in unknown sensor dynamics, with experiments showing clear transmission savings over round-robin, but the estimator's accuracy under non-stationary conditions remains lightly checked.


The core contribution is an online state-estimation step that lets them compute Whittle indices for AoII-based polling without assuming known transition probabilities. They also add a fair variant. This extends standard RMAB work in a direct way for wake-up radio networks where the underlying processes are not known in advance. The experiments on both real traces and synthetic data are the strongest part: they report up to 70% fewer packets than round-robin while keeping RMSE inside application tolerances. That result is concrete and relevant for battery-constrained IoT setups. The approach is straightforward to implement once the estimator is in place, and the authors show it adapts to the observed behavior. The main limitation is that the paper provides little on how accurate the state estimator needs to be or how it behaves when the dynamics shift or observations are noisy. There are no convergence bounds or sensitivity analysis, so the claimed savings could shrink if the recovered indices drift from the true ones. The abstract also skips details on exact baselines and statistical tests, which makes it harder to gauge how robust the gains really are. This work is aimed at researchers and engineers who build scheduling for low-power sensor networks and want to move beyond fixed polling. Anyone already using RMAB or AoII metrics will see the practical extension clearly. It is solid enough on the engineering side to warrant a full review rather than a desk reject, though the referees should press for more on the estimator's reliability.

Referee Report

2 major / 2 minor

Summary. The paper formulates polling scheduling for Wake-Up Radio sensor networks as a Restless Multi-Armed Bandit (RMAB) problem using the Age of Incorrect Information (AoII) metric to prioritize informative updates. It proposes online WAoII and FWAoII policies that use state estimation to compute Whittle indices without assuming known transition probabilities, and reports that these policies reduce packet transmissions by up to 70% versus Round Robin while keeping RMSE within acceptable tolerances on real-world and synthetic datasets.

Significance. If the online state estimation reliably recovers the underlying dynamics, the work provides a practical, adaptive scheduling method that extends battery life in resource-constrained WUR networks without requiring prior process models. The experimental results on both real and synthetic traces constitute a concrete strength, demonstrating measurable transmission savings while respecting application-level error bounds.

major comments (2)

[online learning framework and WAoII/FWAoII policy derivation] The online learning framework (state estimation for unknown transition probabilities) provides no convergence guarantees, error bounds, or robustness analysis for the recovered dynamics used to compute Whittle indices. This is load-bearing for the central claim, as inaccurate indices would invalidate the prioritization that produces the reported 70% transmission reduction.
[Experimental Evaluation] The manuscript reports RMSE values within tolerances and up to 70% savings versus RR, but supplies no quantitative comparison of estimated versus true transition probabilities, no statistical significance tests across runs, and no tests under non-stationarity or observation noise. Without these, it is unclear whether the performance generalizes beyond the specific traces.

minor comments (2)

[Abstract] The abstract states that RMSE remains 'within acceptable application error tolerances' but does not define or justify those tolerances or link them to specific application requirements.
[Proposed online learning framework] Notation for the estimated state and the online estimator could be clarified with an explicit algorithm box or pseudocode to improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating where revisions will be made to the manuscript.


Referee: [online learning framework and WAoII/FWAoII policy derivation] The online learning framework (state estimation for unknown transition probabilities) provides no convergence guarantees, error bounds, or robustness analysis for the recovered dynamics used to compute Whittle indices. This is load-bearing for the central claim, as inaccurate indices would invalidate the prioritization that produces the reported 70% transmission reduction.

Authors: We agree that the manuscript does not include formal convergence guarantees, error bounds, or a dedicated robustness analysis for the state estimation step. The estimation uses online frequency counts of observed transitions, a standard method for learning unknown Markov dynamics, but we did not derive Whittle-index-specific bounds or prove convergence rates in the RMAB setting. In the revision we will add a dedicated subsection on the estimation procedure, recall its known asymptotic consistency under standard ergodicity assumptions, and include empirical plots of estimation error versus sample size on the synthetic traces. We will also discuss how index computation is affected by moderate estimation error. These additions will clarify the practical reliability of the approach while acknowledging that a full theoretical analysis remains future work. revision: partial
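The rebuttal describes the estimator as online frequency counts of observed transitions. A minimal sketch of that idea follows. The class and helper names are hypothetical, the optional smoothing prior and the uniform fallback for unseen states are assumptions, and so is the premise that consecutive states are both observed (under pull-based polling, which transitions are actually observable is a design question the rebuttal does not settle).

```python
from typing import List, Sequence


class TransitionCounter:
    """Online frequency-count estimate of a Markov transition matrix."""

    def __init__(self, n_states: int, prior: float = 0.0) -> None:
        # prior > 0 adds Laplace-style smoothing (an assumption, not from the paper).
        self.prior = prior
        self.counts = [[0] * n_states for _ in range(n_states)]

    def update(self, prev_state: int, next_state: int) -> None:
        self.counts[prev_state][next_state] += 1

    def estimate(self) -> List[List[float]]:
        n = len(self.counts)
        matrix = []
        for row in self.counts:
            total = sum(row) + self.prior * n
            if total == 0:
                matrix.append([1.0 / n] * n)  # unseen state: fall back to uniform
            else:
                matrix.append([(c + self.prior) / total for c in row])
        return matrix


def observe_sequence(counter: TransitionCounter, states: Sequence[int]) -> None:
    """Feed consecutive observed states (assumes each consecutive pair is seen)."""
    for prev, nxt in zip(states, states[1:]):
        counter.update(prev, nxt)


counter = TransitionCounter(n_states=2)
observe_sequence(counter, [0, 0, 0, 1])
print(counter.estimate())  # row 0 is about [2/3, 1/3]; unseen row 1 is [0.5, 0.5]
```

Each row converges to the true transition probabilities only if its state is visited often enough, which is exactly the ergodicity condition the authors say they will state.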
Referee: [Experimental Evaluation] The manuscript reports RMSE values within tolerances and up to 70% savings versus RR, but supplies no quantitative comparison of estimated versus true transition probabilities, no statistical significance tests across runs, and no tests under non-stationarity or observation noise. Without these, it is unclear whether the performance generalizes beyond the specific traces.

Authors: We accept that the current experimental section lacks these quantitative checks. In the revised manuscript we will: (i) add direct comparisons (tables and plots) of estimated versus ground-truth transition probabilities on all synthetic datasets, reporting L1 or total-variation error; (ii) repeat all experiments over 20 independent runs and report mean performance with standard deviation together with paired statistical significance tests (t-tests or Wilcoxon signed-rank) against Round-Robin; (iii) introduce new experiments that inject controlled non-stationarity (abrupt or gradual changes in transition matrices) and additive observation noise, measuring degradation in transmission savings and RMSE. These results will be placed in an expanded experimental section to support claims of generalizability. revision: yes
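Two of the promised checks are simple to compute. The sketch below gives the row-wise total-variation distance between an estimated and a true transition matrix, and the paired t statistic over per-run differences (for example, packets sent by RR minus packets sent by WAoII in each run). All inputs are hypothetical, and converting the statistic to a p-value still requires a t table or a statistics library.

```python
import math
import statistics
from typing import List, Sequence


def row_tv_errors(p_hat: Sequence[Sequence[float]],
                  p_true: Sequence[Sequence[float]]) -> List[float]:
    """Total-variation distance between matching rows: 0.5 * L1 distance."""
    return [0.5 * sum(abs(a - b) for a, b in zip(r_hat, r_true))
            for r_hat, r_true in zip(p_hat, p_true)]


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of the per-run differences a[i] - b[i]; needs at least two
    runs with non-constant differences, and has n - 1 degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


p_true = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical ground truth
p_hat = [[0.8, 0.2], [0.2, 0.8]]   # hypothetical estimate
print(max(row_tv_errors(p_hat, p_true)))  # about 0.1 (floating point)
print(paired_t_statistic([3, 4, 5], [2, 2, 2]))  # about 3.46 for differences [1, 2, 3]
```

Reporting the worst row, rather than the average, highlights rarely visited states, which are the ones most likely to corrupt the indices.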

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained via standard RMAB theory plus empirical validation


The paper formulates AoII-based polling as an RMAB, adopts the standard Whittle index policy, and augments it with an online state-estimation procedure to handle unknown transition probabilities. The reported 70% transmission reduction is an empirical outcome measured on held-out real-world and synthetic traces, not a quantity that reduces by construction to parameters fitted inside the same experiment or to a self-citation chain. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the derivation; the online estimator is presented as an independent approximation whose accuracy is tested externally rather than assumed tautologically.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim rests on the standard RMAB formulation being a valid model for AoII-based polling and on the online estimator being able to substitute for unknown transition probabilities.

axioms (1)

domain assumption: Polling decisions in WUR sensor networks can be modeled as a Restless Multi-Armed Bandit problem.
Invoked in the abstract to justify the Whittle Index approach.

pith-pipeline@v0.9.0 · 5593 in / 1166 out tokens · 56806 ms · 2026-05-12T01:02:24.399607+00:00 · methodology

