pith. machine review for the scientific record. sign in

arxiv: 2604.14908 · v1 · submitted 2026-04-16 · 💻 cs.LG · cs.SY· eess.SY· stat.ML

Recognition: unknown

Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits

Authors on Pith no claims yet

Pith reviewed 2026-05-10 11:18 UTC · model grok-4.3

classification 💻 cs.LG cs.SYeess.SYstat.ML
keywords mmWave beamformingrate adaptationcombinatorial semi-banditssatisficing regretmulti-user MIMOregret boundswireless resource allocation
0
0 comments X

The pith

SAT-CTS achieves constant satisficing regret when a throughput target is achievable and O((log T)^2) regret otherwise in multi-user mmWave beam and rate adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models choosing beams and data rates for multiple users across base stations in millimeter-wave systems as a combinatorial bandit problem whose goal is to satisfy a minimum throughput threshold. It introduces the SAT-CTS policy that mixes cautious estimates with sampling to focus learning on meeting the threshold instead of chasing the highest possible total. The central result proves that the extra cost of learning stays bounded by a fixed constant whenever some choice meets the threshold, and grows only like the square of the log of elapsed time when no choice meets it. This setup matters because wireless networks often need reliable service levels more than peak performance, and the method uses only simple success-or-failure reports rather than full channel measurements.

Core claim

The central claim is that SAT-CTS supplies the first finite-time regret bounds for combinatorial semi-bandits under a satisficing objective: cumulative shortfall below the target threshold remains bounded by a time-independent constant when the threshold is realizable, while when the threshold is non-realizable the algorithm incurs only a finite expected transient outside committed rounds and thereafter its regret follows the sum of restarted round contributions, producing an overall O((log T)^2) standard regret bound.

What carries the argument

SAT-CTS, a lightweight threshold-aware policy that blends conservative confidence estimates with posterior sampling to steer learning toward meeting the satisficing throughput threshold rather than maximizing total reward.

Load-bearing premise

The analysis requires perfect reporting of transmission success or failure through feedback signals and that the channel conditions create a clean separation between thresholds that can be met and those that cannot.

What would settle it

If experiments with the same channel model but added noise or errors in the success/failure feedback show that shortfall keeps growing linearly even when a good combination exists, the claimed constant bound would be contradicted.

Figures

Figures reproduced from arXiv: 2604.14908 by Bar{\i}\c{s} Yayc{\i}, Cem Tekin, Emre \"Ozy{\i}ld{\i}r{\i}m, Umut Eren Akturk.

Figure 1
Figure 1. Figure 1: System model. can adapt beams and rates in a completely data-driven manner. While a wide range of MAB algorithms has been proposed for identifying the best choice (i.e., the best super-arm) among combinatorially many configurations (i.e., super-arms) [14]– [17], many of these methods are inherently designed to priori￾tize long-term optimality and thus may over-explore in practice rather than quickly settli… view at source ↗
Figure 2
Figure 2. Figure 2: Process of beam and rate assignment. The expected RSS within a time slot is then given as E[RSSm(fb,k)|hm,b] = pb [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Cumulative satisficing regret (realizable, [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Cumulative satisficing regret (non-realizable, [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
read the original abstract

We study downlink beam and rate adaptation in a multi-user mmWave MISO system where multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs) with a unique beam per UE and discrete data transmission rates. BSs learn about transmission success based on ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold $\tau_r$ and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting $\tau_r$ rather than merely maximizing. Our main theoretical contribution provides the first finite-time regret bounds for combinatorial semi-bandits with satisficing objective: when $\tau_r$ is realizable, we upper bound the cumulative satisficing regret to the target with a time-independent constant, and when $\tau_r$ is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an $O((\log T)^2)$ standard regret bound. On the practical side, we evaluate the performance via cumulative satisficing regret to $\tau_r$ alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret, while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper casts multi-user mmWave beam and rate adaptation as a combinatorial semi-bandit problem over beam-rate tuples with ACK/NACK feedback, introduces the SAT-CTS policy that incorporates a satisficing throughput threshold τ_r, and claims the first finite-time regret bounds for satisficing objectives in this setting: a time-independent constant bound on cumulative satisficing regret when τ_r is realizable, and an O((log T)^2) standard regret bound when τ_r is non-realizable via a restarted CTS argument that incurs only a finite expected transient outside committed rounds. Experiments on time-varying sparse multipath channels report reduced satisficing regret, competitive standard regret, and favorable throughput/fairness.

Significance. If the regret bounds are rigorously established, the work is significant as the first finite-time analysis of satisficing (rather than pure maximization) in combinatorial semi-bandits, with direct relevance to QoS-aware wireless resource allocation without CSI. The dual evaluation via satisficing regret and standard regret, plus fairness metrics, strengthens the practical contribution. The paper earns credit for deriving explicit bounds rather than asymptotic statements and for reproducible simulation setup on realistic mmWave models.

major comments (2)
  1. [§4.3] §4.3 (non-realizable regret theorem): The claim that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which regret is the sum of restarted CTS contributions yielding O((log T)^2), rests on the channel process preserving controlled restart frequency and per-round independence. No explicit derivation or sensitivity result is given showing that the time-varying sparse multipath model (with its specific multipath sparsity and Doppler) preserves this finite-transient property; if non-stationarity induces unbounded restarts or couples users via the combinatorial action space, the bound fails.
  2. [Assumptions paragraph preceding the main theorems] Assumptions paragraph preceding the main theorems: The analysis assumes transmission success is fully captured by reliable ACK/NACK feedback with no errors. No robustness or sensitivity analysis is provided for how ACK/NACK errors would lengthen the transient or inflate the restarted-CTS regret terms, which is load-bearing for both the constant and O((log T)^2) claims.
minor comments (3)
  1. [Problem formulation section] Notation: the definition of the combinatorial action space (beam-rate tuples per user) and the exact form of the satisficing regret (cumulative deviation from τ_r) should be stated explicitly in the problem formulation section before the policy is introduced.
  2. [Experiments section] Experiments: the caption for the cumulative regret plots should include the number of Monte Carlo runs and the exact parameter values for the sparse multipath channel model (e.g., number of paths, Doppler spread) to allow reproduction.
  3. [Related work] Related work: a brief comparison paragraph is needed contrasting SAT-CTS with prior combinatorial semi-bandit algorithms (e.g., CTS, CombUCB) and with satisficing bandit literature to clarify the incremental contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. We appreciate the recognition of the significance of providing the first finite-time regret bounds for satisficing objectives in combinatorial semi-bandits. We address each major comment point by point below and indicate the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [§4.3] §4.3 (non-realizable regret theorem): The claim that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which regret is the sum of restarted CTS contributions yielding O((log T)^2), rests on the channel process preserving controlled restart frequency and per-round independence. No explicit derivation or sensitivity result is given showing that the time-varying sparse multipath model (with its specific multipath sparsity and Doppler) preserves this finite-transient property; if non-stationarity induces unbounded restarts or couples users via the combinatorial action space, the bound fails.

    Authors: We thank the referee for this observation on the non-realizable case. The restarted CTS mechanism in §4.3 is constructed precisely to accommodate non-stationarity: a restart is triggered only when the empirical reward distribution deviates beyond the confidence bounds, and the analysis shows that the expected number of such restarts remains finite provided the underlying process satisfies a mild ergodicity condition (finite coherence time). For the time-varying sparse multipath channel model employed in the paper, the limited Doppler spread and path sparsity ensure that channel realizations remain correlated over short intervals and decorrelate sufficiently to bound the mismatch probability, preventing unbounded restarts. The combinatorial action space does not introduce additional coupling in the restart logic because each user’s beam-rate tuple is sampled independently within the joint posterior. Nevertheless, we agree that an explicit sensitivity result linking the model parameters (sparsity level and Doppler) to the restart bound was not derived in the main text. In the revision we will insert a short supporting lemma (with proof sketch) in an appendix that verifies the finite-transient property under the exact channel parameters used in the simulations, thereby confirming that the O((log T)^2) bound holds for the considered setting. revision: yes

  2. Referee: Assumptions paragraph preceding the main theorems: The analysis assumes transmission success is fully captured by reliable ACK/NACK feedback with no errors. No robustness or sensitivity analysis is provided for how ACK/NACK errors would lengthen the transient or inflate the restarted-CTS regret terms, which is load-bearing for both the constant and O((log T)^2) claims.

    Authors: We acknowledge that the analysis throughout the paper, including both the realizable and non-realizable theorems, is predicated on error-free ACK/NACK feedback. This is a standard modeling choice in semi-bandit formulations for wireless systems, as it isolates the learning dynamics from control-channel imperfections. ACK/NACK errors would indeed act as additional observation noise, which could increase the variance of the posterior updates, potentially extending the transient phase or adding a small additive term to each restarted round’s regret. A complete sensitivity analysis would require augmenting the model with an explicit error probability and re-deriving the concentration inequalities. Because the current manuscript does not contain such an analysis, we will add a concise discussion paragraph immediately after the assumptions statement in the revised version. The added text will note that, for typical low error rates encountered in practice, the effect is absorbed into the existing constant factors of the bounds without changing their order, and will flag noisy-feedback extensions as a natural direction for future work. revision: partial

Circularity Check

0 steps flagged

No circularity: bounds derived from standard combinatorial semi-bandit analysis with new satisficing logic

full rationale

The claimed finite-time regret bounds for the satisficing objective are presented as a novel extension of combinatorial semi-bandit techniques. When τ_r is realizable the bound is a time-independent constant; when non-realizable the analysis invokes a finite expected transient outside committed CTS rounds followed by restarted CTS contributions yielding O((log T)^2). No equation or step is shown to reduce by construction to a fitted quantity, self-defined parameter, or unverified self-citation chain. The derivation chain remains self-contained against external benchmarks for combinatorial Thompson sampling and semi-bandit regret, with the satisficing threshold added as an independent modeling choice rather than a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard assumptions from combinatorial bandit theory plus domain-specific modeling choices for mmWave channels and feedback; no new free parameters are introduced beyond the given threshold τ_r, and no invented entities are postulated.

axioms (2)
  • domain assumption Transmission success is fully observable via reliable ACK/NACK feedback with no errors or delays.
    Invoked throughout the problem formulation and regret analysis in the abstract.
  • domain assumption The channel process permits a clean separation into realizable and non-realizable threshold regimes that allow restarted CTS analysis.
    Required for the two-case regret statements.

pith-pipeline@v0.9.0 · 5643 in / 1502 out tokens · 35470 ms · 2026-05-10T11:18:36.917181+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

47 extracted references · 2 canonical work pages

  1. [1]

    Satisficing with binary feedback: Multi-user mmwave beam and rate adaptation via combinatorial bandits,

    E. ¨Ozyıldırım, B. Yaycı, U. E. Akturk, and C. Tekin, “Satisficing with binary feedback: Multi-user mmwave beam and rate adaptation via combinatorial bandits,” inNeurIPS Workshop, 2025. [Online]. Available: https://openreview.net/forum?id=qLtBU0mVB1

  2. [2]

    Covrage: Millimeter-wave beam- forming for mobile interactive virtual reality,

    J. Struye, F. Lemic, and J. Famaey, “Covrage: Millimeter-wave beam- forming for mobile interactive virtual reality,”IEEE Trans. Wireless Commun., vol. 22, no. 7, pp. 4828–4842, 2022

  3. [3]

    Mec-assisted panoramic VR video streaming over mil- limeter wave mobile networks,

    Y . Liuet al., “Mec-assisted panoramic VR video streaming over mil- limeter wave mobile networks,”IEEE Trans. Multimedia, vol. 21, no. 5, pp. 1302–1316, 2018

  4. [4]

    Wideband millimeter-wave propagation measure- ments and channel models for future wireless communication system design,

    T. S. Rappaportet al., “Wideband millimeter-wave propagation measure- ments and channel models for future wireless communication system design,”IEEE Trans. Commun., vol. 63, no. 9, pp. 3029–3056, 2015

  5. [5]

    CSI transfer from sub-6G to mmWave: Reduced- overhead multi-user hybrid beamforming,

    W. Denget al., “CSI transfer from sub-6G to mmWave: Reduced- overhead multi-user hybrid beamforming,”IEEE J. Sel. Areas Commun., vol. 43, no. 3, pp. 973–987, 2025

  6. [6]

    MAMBA: A multi-armed bandit framework for beam tracking in millimeter-wave systems,

    I. Aykinet al., “MAMBA: A multi-armed bandit framework for beam tracking in millimeter-wave systems,” inProc. IEEE INFOCOM, 2020, pp. 1469–1478

  7. [7]

    Fast mmWave beam alignment via correlated bandit learning,

    W. Wuet al., “Fast mmWave beam alignment via correlated bandit learning,”IEEE Trans. Wireless Commun., vol. 18, no. 12, pp. 5894– 5908, 2019

  8. [8]

    Machine learning techniques for blind beam alignment in mmWave massive MIMO,

    A. Ktari, H. Ghauch, and G. Rekaya-Ben Othman, “Machine learning techniques for blind beam alignment in mmWave massive MIMO,” Entropy, vol. 26, no. 8, p. 626, 2024

  9. [9]

    An overview of mimo communications - a key to gigabit wireless,

    A. Paulrajet al., “An overview of mimo communications - a key to gigabit wireless,”Proc. IEEE, vol. 92, no. 2, pp. 198–218, 2004

  10. [10]

    Millimeter wave beamforming for wireless backhaul and access in small cell networks,

    S. Huret al., “Millimeter wave beamforming for wireless backhaul and access in small cell networks,”IEEE Trans. Commun., vol. 61, no. 10, pp. 4391–4403, Oct. 2013

  11. [11]

    Enhanced random access and beam training for millime- ter wave wireless local networks with high user density,

    P. Zhouet al., “Enhanced random access and beam training for millime- ter wave wireless local networks with high user density,”IEEE Trans. Wireless Commun., vol. 16, pp. 7760–7773, 2017. 18

  12. [12]

    Fast beam alignment via pure exploration in multi-armed bandits,

    Y . Wei, Z. Zhong, and V . Y . Tan, “Fast beam alignment via pure exploration in multi-armed bandits,”IEEE Trans. Wireless Commun., vol. 22, no. 5, pp. 3264–3279, 2022

  13. [13]

    Lattimore and C

    T. Lattimore and C. Szepesv ´ari,Bandit algorithms. Cambridge Uni- versity Press, 2020

  14. [14]

    Tight regret bounds for stochastic combinatorial semi- bandits,

    B. Kvetonet al., “Tight regret bounds for stochastic combinatorial semi- bandits,” inProc. AISTATS, 2015, pp. 535–543

  15. [15]

    Combinatorial multi-armed bandit: General framework and applications,

    W. Chen, Y . Wang, and Y . Yuan, “Combinatorial multi-armed bandit: General framework and applications,” inProc. ICML, 2013, pp. 151– 159

  16. [16]

    Thompson sampling for combinatorial semi- bandits,

    S. Wang and W. Chen, “Thompson sampling for combinatorial semi- bandits,” inProc. ICML, 2018, pp. 5114–5122

  17. [17]

    Thompson sampling for combinatorial network optimization in unknown environments,

    A. H ¨uy¨uk and C. Tekin, “Thompson sampling for combinatorial network optimization in unknown environments,”IEEE Trans. Netw., vol. 28, no. 6, pp. 2836–2849, 2020

  18. [18]

    A behavioral model of rational choice,

    H. A. Simon, “A behavioral model of rational choice,”The Quarterly Journal of Economics, vol. 69, no. 1, pp. 99–118, 1955

  19. [19]

    DeepMIMO: A Generic Deep Learning Dataset for Millimeter Wave and Massive MIMO Applications

    A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications,” 2019. [Online]. Available: https://arxiv.org/abs/1902.06435

  20. [20]

    A high-accuracy adaptive beam training algorithm for mmwave communication,

    Z. Tanget al., “A high-accuracy adaptive beam training algorithm for mmwave communication,” inProc. IEEE GLOBECOM Workshops, 2018, pp. 1–6

  21. [21]

    Contextual combinatorial beam management via online probing for multiple access mmWave wireless networks,

    Z. Liet al., “Contextual combinatorial beam management via online probing for multiple access mmWave wireless networks,”IEEE J. Sel. Areas Commun., vol. 43, no. 3, pp. 959 – 972, 2025

  22. [22]

    Multi-user small base station association via contextual combinatorial volatile bandits,

    M. A. Qureshi, A. Nika, and C. Tekin, “Multi-user small base station association via contextual combinatorial volatile bandits,”IEEE Trans. Commun., vol. 69, no. 6, pp. 3726–3740, 2021

  23. [23]

    Fast millimeter wave beam alignment,

    H. Hassaniehet al., “Fast millimeter wave beam alignment,” inProc. ACM SIGCOMM, 2018, p. 432–445

  24. [24]

    Contextual combinatorial volatile multi-armed bandit with adaptive discretization,

    A. Nika, S. Elahi, and C. Tekin, “Contextual combinatorial volatile multi-armed bandit with adaptive discretization,” inProc. AISTATS, 2020, pp. 1486–1496

  25. [25]

    Compressive channel esti- mation and tracking for large arrays in mm-Wave picocells,

    Z. Marzi, D. Ramasamy, and U. Madhow, “Compressive channel esti- mation and tracking for large arrays in mm-Wave picocells,”IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 514–527, 2016

  26. [26]

    Beam codebook based beamforming protocol for multi- Gbps millimeter-wave WPAN systems,

    J. Wanget al., “Beam codebook based beamforming protocol for multi- Gbps millimeter-wave WPAN systems,”IEEE J. Sel. Areas Commun., vol. 27, no. 8, pp. 1390–1399, 2009

  27. [27]

    Beamwidth optimization and resource partitioning scheme for localization assisted mm-Wave communication,

    G. Ghataket al., “Beamwidth optimization and resource partitioning scheme for localization assisted mm-Wave communication,”IEEE Trans. Commun., vol. 69, no. 2, pp. 1358–1374, 2020

  28. [28]

    Millimeter wave beam- selection using out-of-band spatial information,

    A. Ali, N. Gonz ´alez-Prelcic, and R. W. Heath, “Millimeter wave beam- selection using out-of-band spatial information,”IEEE Trans. Wireless Commun., vol. 17, no. 2, pp. 1038–1052, 2017

  29. [29]

    UB3: Fixed budget best beam identification in mmwave massive MISO via pure exploration unimodal bandits,

    D. Ghosh, M. K. Hanawal, and N. Zlatanov, “UB3: Fixed budget best beam identification in mmwave massive MISO via pure exploration unimodal bandits,”IEEE Trans. Wireless Commun., vol. 23, no. 10, pp. 12 658–12 669, 2024

  30. [30]

    Efficient beam alignment in millimeter wave systems using contextual bandits,

    M. Hashemiet al., “Efficient beam alignment in millimeter wave systems using contextual bandits,” inProc. IEEE INFOCOM, 2018, pp. 2393– 2401

  31. [31]

    MmWave codebook selection in rapidly-varying channels via multinomial Thompson sampling,

    Y . Zhanget al., “MmWave codebook selection in rapidly-varying channels via multinomial Thompson sampling,” inProc. ACM MobiHoc, 2021, pp. 151–160

  32. [32]

    Beam alignment for non-stationary environments using a novel time-varying structured bandit,

    K. Min, H.-S. Park, and H.-S. Lee, “Beam alignment for non-stationary environments using a novel time-varying structured bandit,”IEEE Trans. Veh. Technol., 2024

  33. [33]

    Best arm identification based beam acquisition in stationary and abruptly changing environments,

    G. Ghatak, “Best arm identification based beam acquisition in stationary and abruptly changing environments,”IEEE Trans. Signal Process., vol. 72, pp. 670–685, 2024

  34. [34]

    MAB dynamic beam zooming for mmWave alignment and tracking,

    N. Blinn and M. Bloch, “MAB dynamic beam zooming for mmWave alignment and tracking,”IEEE Trans. Wireless Commun., vol. 24, no. 9, pp. 7908 – 7922, 2025

  35. [35]

    Codebook-based hybrid precoding for millimeter wave multiuser systems,

    S. Heet al., “Codebook-based hybrid precoding for millimeter wave multiuser systems,”IEEE Trans. Signal Process., vol. 65, no. 20, pp. 5289–5304, 2017

  36. [36]

    mmwave indoor channel measurement campaign for 5g new radio indoor broadcasting,

    H. Zhanget al., “mmwave indoor channel measurement campaign for 5g new radio indoor broadcasting,”IEEE Trans. Broadcast., vol. 68, no. 2, pp. 331–344, 2022

  37. [37]

    60-ghz millimeter-wave channel measurements and mod- eling for indoor office environments,

    X. Wuet al., “60-ghz millimeter-wave channel measurements and mod- eling for indoor office environments,”IEEE Trans. Antennas Propag., vol. 65, no. 4, pp. 1912–1924, 2017

  38. [38]

    A statistical model for indoor multipath propagation,

    A. A. M. Saleh and R. A. Valenzuela, “A statistical model for indoor multipath propagation,”IEEE J. Sel. Areas Commun., vol. 5, no. 2, pp. 128–137, 1987

  39. [39]

    Efficient beam training and sparse channel estimation for millimeter wave communications under mobility,

    S. H. Limet al., “Efficient beam training and sparse channel estimation for millimeter wave communications under mobility,”IEEE Trans. Commun., vol. 68, no. 10, pp. 6583–6596, Oct. 2020

  40. [40]

    Low-complexity beam allocation for switched-beam based multiuser massive MIMO systems,

    J. Wang, H. Zhu, L. Dai, N. J. Gomes, and J. Wang, “Low-complexity beam allocation for switched-beam based multiuser massive MIMO systems,”IEEE Trans. Wireless Commun., vol. 15, no. 12, pp. 8236– 8248, 2016

  41. [41]

    Frequency reuse of beam allocation for multiuser massive MIMO systems,

    J. Wang, H. Zhu, N. J. Gomes, and J. Wang, “Frequency reuse of beam allocation for multiuser massive MIMO systems,”IEEE Trans. Wireless Commun., vol. 17, no. 4, pp. 2346–2359, 2018

  42. [42]

    Satisficing regret minimization in bandits,

    Q. Feng, T. Ma, and R. Zhu, “Satisficing regret minimization in bandits,” inProc. ICLR, 2025. [Online]. Available: https: //openreview.net/forum?id=5WPQIVgWCg

  43. [43]

    Stochastic and adversarial combinatorial bandits,

    R. Combes, M. Lelarge, A. Prouti `ere, and M. S. Talebi, “Stochastic and adversarial combinatorial bandits,”CoRR, vol. abs/1502.03475, 2015. [Online]. Available: http://arxiv.org/abs/1502.03475

  44. [44]

    Tight lower bounds for combinatorial multi- armed bandits,

    N. Merlis and S. Mannor, “Tight lower bounds for combinatorial multi- armed bandits,” inProc. COLT, J. Abernethy and S. Agarwal, Eds., vol. 125, 09–12 Jul 2020, pp. 2830–2857

  45. [45]

    The Hungarian method for the assignment problem,

    H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955

  46. [46]

    A quantitative measure of fairness and discrimination for resource allocation in shared computer systems,

    R. Jain, D.-M. W. Chiu, and W. R. Hawe, “A quantitative measure of fairness and discrimination for resource allocation in shared computer systems,” Digital Equipment Corporation, Tech. Rep. DEC-TR-301, Sep. 1984

  47. [47]

    Active learning in heteroscedastic noise,

    A. Antos, V . Grover, and C. Szepesv ´ari, “Active learning in heteroscedastic noise,”Theoretical Computer Science, vol. 411, no. 29, pp. 2712–2728, 2010, algorithmic Learning Theory (ALT 2008). [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S0304397510002021