Recognition: unknown
Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits
Pith reviewed 2026-05-10 11:18 UTC · model grok-4.3
The pith
SAT-CTS achieves constant satisficing regret when a throughput target is achievable and O((log T)^2) regret otherwise in multi-user mmWave beam and rate adaptation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that SAT-CTS supplies the first finite-time regret bounds for combinatorial semi-bandits under a satisficing objective: cumulative shortfall below the target threshold remains bounded by a time-independent constant when the threshold is realizable, while when the threshold is non-realizable the algorithm incurs only a finite expected transient outside committed rounds and thereafter its regret follows the sum of restarted round contributions, producing an overall O((log T)^2) standard regret bound.
What carries the argument
SAT-CTS, a lightweight threshold-aware policy that blends conservative confidence estimates with posterior sampling to steer learning toward meeting the satisficing throughput threshold rather than maximizing total reward.
Load-bearing premise
The analysis requires perfect reporting of transmission success or failure through feedback signals and that the channel conditions create a clean separation between thresholds that can be met and those that cannot.
What would settle it
If experiments with the same channel model but added noise or errors in the success/failure feedback show that shortfall keeps growing linearly even when a good combination exists, the claimed constant bound would be contradicted.
Figures
read the original abstract
We study downlink beam and rate adaptation in a multi-user mmWave MISO system where multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs) with a unique beam per UE and discrete data transmission rates. BSs learn about transmission success based on ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold $\tau_r$ and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting $\tau_r$ rather than merely maximizing. Our main theoretical contribution provides the first finite-time regret bounds for combinatorial semi-bandits with satisficing objective: when $\tau_r$ is realizable, we upper bound the cumulative satisficing regret to the target with a time-independent constant, and when $\tau_r$ is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an $O((\log T)^2)$ standard regret bound. On the practical side, we evaluate the performance via cumulative satisficing regret to $\tau_r$ alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret, while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper casts multi-user mmWave beam and rate adaptation as a combinatorial semi-bandit problem over beam-rate tuples with ACK/NACK feedback, introduces the SAT-CTS policy that incorporates a satisficing throughput threshold τ_r, and claims the first finite-time regret bounds for satisficing objectives in this setting: a time-independent constant bound on cumulative satisficing regret when τ_r is realizable, and an O((log T)^2) standard regret bound when τ_r is non-realizable via a restarted CTS argument that incurs only a finite expected transient outside committed rounds. Experiments on time-varying sparse multipath channels report reduced satisficing regret, competitive standard regret, and favorable throughput/fairness.
Significance. If the regret bounds are rigorously established, the work is significant as the first finite-time analysis of satisficing (rather than pure maximization) in combinatorial semi-bandits, with direct relevance to QoS-aware wireless resource allocation without CSI. The dual evaluation via satisficing regret and standard regret, plus fairness metrics, strengthens the practical contribution. The paper earns credit for deriving explicit bounds rather than asymptotic statements and for reproducible simulation setup on realistic mmWave models.
major comments (2)
- [§4.3] §4.3 (non-realizable regret theorem): The claim that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which regret is the sum of restarted CTS contributions yielding O((log T)^2), rests on the channel process preserving controlled restart frequency and per-round independence. No explicit derivation or sensitivity result is given showing that the time-varying sparse multipath model (with its specific multipath sparsity and Doppler) preserves this finite-transient property; if non-stationarity induces unbounded restarts or couples users via the combinatorial action space, the bound fails.
- [Assumptions paragraph preceding the main theorems] Assumptions paragraph preceding the main theorems: The analysis assumes transmission success is fully captured by reliable ACK/NACK feedback with no errors. No robustness or sensitivity analysis is provided for how ACK/NACK errors would lengthen the transient or inflate the restarted-CTS regret terms, which is load-bearing for both the constant and O((log T)^2) claims.
minor comments (3)
- [Problem formulation section] Notation: the definition of the combinatorial action space (beam-rate tuples per user) and the exact form of the satisficing regret (cumulative deviation from τ_r) should be stated explicitly in the problem formulation section before the policy is introduced.
- [Experiments section] Experiments: the caption for the cumulative regret plots should include the number of Monte Carlo runs and the exact parameter values for the sparse multipath channel model (e.g., number of paths, Doppler spread) to allow reproduction.
- [Related work] Related work: a brief comparison paragraph is needed contrasting SAT-CTS with prior combinatorial semi-bandit algorithms (e.g., CTS, CombUCB) and with satisficing bandit literature to clarify the incremental contribution.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. We appreciate the recognition of the significance of providing the first finite-time regret bounds for satisficing objectives in combinatorial semi-bandits. We address each major comment point by point below and indicate the revisions we will make to strengthen the manuscript.
read point-by-point responses
-
Referee: [§4.3] §4.3 (non-realizable regret theorem): The claim that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which regret is the sum of restarted CTS contributions yielding O((log T)^2), rests on the channel process preserving controlled restart frequency and per-round independence. No explicit derivation or sensitivity result is given showing that the time-varying sparse multipath model (with its specific multipath sparsity and Doppler) preserves this finite-transient property; if non-stationarity induces unbounded restarts or couples users via the combinatorial action space, the bound fails.
Authors: We thank the referee for this observation on the non-realizable case. The restarted CTS mechanism in §4.3 is constructed precisely to accommodate non-stationarity: a restart is triggered only when the empirical reward distribution deviates beyond the confidence bounds, and the analysis shows that the expected number of such restarts remains finite provided the underlying process satisfies a mild ergodicity condition (finite coherence time). For the time-varying sparse multipath channel model employed in the paper, the limited Doppler spread and path sparsity ensure that channel realizations remain correlated over short intervals and decorrelate sufficiently to bound the mismatch probability, preventing unbounded restarts. The combinatorial action space does not introduce additional coupling in the restart logic because each user’s beam-rate tuple is sampled independently within the joint posterior. Nevertheless, we agree that an explicit sensitivity result linking the model parameters (sparsity level and Doppler) to the restart bound was not derived in the main text. In the revision we will insert a short supporting lemma (with proof sketch) in an appendix that verifies the finite-transient property under the exact channel parameters used in the simulations, thereby confirming that the O((log T)^2) bound holds for the considered setting. revision: yes
-
Referee: Assumptions paragraph preceding the main theorems: The analysis assumes transmission success is fully captured by reliable ACK/NACK feedback with no errors. No robustness or sensitivity analysis is provided for how ACK/NACK errors would lengthen the transient or inflate the restarted-CTS regret terms, which is load-bearing for both the constant and O((log T)^2) claims.
Authors: We acknowledge that the analysis throughout the paper, including both the realizable and non-realizable theorems, is predicated on error-free ACK/NACK feedback. This is a standard modeling choice in semi-bandit formulations for wireless systems, as it isolates the learning dynamics from control-channel imperfections. ACK/NACK errors would indeed act as additional observation noise, which could increase the variance of the posterior updates, potentially extending the transient phase or adding a small additive term to each restarted round’s regret. A complete sensitivity analysis would require augmenting the model with an explicit error probability and re-deriving the concentration inequalities. Because the current manuscript does not contain such an analysis, we will add a concise discussion paragraph immediately after the assumptions statement in the revised version. The added text will note that, for typical low error rates encountered in practice, the effect is absorbed into the existing constant factors of the bounds without changing their order, and will flag noisy-feedback extensions as a natural direction for future work. revision: partial
Circularity Check
No circularity: bounds derived from standard combinatorial semi-bandit analysis with new satisficing logic
full rationale
The claimed finite-time regret bounds for the satisficing objective are presented as a novel extension of combinatorial semi-bandit techniques. When τ_r is realizable the bound is a time-independent constant; when non-realizable the analysis invokes a finite expected transient outside committed CTS rounds followed by restarted CTS contributions yielding O((log T)^2). No equation or step is shown to reduce by construction to a fitted quantity, self-defined parameter, or unverified self-citation chain. The derivation chain remains self-contained against external benchmarks for combinatorial Thompson sampling and semi-bandit regret, with the satisficing threshold added as an independent modeling choice rather than a tautology.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Transmission success is fully observable via reliable ACK/NACK feedback with no errors or delays.
- domain assumption The channel process permits a clean separation into realizable and non-realizable threshold regimes that allow restarted CTS analysis.
Reference graph
Works this paper leans on
-
[1]
Satisficing with binary feedback: Multi-user mmwave beam and rate adaptation via combinatorial bandits,
E. ¨Ozyıldırım, B. Yaycı, U. E. Akturk, and C. Tekin, “Satisficing with binary feedback: Multi-user mmwave beam and rate adaptation via combinatorial bandits,” inNeurIPS Workshop, 2025. [Online]. Available: https://openreview.net/forum?id=qLtBU0mVB1
2025
-
[2]
Covrage: Millimeter-wave beam- forming for mobile interactive virtual reality,
J. Struye, F. Lemic, and J. Famaey, “Covrage: Millimeter-wave beam- forming for mobile interactive virtual reality,”IEEE Trans. Wireless Commun., vol. 22, no. 7, pp. 4828–4842, 2022
2022
-
[3]
Mec-assisted panoramic VR video streaming over mil- limeter wave mobile networks,
Y . Liuet al., “Mec-assisted panoramic VR video streaming over mil- limeter wave mobile networks,”IEEE Trans. Multimedia, vol. 21, no. 5, pp. 1302–1316, 2018
2018
-
[4]
Wideband millimeter-wave propagation measure- ments and channel models for future wireless communication system design,
T. S. Rappaportet al., “Wideband millimeter-wave propagation measure- ments and channel models for future wireless communication system design,”IEEE Trans. Commun., vol. 63, no. 9, pp. 3029–3056, 2015
2015
-
[5]
CSI transfer from sub-6G to mmWave: Reduced- overhead multi-user hybrid beamforming,
W. Denget al., “CSI transfer from sub-6G to mmWave: Reduced- overhead multi-user hybrid beamforming,”IEEE J. Sel. Areas Commun., vol. 43, no. 3, pp. 973–987, 2025
2025
-
[6]
MAMBA: A multi-armed bandit framework for beam tracking in millimeter-wave systems,
I. Aykinet al., “MAMBA: A multi-armed bandit framework for beam tracking in millimeter-wave systems,” inProc. IEEE INFOCOM, 2020, pp. 1469–1478
2020
-
[7]
Fast mmWave beam alignment via correlated bandit learning,
W. Wuet al., “Fast mmWave beam alignment via correlated bandit learning,”IEEE Trans. Wireless Commun., vol. 18, no. 12, pp. 5894– 5908, 2019
2019
-
[8]
Machine learning techniques for blind beam alignment in mmWave massive MIMO,
A. Ktari, H. Ghauch, and G. Rekaya-Ben Othman, “Machine learning techniques for blind beam alignment in mmWave massive MIMO,” Entropy, vol. 26, no. 8, p. 626, 2024
2024
-
[9]
An overview of mimo communications - a key to gigabit wireless,
A. Paulrajet al., “An overview of mimo communications - a key to gigabit wireless,”Proc. IEEE, vol. 92, no. 2, pp. 198–218, 2004
2004
-
[10]
Millimeter wave beamforming for wireless backhaul and access in small cell networks,
S. Huret al., “Millimeter wave beamforming for wireless backhaul and access in small cell networks,”IEEE Trans. Commun., vol. 61, no. 10, pp. 4391–4403, Oct. 2013
2013
-
[11]
Enhanced random access and beam training for millime- ter wave wireless local networks with high user density,
P. Zhouet al., “Enhanced random access and beam training for millime- ter wave wireless local networks with high user density,”IEEE Trans. Wireless Commun., vol. 16, pp. 7760–7773, 2017. 18
2017
-
[12]
Fast beam alignment via pure exploration in multi-armed bandits,
Y . Wei, Z. Zhong, and V . Y . Tan, “Fast beam alignment via pure exploration in multi-armed bandits,”IEEE Trans. Wireless Commun., vol. 22, no. 5, pp. 3264–3279, 2022
2022
-
[13]
Lattimore and C
T. Lattimore and C. Szepesv ´ari,Bandit algorithms. Cambridge Uni- versity Press, 2020
2020
-
[14]
Tight regret bounds for stochastic combinatorial semi- bandits,
B. Kvetonet al., “Tight regret bounds for stochastic combinatorial semi- bandits,” inProc. AISTATS, 2015, pp. 535–543
2015
-
[15]
Combinatorial multi-armed bandit: General framework and applications,
W. Chen, Y . Wang, and Y . Yuan, “Combinatorial multi-armed bandit: General framework and applications,” inProc. ICML, 2013, pp. 151– 159
2013
-
[16]
Thompson sampling for combinatorial semi- bandits,
S. Wang and W. Chen, “Thompson sampling for combinatorial semi- bandits,” inProc. ICML, 2018, pp. 5114–5122
2018
-
[17]
Thompson sampling for combinatorial network optimization in unknown environments,
A. H ¨uy¨uk and C. Tekin, “Thompson sampling for combinatorial network optimization in unknown environments,”IEEE Trans. Netw., vol. 28, no. 6, pp. 2836–2849, 2020
2020
-
[18]
A behavioral model of rational choice,
H. A. Simon, “A behavioral model of rational choice,”The Quarterly Journal of Economics, vol. 69, no. 1, pp. 99–118, 1955
1955
-
[19]
DeepMIMO: A Generic Deep Learning Dataset for Millimeter Wave and Massive MIMO Applications
A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications,” 2019. [Online]. Available: https://arxiv.org/abs/1902.06435
work page Pith review arXiv 2019
-
[20]
A high-accuracy adaptive beam training algorithm for mmwave communication,
Z. Tanget al., “A high-accuracy adaptive beam training algorithm for mmwave communication,” inProc. IEEE GLOBECOM Workshops, 2018, pp. 1–6
2018
-
[21]
Contextual combinatorial beam management via online probing for multiple access mmWave wireless networks,
Z. Liet al., “Contextual combinatorial beam management via online probing for multiple access mmWave wireless networks,”IEEE J. Sel. Areas Commun., vol. 43, no. 3, pp. 959 – 972, 2025
2025
-
[22]
Multi-user small base station association via contextual combinatorial volatile bandits,
M. A. Qureshi, A. Nika, and C. Tekin, “Multi-user small base station association via contextual combinatorial volatile bandits,”IEEE Trans. Commun., vol. 69, no. 6, pp. 3726–3740, 2021
2021
-
[23]
Fast millimeter wave beam alignment,
H. Hassaniehet al., “Fast millimeter wave beam alignment,” inProc. ACM SIGCOMM, 2018, p. 432–445
2018
-
[24]
Contextual combinatorial volatile multi-armed bandit with adaptive discretization,
A. Nika, S. Elahi, and C. Tekin, “Contextual combinatorial volatile multi-armed bandit with adaptive discretization,” inProc. AISTATS, 2020, pp. 1486–1496
2020
-
[25]
Compressive channel esti- mation and tracking for large arrays in mm-Wave picocells,
Z. Marzi, D. Ramasamy, and U. Madhow, “Compressive channel esti- mation and tracking for large arrays in mm-Wave picocells,”IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 514–527, 2016
2016
-
[26]
Beam codebook based beamforming protocol for multi- Gbps millimeter-wave WPAN systems,
J. Wanget al., “Beam codebook based beamforming protocol for multi- Gbps millimeter-wave WPAN systems,”IEEE J. Sel. Areas Commun., vol. 27, no. 8, pp. 1390–1399, 2009
2009
-
[27]
Beamwidth optimization and resource partitioning scheme for localization assisted mm-Wave communication,
G. Ghataket al., “Beamwidth optimization and resource partitioning scheme for localization assisted mm-Wave communication,”IEEE Trans. Commun., vol. 69, no. 2, pp. 1358–1374, 2020
2020
-
[28]
Millimeter wave beam- selection using out-of-band spatial information,
A. Ali, N. Gonz ´alez-Prelcic, and R. W. Heath, “Millimeter wave beam- selection using out-of-band spatial information,”IEEE Trans. Wireless Commun., vol. 17, no. 2, pp. 1038–1052, 2017
2017
-
[29]
UB3: Fixed budget best beam identification in mmwave massive MISO via pure exploration unimodal bandits,
D. Ghosh, M. K. Hanawal, and N. Zlatanov, “UB3: Fixed budget best beam identification in mmwave massive MISO via pure exploration unimodal bandits,”IEEE Trans. Wireless Commun., vol. 23, no. 10, pp. 12 658–12 669, 2024
2024
-
[30]
Efficient beam alignment in millimeter wave systems using contextual bandits,
M. Hashemiet al., “Efficient beam alignment in millimeter wave systems using contextual bandits,” inProc. IEEE INFOCOM, 2018, pp. 2393– 2401
2018
-
[31]
MmWave codebook selection in rapidly-varying channels via multinomial Thompson sampling,
Y . Zhanget al., “MmWave codebook selection in rapidly-varying channels via multinomial Thompson sampling,” inProc. ACM MobiHoc, 2021, pp. 151–160
2021
-
[32]
Beam alignment for non-stationary environments using a novel time-varying structured bandit,
K. Min, H.-S. Park, and H.-S. Lee, “Beam alignment for non-stationary environments using a novel time-varying structured bandit,”IEEE Trans. Veh. Technol., 2024
2024
-
[33]
Best arm identification based beam acquisition in stationary and abruptly changing environments,
G. Ghatak, “Best arm identification based beam acquisition in stationary and abruptly changing environments,”IEEE Trans. Signal Process., vol. 72, pp. 670–685, 2024
2024
-
[34]
MAB dynamic beam zooming for mmWave alignment and tracking,
N. Blinn and M. Bloch, “MAB dynamic beam zooming for mmWave alignment and tracking,”IEEE Trans. Wireless Commun., vol. 24, no. 9, pp. 7908 – 7922, 2025
2025
-
[35]
Codebook-based hybrid precoding for millimeter wave multiuser systems,
S. Heet al., “Codebook-based hybrid precoding for millimeter wave multiuser systems,”IEEE Trans. Signal Process., vol. 65, no. 20, pp. 5289–5304, 2017
2017
-
[36]
mmwave indoor channel measurement campaign for 5g new radio indoor broadcasting,
H. Zhanget al., “mmwave indoor channel measurement campaign for 5g new radio indoor broadcasting,”IEEE Trans. Broadcast., vol. 68, no. 2, pp. 331–344, 2022
2022
-
[37]
60-ghz millimeter-wave channel measurements and mod- eling for indoor office environments,
X. Wuet al., “60-ghz millimeter-wave channel measurements and mod- eling for indoor office environments,”IEEE Trans. Antennas Propag., vol. 65, no. 4, pp. 1912–1924, 2017
1912
-
[38]
A statistical model for indoor multipath propagation,
A. A. M. Saleh and R. A. Valenzuela, “A statistical model for indoor multipath propagation,”IEEE J. Sel. Areas Commun., vol. 5, no. 2, pp. 128–137, 1987
1987
-
[39]
Efficient beam training and sparse channel estimation for millimeter wave communications under mobility,
S. H. Limet al., “Efficient beam training and sparse channel estimation for millimeter wave communications under mobility,”IEEE Trans. Commun., vol. 68, no. 10, pp. 6583–6596, Oct. 2020
2020
-
[40]
Low-complexity beam allocation for switched-beam based multiuser massive MIMO systems,
J. Wang, H. Zhu, L. Dai, N. J. Gomes, and J. Wang, “Low-complexity beam allocation for switched-beam based multiuser massive MIMO systems,”IEEE Trans. Wireless Commun., vol. 15, no. 12, pp. 8236– 8248, 2016
2016
-
[41]
Frequency reuse of beam allocation for multiuser massive MIMO systems,
J. Wang, H. Zhu, N. J. Gomes, and J. Wang, “Frequency reuse of beam allocation for multiuser massive MIMO systems,”IEEE Trans. Wireless Commun., vol. 17, no. 4, pp. 2346–2359, 2018
2018
-
[42]
Satisficing regret minimization in bandits,
Q. Feng, T. Ma, and R. Zhu, “Satisficing regret minimization in bandits,” inProc. ICLR, 2025. [Online]. Available: https: //openreview.net/forum?id=5WPQIVgWCg
2025
-
[43]
Stochastic and adversarial combinatorial bandits,
R. Combes, M. Lelarge, A. Prouti `ere, and M. S. Talebi, “Stochastic and adversarial combinatorial bandits,”CoRR, vol. abs/1502.03475, 2015. [Online]. Available: http://arxiv.org/abs/1502.03475
-
[44]
Tight lower bounds for combinatorial multi- armed bandits,
N. Merlis and S. Mannor, “Tight lower bounds for combinatorial multi- armed bandits,” inProc. COLT, J. Abernethy and S. Agarwal, Eds., vol. 125, 09–12 Jul 2020, pp. 2830–2857
2020
-
[45]
The Hungarian method for the assignment problem,
H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955
1955
-
[46]
A quantitative measure of fairness and discrimination for resource allocation in shared computer systems,
R. Jain, D.-M. W. Chiu, and W. R. Hawe, “A quantitative measure of fairness and discrimination for resource allocation in shared computer systems,” Digital Equipment Corporation, Tech. Rep. DEC-TR-301, Sep. 1984
1984
-
[47]
Active learning in heteroscedastic noise,
A. Antos, V . Grover, and C. Szepesv ´ari, “Active learning in heteroscedastic noise,”Theoretical Computer Science, vol. 411, no. 29, pp. 2712–2728, 2010, algorithmic Learning Theory (ALT 2008). [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S0304397510002021
2010
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.