Geometry-Aware Multi-Armed Bandits for Antenna Beam Selection on Spheres, Tori, SO(3), and Reconfigurable Intelligent Surfaces
Pith reviewed 2026-05-14 18:46 UTC · model grok-4.3
The pith
Intrinsic Matérn kernels on spheres, tori and discrete tori cut cumulative regret in mmWave beam selection by 25 to 45 percent versus standard codebook methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On four static 3GPP-style mmWave benchmarks, the intrinsic-kernel GP-UCB reduces cumulative regret by 25-45 percent relative to codebook UCB1 and Thompson sampling, and by 10-33 percent relative to Euclidean-ambient GP-UCB on the toroidal arm spaces. AdaptiveGP-v2, which chooses the window length W by per-sample marginal likelihood with variance and drift triggers, is statistically indistinguishable from the hand-tuned fixed-window oracle at every tested speed, and it removes the need for deployment-time per-speed calibration.
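To make the GP-UCB selection rule concrete, here is a minimal sketch of one selection step on a one-dimensional circle of beam angles. It is an illustration under stated assumptions, not the paper's implementation: `circle_kernel` is a simple periodic stand-in for the intrinsic Matérn kernel, and the names `gp_posterior`, `gp_ucb_pick`, `beta`, `noise`, and `ell` are hypothetical.

```python
import numpy as np

def circle_kernel(a, b, ell=0.5):
    # Periodic stand-in kernel on the circle (assumption, not the intrinsic Matern kernel).
    return np.exp((np.cos(a - b) - 1.0) / ell**2)

def gp_posterior(K_obs, K_cross, k_diag, y, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at the candidate arms."""
    L = np.linalg.cholesky(K_obs + noise * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_cross)                      # shape (n_obs, n_arms)
    mean = K_cross.T @ alpha
    var = np.maximum(k_diag - np.sum(v * v, axis=0), 0.0)
    return mean, np.sqrt(var)

def gp_ucb_pick(kernel, arms, X_obs, y_obs, beta=4.0, noise=1e-2):
    """Index of the arm maximising mean + sqrt(beta) * std."""
    K_obs = kernel(X_obs[:, None], X_obs[None, :])
    K_cross = kernel(X_obs[:, None], arms[None, :])
    k_diag = kernel(arms, arms)                          # prior variance at each arm
    mean, std = gp_posterior(K_obs, K_cross, k_diag, y_obs, noise)
    return int(np.argmax(mean + np.sqrt(beta) * std))
```

With `beta = 0` the rule is purely greedy on the posterior mean; larger `beta` pushes the choice toward arms the model is still uncertain about.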
What carries the argument
The intrinsic-product Matérn kernel on the product manifold (sphere for steering, torus for phase shifts, SO(3) for orientation, or discrete torus for RIS phases), which directly encodes Riemannian distance into the GP covariance so that nearby beams on the manifold are treated as correlated rewards.
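A small sketch of why the intrinsic distance matters on a torus of phase shifts: two phase vectors that differ only by a wrap-around are neighbours on the manifold but look far apart if the angles are treated as plain coordinates. The function names are hypothetical, and the naive distance below is only an illustration of the wrap-around issue; it is not the paper's Euclidean-ambient baseline, whose exact embedding is not specified in the text here.

```python
import numpy as np

def torus_geodesic(theta, phi):
    """Geodesic distance on the flat torus: wrap each angle difference into [0, pi]."""
    d = np.abs(np.asarray(theta, dtype=float) - np.asarray(phi, dtype=float)) % (2.0 * np.pi)
    d = np.minimum(d, 2.0 * np.pi - d)
    return float(np.linalg.norm(d))

def naive_coordinate_distance(theta, phi):
    """Distance that ignores wrap-around and treats angles as plain real coordinates."""
    return float(np.linalg.norm(np.asarray(theta, dtype=float) - np.asarray(phi, dtype=float)))

a = [0.1, 3.0]
b = [2.0 * np.pi - 0.1, 3.0]
print(torus_geodesic(a, b), naive_coordinate_distance(a, b))
```

A kernel built on the first distance treats `a` and `b` as strongly correlated beams, while one built on the second treats them as nearly independent.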
If this is right
- GP-UCB becomes computationally feasible for RIS with up to 10^90 discrete configurations through O(M) table-lookup kernel evaluation.
- The adaptive controller removes the requirement for offline per-mobility tuning while matching oracle regret.
- The regret reduction persists under frequency-selective fading on a 100 MHz OFDM channel, with a reported figure of approximately 32 Mbps per UE at initial access relative to UCB1.
- The same intrinsic-kernel construction applies without change to steering on the sphere and panel orientation on SO(3).
- The Holm-Bonferroni-corrected paired differences between the adaptive version and the oracle cross zero at all four speeds, so the two are statistically indistinguishable.
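The O(M) table-lookup idea in the first bullet can be sketched as follows. This is an assumption-laden illustration, not the paper's construction: it builds a Matérn-type kernel on a single cycle Z_B from the spectrum of the cycle-graph Laplacian, stores it as a length-B table, and takes the product over the M elements. The names `cycle_matern_table` and `product_kernel`, and the parameters `nu` and `kappa`, are hypothetical.

```python
import numpy as np

def cycle_matern_table(B, nu=2.5, kappa=1.0):
    """Length-B table of a Matern-type kernel on the cycle Z_B, indexed by (x - y) mod B."""
    k = np.arange(B)
    lam = 2.0 - 2.0 * np.cos(2.0 * np.pi * k / B)        # cycle-graph Laplacian eigenvalues
    spec = (2.0 * nu / kappa**2 + lam) ** (-nu)          # Matern-type spectral weights
    d = np.arange(B)
    table = (spec[None, :] * np.cos(2.0 * np.pi * np.outer(d, k) / B)).sum(axis=1)
    return table / table[0]                              # normalise so k(x, x) = 1

def product_kernel(x, y, table):
    """Product kernel on (Z_B)^M: one table lookup per element, M lookups in total."""
    B = table.shape[0]
    diff = (np.asarray(x) - np.asarray(y)) % B
    return float(np.prod(table[diff]))

# Example: M = 300 one-bit elements (B = 2) gives 2**300, roughly 10**90 configurations,
# yet each kernel evaluation touches only 300 table entries.
table = cycle_matern_table(2)
x = np.zeros(300, dtype=int)
y = x.copy()
y[:5] = 1
print(product_kernel(x, y, table))
```

The kernel matrix over all configurations is never formed; only the B-entry table and the M per-element differences are needed per evaluation.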
Where Pith is reading between the lines
- The Kronecker factorisation could be reused for other combinatorial bandits whose arms form a product of identical discrete spaces.
- Extending the same manifold kernel to joint beam and power allocation would turn the method into a geometry-aware contextual bandit for multi-user RIS systems.
- If the kernel length-scale is learned jointly with the window selector, the approach might automatically adapt to different carrier frequencies without manual retuning.
- Hardware validation on a real phased-array testbed would be the next concrete step to confirm that the simulated regret gains translate to over-the-air throughput.
Load-bearing premise
The intrinsic Matérn kernel accurately captures the true reward landscape on these manifolds and the per-sample marginal-likelihood window selector remains stable under Doppler-induced non-stationarity.
What would settle it
A head-to-head trial in which measured beam rewards on hardware deviate systematically from the manifold distances encoded by the intrinsic kernel, causing the geometry-aware GP-UCB to incur higher regret than a Euclidean GP-UCB on the same data.
Original abstract
Beam alignment in mmWave phased arrays and RIS-assisted links is a stochastic bandit under both short TTI budgets and Doppler-induced non-stationarity. The arm space is a Riemannian manifold: $\mathbb{S}^2$ for steering, $\mathbb{T}^n$ for phase combining, $\mathrm{SO}(3)$ for panel orientation, or the discrete torus $(\mathbb{Z}_B)^M$ with up to $K \sim 10^{90}$ configurations for $B$-level RIS ($B = 2^b$, $b$ bits/element); the intrinsic Matérn kernel of Borovitskiy et al. provides the base GP. We contribute two algorithmic pieces. (C1) A Kronecker-factorised intrinsic-product Matérn kernel on $(\mathbb{Z}_B)^M$ evaluating in $O(M)$ table lookups, making GP-UCB tractable at $K \sim 10^{90}$ where the extrinsic alternative is infeasible. (C2) AdaptiveGP-v2, an online sliding-window controller that selects $W$ by per-sample marginal likelihood, with predictive-variance and drift $z$-score reset triggers and a post-reset $\beta$-boost. On a four-speed ($v \in \{0.02, 0.08, 0.12, 0.20\}$ km/h), 20-seed paired campaign at $T = 3000$, AdaptiveGP-v2 is statistically indistinguishable from the hand-tuned fixed-window oracle at every speed (Holm–Bonferroni-corrected paired differences cross zero); the operational benefit is the absence of a deployment-time per-speed calibration step, not a mean-regret improvement. On four static 3GPP-style mmWave benchmarks, intrinsic-kernel GP-UCB reduces cumulative regret by 25–45% vs. codebook UCB1/Thompson and by 10–33% vs. Euclidean-ambient GP-UCB on the toroidal arm spaces; a wideband OFDM ablation on a 100 MHz channel confirms the advantage persists under frequency-selective fading ($\sim 32$ Mbps/UE at initial access vs. UCB1). A third-party-simulator sanity check on Sionna CDL is reported in Section V.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces geometry-aware multi-armed bandit methods for beam selection in mmWave systems by modeling arm spaces as Riemannian manifolds including spheres, tori, SO(3), and discrete tori for RIS configurations. It contributes a Kronecker-factorized intrinsic Matérn kernel for efficient GP-UCB on large discrete spaces and AdaptiveGP-v2, an adaptive sliding-window GP controller that selects window size W via per-sample marginal likelihood with reset triggers. Empirical results on static 3GPP benchmarks show 25-45% cumulative regret reduction versus codebook UCB1/Thompson and 10-33% versus Euclidean GP-UCB, while on dynamic Doppler scenarios at various speeds, AdaptiveGP-v2 is statistically indistinguishable from a hand-tuned fixed-window oracle.
Significance. If the results hold, this work could significantly impact beam alignment algorithms by leveraging intrinsic geometry for Gaussian process modeling in high-dimensional spaces, offering practical advantages in avoiding scenario-specific tuning for non-stationary channels. The tractable kernel for enormous RIS configuration spaces addresses a key scalability issue.
major comments (2)
- [§V] The reported 25–45% regret reduction on toroidal arm spaces and statistical equivalence of AdaptiveGP-v2 to the oracle depend on the fidelity of the Borovitskiy intrinsic Matérn kernel to the mmWave reward landscapes; however, no ablation comparing intrinsic versus extrinsic kernels on held-out realizations is provided, leaving open whether the gains stem from geometry awareness.
- [Dynamic scenarios section] The claim that per-sample marginal likelihood stably selects W under Doppler-induced non-stationarity lacks supporting simulations quantifying bias or drift in the likelihood estimates, which is load-bearing for the assertion that no per-speed calibration is needed.
minor comments (2)
- The hyperparameter fitting procedure (including how the Matérn variance and lengthscales are optimized) and the exact implementation of the Kronecker factorization should be described in full for reproducibility.
- [Abstract] The abstract mentions a third-party simulator sanity check but does not specify the exact metrics or comparisons performed in Section V.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the work's potential impact. We address each major comment point-by-point below, proposing revisions where they strengthen the manuscript without misrepresenting our existing results.
- Referee: [§V] The reported 25–45% regret reduction on toroidal arm spaces and statistical equivalence of AdaptiveGP-v2 to the oracle depend on the fidelity of the Borovitskiy intrinsic Matérn kernel to the mmWave reward landscapes; however, no ablation comparing intrinsic versus extrinsic kernels on held-out realizations is provided, leaving open whether the gains stem from geometry awareness.
Authors: We thank the referee for this observation. Section V already reports that intrinsic-kernel GP-UCB yields 10–33% lower cumulative regret than Euclidean-ambient GP-UCB on the toroidal spaces, which directly compares the intrinsic Matérn kernel against its extrinsic counterpart. We nevertheless agree that an explicit ablation on held-out channel realizations would isolate the geometry-awareness contribution more clearly. We will add this ablation study, reporting regret and likelihood metrics on held-out 3GPP-style realizations, in the revised manuscript.
Revision: yes
- Referee: [Dynamic scenarios section] The claim that per-sample marginal likelihood stably selects W under Doppler-induced non-stationarity lacks supporting simulation quantifying bias or drift in the likelihood estimates, which is load-bearing for the assertion that no per-speed calibration is needed.
Authors: We appreciate the referee highlighting this point. The current results demonstrate that AdaptiveGP-v2 remains statistically indistinguishable from the hand-tuned oracle across the four Doppler speeds (Holm–Bonferroni corrected paired tests), supporting the no-calibration claim. To directly address potential bias or drift in the marginal-likelihood estimates, we will add targeted simulations in the dynamic scenarios section that track the evolution of the likelihood values and any systematic drift under the same non-stationary channels.
Revision: yes
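For readers unfamiliar with the Holm–Bonferroni correction invoked in the rebuttal, the following is a minimal sketch of the step-down adjustment applied to a family of p-values, such as one paired test per speed. It illustrates the general procedure only; the paper reports corrected paired differences, and the exact test statistic it uses is not given here. The function name `holm_adjust` and the example p-values are hypothetical.

```python
import numpy as np

def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)                         # ascending p-values
    scaled = p[order] * (m - np.arange(m))        # multiply by m, m-1, ..., 1
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted             # undo the sort
    return adjusted

# Four hypothetical per-speed p-values (made up for illustration).
print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```

A comparison is declared significant at level alpha only if its adjusted p-value falls below alpha, which controls the family-wise error rate across the four speeds.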
Circularity Check
No circularity; algorithmic contributions and empirical comparisons are independent of self-defined quantities
The paper introduces two explicit algorithmic contributions: a Kronecker-factorized intrinsic-product Matérn kernel for tractability on discrete tori (C1) and AdaptiveGP-v2 with per-sample marginal-likelihood window selection (C2). Both are presented as new constructions rather than derivations that reduce to fitted inputs. Performance claims rest on direct comparisons to named external baselines (UCB1, Thompson, Euclidean GP-UCB) on 3GPP-style benchmarks and Sionna CDL sanity checks. The base kernel is cited to Borovitskiy et al. (external prior work), not self-citation. No equation equates a claimed prediction to a parameter fitted from the same data by construction, and no uniqueness theorem or ansatz is smuggled via self-reference. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- Matérn kernel variance and lengthscales
- GP-UCB beta parameter
axioms (2)
- domain assumption: The beam-selection reward function on the manifold admits a Gaussian-process representation with the intrinsic Matérn kernel.
- domain assumption: Non-stationarity induced by Doppler can be adequately captured by a sliding-window GP whose size is chosen by per-sample marginal likelihood.
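The second assumption can be illustrated with a minimal sketch of window selection by per-sample marginal likelihood: for each candidate window length W, fit a zero-mean GP to the last W observations and score it by log marginal likelihood divided by W. This is a sketch under assumptions, not AdaptiveGP-v2 itself: the variance and drift triggers and the post-reset boost are omitted, `circle_kernel` is a periodic stand-in for the intrinsic Matérn kernel, and all function and parameter names are hypothetical.

```python
import numpy as np

def circle_kernel(a, b, ell=0.5):
    # Periodic stand-in kernel on the circle (assumption, not the intrinsic Matern kernel).
    return np.exp((np.cos(a - b) - 1.0) / ell**2)

def gp_log_marginal(K, y, noise):
    """Log marginal likelihood of y under a zero-mean GP with covariance K + noise * I."""
    n = len(y)
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2.0 * np.pi)

def select_window(X, y, kernel, candidates, noise=1e-2):
    """Pick the window length with the highest per-sample log marginal likelihood."""
    best_W, best_score = None, -np.inf
    for W in candidates:
        W = min(W, len(y))
        Xw, yw = X[-W:], y[-W:]                   # keep only the most recent W samples
        K = kernel(Xw[:, None], Xw[None, :])
        score = gp_log_marginal(K, yw, noise) / W
        if score > best_score:
            best_W, best_score = W, score
    return best_W, best_score
```

Dividing by W keeps long and short windows comparable; when old samples conflict with recent ones, the long window's likelihood drops sharply and a shorter window is preferred.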
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: intrinsic Matérn kernel of Borovitskiy et al. provides the base GP... Kronecker-factorised intrinsic-product Matérn kernel on (Z_B)^M
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: GP-UCB... Bayesian cumulative regret E[R_T] = Õ(√(T γ_T)) with γ_T = Õ(T^{d/(2ν+d)})
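As a worked step, and reading the quoted bound in the standard GP-UCB form with the square root covering the product of T and γ_T (an assumption about the grouping in the quoted passage), substituting the information-gain rate gives an explicit regret exponent:

```latex
\mathbb{E}[R_T] = \tilde{O}\!\left(\sqrt{T\,\gamma_T}\right),
\qquad
\gamma_T = \tilde{O}\!\left(T^{\frac{d}{2\nu+d}}\right)
\;\Longrightarrow\;
\mathbb{E}[R_T]
= \tilde{O}\!\left(T^{\frac{1}{2}\left(1+\frac{d}{2\nu+d}\right)}\right)
= \tilde{O}\!\left(T^{\frac{\nu+d}{2\nu+d}}\right).
```

For example, with d = 2 and ν = 5/2 the exponent is 4.5/7 = 9/14, about 0.64; it is below 1 for any ν > 0, so the bound is sublinear in T.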
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] I. Burtakov, A. A. Kureev, and E. M. Khorov, “RISA: Simulated annealing-based algorithm for RIS adjustment in time-varying channels,” IEEE Wireless Communications Letters, vol. 15, pp. 600–604, 2026.
- [2] H. Ren, J. Liu, and S. Cui, “Conditional-sample-mean bandits for fast beam training in reconfigurable intelligent surfaces,” IEEE Transactions on Wireless Communications, vol. 21, no. 12, pp. 10312–10326, Dec. 2022.
- [3] T. Chen et al., “Model-free optimization and experimental validation of RIS-assisted wireless communications under rich multipath fading,” IEEE Wireless Communications Letters, 2024, arXiv:2302.10561.
- [4] R. Kleinberg, A. Slivkins, and E. Upfal, “Multi-armed bandits in metric spaces,” in Proc. ACM Symp. Theory of Computing (STOC), 2008.
- [5] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, “X-armed bandits,” Journal of Machine Learning Research, vol. 12, pp. 1655–1695, 2011.
- [6] S. Magureanu, R. Combes, and A. Proutiere, “Lipschitz bandits: Regret lower bound and optimal algorithms,” in Proc. Conf. on Learning Theory (COLT), 2014.
- [7] V. Borovitskiy, A. Terenin, P. Mostowsky, and M. P. Deisenroth, “Matérn Gaussian processes on Riemannian manifolds,” in Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [8] I. Azangulov, A. Smolensky, A. Terenin, and V. Borovitskiy, “Stationary kernels and Gaussian processes on Lie groups and their homogeneous spaces I: the compact case,” Journal of Machine Learning Research, vol. 25, 2024.
- [9] P. Mostowsky, V. Dutordoir, I. Azangulov, N. Jaquier, M. Hutchinson, A. Ravuri, L. Rozo, A. Terenin, and V. Borovitskiy, “The GeometricKernels package: Heat and Matérn kernels for geometric learning on manifolds, meshes, and graphs,” arXiv preprint arXiv:2407.08086, 2024.
- [10] N. Jaquier, L. Rozo, D. G. Caldwell, and S. Calinon, “Bayesian optimization meets Riemannian manifolds in robot learning,” in Proc. Conf. on Robot Learning (CoRL), 2020.
- [11] N. Jaquier, V. Borovitskiy, A. Smolensky, A. Terenin, T. Asfour, and L. Rozo, “Geometry-aware Bayesian optimization in robotics using Riemannian Matérn kernels,” in Proc. Conf. on Robot Learning (CoRL), 2022.
- [12] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, “Gaussian process optimization in the bandit setting: no regret and experimental design,” in Proc. Int. Conf. on Machine Learning (ICML), 2010.
- [13] V. Va, T. Shimizu, G. Bansal, and R. W. Heath, “Efficient beam alignment in millimeter wave systems using contextual bandits,” in Proc. IEEE Int. Conf. Computer Communications (INFOCOM), 2018.
- [14] D. Ghosh, M. K. Hanawal, and N. Zlatanov, “UB3: Best beam identification in millimeter wave systems via pure exploration unimodal bandits,” IEEE Transactions on Wireless Communications, 2024.
- [15] H. Qin, T. Duong, M. F. Li, and C. Zhang, “Physics-informed parametric bandits for beam alignment in mmWave communications,” arXiv preprint arXiv:2510.18299, 2025.
- [16] X. He and M. Tsukada, “Beam-aware kernelized contextual bandits for user association and beamforming in mmWave vehicular networks,” arXiv preprint arXiv:2603.19285, 2026.
- [17] S. Madhekwana, M. Usman, A. Ayyub, and C. Politis, “Beam alignment for mmWave and THz: A systematic review,” Telecommunication Systems, vol. 88, no. 3, 2025.
- [18] J. Yang, W. Zhu, M. Tao, and S. Sun, “Hierarchical beam alignment for millimeter-wave communication systems: A deep learning approach,” IEEE Transactions on Wireless Communications, vol. 23, no. 4, pp. 3541–3556, Apr. 2024.
- [19] Y. Heng and J. G. Andrews, “Grid-free MIMO beam alignment through site-specific deep learning,” IEEE Transactions on Wireless Communications, vol. 23, no. 2, pp. 908–921, Feb. 2024, arXiv:2209.08198.
- [20] S. Aboagye, H. Saeidi, H. H. Ngo, H. V. Poor, and W. Saad, “Deep reinforcement learning for mmWave initial beam alignment,” in Proc. IEEE Vehicular Technology Conference (VTC) Spring, 2023.
- [21] Y. Qiao, Y. Niu, L. Su, S. Mao, N. Wang, Z. Zhong, and B. Ai, “Deep reinforcement learning-based mmWave beam alignment for V2I communications,” IEEE Transactions on Machine Learning in Communications and Networking, 2024.
- [22] C. Huang, Z. Yang, G. C. Alexandropoulos, K. Xiong, L. Wei, C. Yuen, Z. Zhang, and M. Debbah, “Reconfigurable intelligent surface-aided wireless communications: Adaptive beamforming and experimental validations,” IEEE Access, vol. 9, pp. 154728–154742, 2021.
- [23] M. Nerini, S. Shen, H. Li, and B. Clerckx, “Beyond diagonal reconfigurable intelligent surfaces utilizing graph theory: Modeling, architecture design, and optimization,” IEEE Transactions on Wireless Communications, 2024, arXiv:2305.05013.
- [24] X. Mo, L. Gui, K. Ying, X. Sang, and X. Diao, “Reconfigurable intelligent surface deployment for wideband millimeter wave systems,” IEEE Transactions on Communications, 2024, arXiv:2312.16768.
- [25] B. Saglam, D. Gurgunoglu, and S. S. Kozat, “Deep reinforcement learning based joint downlink beamforming and RIS configuration in RIS-aided MU-MISO systems under hardware impairments and imperfect CSI,” in Proc. IEEE Int. Conf. Communications (ICC) Workshops, 2023, pp. 66–72, arXiv:2211.09702.
- [26] E. M. Mohamed, S. Hashima, N. Anjum, K. Hatano, W. El-Shafai, and B. M. Elhalawany, “Reconfigurable intelligent surface-aided millimetre wave communications utilizing two-phase minimax optimal stochastic strategy bandit,” IET Communications, vol. 16, no. 18, pp. 2200–2207, 2022.
- [27] H. Wendland, Scattered Data Approximation, ser. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2004, vol. 17.
- [28] R. J. Adler and J. E. Taylor, Random Fields and Geometry. Springer, 2007, section 1.4: continuity of Gaussian random fields with continuous covariance.
- [29] S. Vakili, K. Khezeli, and V. Picheny, “On information gain and regret bounds in Gaussian process bandits,” in Proc. Int. Conf. on Artificial Intelligence and Statistics (AISTATS), 2021.
- [30] Y. Dorn, “Manifold-aware information gain and lower bounds for Gaussian-process bandits on Riemannian quotient spaces,” 2026, companion theory paper to the present work. The arXiv identifier 2605.XXXXX is a placeholder to be replaced with the assigned ID upon arXiv submission.
- [31] S. Iwazaki, “Tighter regret lower bound for Gaussian process bandits with squared exponential kernel in hypersphere,” arXiv preprint arXiv:2602.17940, 2026.
- [32] S. R. Chowdhury and A. Gopalan, “On kernelized multi-armed bandits,” in Proc. Int. Conf. on Machine Learning (ICML), 2017, IGP-UCB: frequentist regret bound for GP-UCB with f in the RKHS, no assumption that f is a GP sample.
- [33] A. Krause, A. Singh, and C. Guestrin, “Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies,” Journal of Machine Learning Research, vol. 9, pp. 235–284, 2008.
- [34] O. Besbes, Y. Gur, and A. Zeevi, “Stochastic multi-armed-bandit problem with non-stationary rewards,” Advances in Neural Information Processing Systems, vol. 27, 2014.
- [35] Y. Deng, X. Zhou, M. Kamgarpour, and T. Stathaki, “Weighted Gaussian process bandits for non-stationary environments,” in Proc. Int. Conf. Artificial Intelligence and Statistics (AISTATS), 2022, arXiv:2107.02371.
- [36] N. Blinn and M. Bloch, “Multi-armed bandit dynamic beam zooming for mmWave alignment and tracking,” IEEE Transactions on Wireless Communications, vol. 24, no. 6, pp. 5042–5056, Jun. 2025, arXiv:2209.02896.