On Kernelized Multi-armed Bandits

Sayak Ray Chowdhury, Aditya Gopalan

classification cs.LG

keywords algorithms, arms, bandit, bounds, continuous, derive, expected, function

We consider the stochastic bandit problem with a continuous set of arms, in which the expected reward function over the arms is assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization, Improved GP-UCB (IGP-UCB) and GP-Thompson Sampling (GP-TS), and derive corresponding regret bounds. Specifically, the bounds hold when the expected reward function belongs to the reproducing kernel Hilbert space (RKHS) that naturally corresponds to a Gaussian process kernel used as input by the algorithms. Along the way, we derive a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension. Finally, we evaluate the proposed algorithms experimentally and compare them with existing algorithms on synthetic and real-world environments; the results highlight favorable gains for the proposed strategies in many cases.
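
To make the setting concrete, here is a minimal sketch of a generic GP-UCB-style loop on a discretized arm set. It is not the paper's IGP-UCB or GP-TS. The squared-exponential kernel, the grid discretization, the fixed exploration weight `beta`, and the names `rbf_kernel`, `gp_posterior`, and `gp_ucb` are all assumptions made for illustration; the abstract does not specify the algorithms' confidence widths or kernel.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=0.01):
    """Zero-mean GP posterior mean and standard deviation on x_grid."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_star = rbf_kernel(x_grid, x_obs)
    mean = K_star @ np.linalg.solve(K, y_obs)
    # Prior variance k(x, x) equals 1 for this kernel.
    var = 1.0 - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def gp_ucb(reward_fn, x_grid, rounds=30, beta=2.0, noise_std=0.1, seed=0):
    """Play `rounds` arms, each maximizing mean + beta * std (beta is a hypothetical constant)."""
    rng = np.random.default_rng(seed)
    # First arm is chosen uniformly at random because no data exists yet.
    xs = [x_grid[rng.integers(len(x_grid))]]
    ys = [reward_fn(xs[0]) + noise_std * rng.normal()]
    for _ in range(rounds - 1):
        mean, std = gp_posterior(np.array(xs), np.array(ys), x_grid,
                                 noise=noise_std ** 2)
        x_next = x_grid[np.argmax(mean + beta * std)]
        xs.append(x_next)
        ys.append(reward_fn(x_next) + noise_std * rng.normal())
    return np.array(xs), np.array(ys)

# Example: a fixed but unknown reward function on [0, 1] (made up for this sketch).
grid = np.linspace(0.0, 1.0, 101)
arms, rewards = gp_ucb(lambda x: np.exp(-((x - 0.7) ** 2) / 0.02), grid)
```

The upper-confidence rule trades off exploiting arms with a high posterior mean against exploring arms with high posterior uncertainty. A Thompson-sampling variant would instead draw a function from the posterior and play its maximizer.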

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Continuous Semantic Caching for Low-Cost LLM Serving
cs.LG 2026-04 unverdicted novelty 7.0

Establishes the first rigorous framework for continuous semantic caching of LLM responses using ε-net discretization and kernel ridge regression, with sublinear regret bounds.

Ensemble Distributionally Robust Bayesian Optimisation
cs.LG 2026-05 unverdicted novelty 6.0

A tractable ensemble distributionally robust Bayesian optimization method achieves improved sublinear regret bounds under context uncertainty.

Robust Nonlinear System Identification in Reproducing Kernel Hilbert Spaces via Scenario Optimization
eess.SY 2026-04 unverdicted novelty 6.0

Finite-dimensional RKHS approximation via n-widths enables scenario optimization to deliver violation guarantees on nonlinear one-step predictors without a priori bounds on the true RKHS norm or Lipschitz constant.