Recognition: unknown
On Kernelized Multi-armed Bandits
read the original abstract
We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization-Improved GP-UCB (IGP-UCB) and GP-Thomson sampling (GP-TS), and derive corresponding regret bounds. Specifically, the bounds hold when the expected reward function belongs to the reproducing kernel Hilbert space (RKHS) that naturally corresponds to a Gaussian process kernel used as input by the algorithms. Along the way, we derive a new self-normalized concentration inequality for vector- valued martingales of arbitrary, possibly infinite, dimension. Finally, experimental evaluation and comparisons to existing algorithms on synthetic and real-world environments are carried out that highlight the favorable gains of the proposed strategies in many cases.
This paper has not been read by Pith yet.
Forward citations
Cited by 3 Pith papers
-
Continuous Semantic Caching for Low-Cost LLM Serving
Establishes the first rigorous framework for continuous semantic caching of LLM responses using ε-net discretization and kernel ridge regression, with sublinear regret bounds.
-
Ensemble Distributionally Robust Bayesian Optimisation
A tractable ensemble distributionally robust Bayesian optimization method achieves improved sublinear regret bounds under context uncertainty.
-
Robust Nonlinear System Identification in Reproducing Kernel Hilbert Spaces via Scenario Optimization
Finite-dimensional RKHS approximation via n-widths enables scenario optimization to deliver violation guarantees on nonlinear one-step predictors without a priori bounds on the true RKHS norm or Lipschitz constant.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.