pith. sign in

arxiv: 2605.29748 · v1 · pith:5M4CEJDEnew · submitted 2026-05-28 · 📊 stat.ML · cs.LG

Instance-dependent Stochastic Lipschitz bandit

classification 📊 stat.ML cs.LG
keywords lipschitzboundsleftlevelregretrightsetstilde
0
0 comments X
read the original abstract

We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $\mathcal{X} \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case, scaling as $\tilde{\Theta} \left ( T^{d+1/d+2}\right )$, or adaptive via the zooming dimension $d_z$, yielding $\tilde{\Theta} \left ( T^{d_z+1/d_z+2}\right )$. However, such zooming-based guarantees are only partially instance-dependent, as they depend solely on the asymptotic growth of near-optimal level sets and fail to capture finer structural properties of $f$. We provide an analysis and an algorithm that characterizes the regret through integrals of the suboptimality gap of $f$ over its level sets. This yields regret bounds that adapt to the local growth of level sets, rather than only their asymptotic behavior. As a corollary, when the set of maximizers has dimension $d^\star>0$, we obtain improved adaptive rates of order $\tilde{\mathcal{O}} \left ( T^{d_z+1 / \max(d_z,d^\star)+2}\right )$ strictly improving over classical zooming bounds in this regime. Finally, we extend our analysis to the full-information setting (Lipschitz experts) and show how some of the regularity assumptions can be relaxed.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.