Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions

2026 · cs.LG · arXiv 2604.22161

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

We study the $K$-armed logistic bandit problem, where at each round, the agent observes $K$ feature vectors associated with $K$ actions. Existing approaches that achieve a rate-optimal $\tilde{\mathcal{O}}(\sqrt{dT})$ regret bound rely heavily on context diversity assumptions, such as strict positivity of the minimum eigenvalue of a context covariance matrix. These assumptions, however, impose strong restrictions on the context process, as they rule out the situation where the context vectors are concentrated in a low-dimensional subspace. In this paper, we propose SupSplitLog, which, to the best of our knowledge, is the first algorithm for logistic bandits that achieves $\tilde{\mathcal{O}}(\sqrt{dT})$ regret without any context diversity assumption. The key idea is to split the collected samples into two disjoint subsets when constructing estimators; one is used to compute an initial-point estimator, while the other is used to apply a Newton-type one-step correction procedure. The splitting rule is carefully designed to balance the accuracy requirements of the initial-point estimator and the one-step correction procedure. Moreover, SupSplitLog strictly improves on the existing algorithms in terms of the dependence on dimension $d$ in the regret upper bound. Furthermore, SupSplitLog can be adapted simply to deduce a regret bound that grows with a data-dependent complexity measure, avoiding a direct dependence on $d$, which is favorable when the context vectors are concentrated in a low-dimensional subspace. We also provide experimental results that demonstrate numerically the superiority of our algorithm, validating the theoretical results.
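The sample-splitting idea in the abstract can be illustrated with a minimal sketch. This is not SupSplitLog itself: the even/odd split below is a placeholder for the paper's splitting rule, which the abstract does not specify, and the function names, the ridge parameter `lam`, and the synthetic data are all assumptions made for illustration. One half of the samples fits an initial-point estimator, and the other half applies a single Newton-type correction step to it.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def initial_estimator(X, y, lam=1.0, iters=25):
    """Ridge-regularized logistic MLE via Newton iterations (hypothetical helper)."""
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (y - p) - lam * theta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + lam * np.eye(d)
        theta = theta + np.linalg.solve(hess, grad)
    return theta


def one_step_correction(X, y, theta0, lam=1.0):
    """One Newton-type step from theta0, using only the held-out samples (X, y)."""
    d = X.shape[1]
    p = sigmoid(X @ theta0)
    score = X.T @ (y - p)  # gradient of the log-likelihood at theta0
    # lam * I keeps the matrix invertible even if contexts lie in a subspace
    # (an assumption of this sketch, not a claim about the paper).
    hess = (X * (p * (1.0 - p))[:, None]).T @ X + lam * np.eye(d)
    return theta0 + np.linalg.solve(hess, score)


def split_and_estimate(X, y, lam=1.0):
    """Split samples into two disjoint subsets (even/odd indices as a placeholder rule)."""
    idx = np.arange(len(y))
    first, second = idx[::2], idx[1::2]
    theta0 = initial_estimator(X[first], y[first], lam)
    theta1 = one_step_correction(X[second], y[second], theta0, lam)
    return theta0, theta1


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(2000, 3))
y = (rng.random(2000) < sigmoid(X @ theta_star)).astype(float)

theta0, theta1 = split_and_estimate(X, y)
err0 = np.linalg.norm(theta0 - theta_star)
err1 = np.linalg.norm(theta1 - theta_star)
```

Because the correction step only sees samples that were not used to fit `theta0`, the initial estimate is independent of the data in the Newton step. That independence is what splitting buys in this sketch. How the subsets should be sized to balance the two accuracy requirements is the part the paper designs, and it is not reproduced here.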

representative citing papers

Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret

cs.LG · 2026-06-08 · unverdicted · novelty 7.0

Presents CQB-η-2 algorithm achieving $\tilde{\mathcal{O}}(T^{-1/2})$ queue length regret in contextual queueing bandits under stochastic contexts, with matching $\Omega(T^{-1/2})$ lower bound.

