Neural Logistic Bandits

Dabeen Lee; Seoungbin Bae

arxiv: 2505.02069 · v2 · pith:K3CXULI4new · submitted 2025-05-04 · 💻 cs.LG · stat.ML

Neural Logistic Bandits

Seoungbin Bae , Dabeen Lee This is my paper

classification 💻 cs.LG stat.ML

keywords kappawidetildedimensionneuraldependencelogisticbanditsdirect

0 comments

read the original abstract

We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on $\kappa$, where $1/\kappa$ represents the minimum variance of reward distributions, or suffer from direct dependence on the feature dimension $d$, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This lets us deduce a regret upper bound that grows with the effective dimension $\widetilde{d}$, not the feature dimension, while keeping a minimal dependence on $\kappa$. Based on the concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, that guarantee regret upper bounds of order $\widetilde{O}(\widetilde{d}\sqrt{\kappa T})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/\kappa})$, respectively, improving on the existing results. Lastly, we report numerical results on both synthetic and real datasets to validate our theoretical findings.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions
cs.LG 2026-04 unverdicted novelty 8.0

SupSplitLog achieves Õ(√(dT)) regret for logistic bandits without context diversity assumptions by splitting samples for an initial estimator and Newton correction, and can adapt to data-dependent bounds.
CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
cs.AI 2026-05 unverdicted novelty 6.0

CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.