
arxiv: 2505.02069 · v2 · pith:K3CXULI4new · submitted 2025-05-04 · 💻 cs.LG · stat.ML

Neural Logistic Bandits

classification 💻 cs.LG stat.ML
keywords kappa widetilde dimension neural dependence logistic bandits direct

We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on $\kappa$, where $1/\kappa$ represents the minimum variance of reward distributions, or suffer from direct dependence on the feature dimension $d$, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This lets us deduce a regret upper bound that grows with the effective dimension $\widetilde{d}$, not the feature dimension, while keeping a minimal dependence on $\kappa$. Based on the concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, that guarantee regret upper bounds of order $\widetilde{O}(\widetilde{d}\sqrt{\kappa T})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/\kappa})$, respectively, improving on the existing results. Lastly, we report numerical results on both synthetic and real datasets to validate our theoretical findings.
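As a rough illustration of the quantities in the abstract, the sketch below computes $\kappa$ from the stated relation that $1/\kappa$ is the minimum reward variance, and compares the two regret rates $\widetilde{d}\sqrt{\kappa T}$ and $\widetilde{d}\sqrt{T/\kappa}$. It assumes Bernoulli rewards whose mean is the logistic function of a latent value, and it drops all constants and logarithmic factors. The function names, the latent values, and the choices of $\widetilde{d}$ and $T$ are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic link function."""
    return 1.0 / (1.0 + math.exp(-z))


def reward_variance(z: float) -> float:
    """Variance of a Bernoulli reward with mean sigmoid(z) (assumed model)."""
    mu = sigmoid(z)
    return mu * (1.0 - mu)


def kappa(latent_values) -> float:
    """kappa such that 1/kappa is the minimum reward variance over the arms."""
    return 1.0 / min(reward_variance(z) for z in latent_values)


def bound_rates(d_eff: float, kappa_val: float, horizon: int):
    """Leading-order rates only: constants and log factors are ignored."""
    rate_1 = d_eff * math.sqrt(kappa_val * horizon)  # order of the NeuralLog-UCB-1 bound
    rate_2 = d_eff * math.sqrt(horizon / kappa_val)  # order of the NeuralLog-UCB-2 bound
    return rate_1, rate_2


# Hypothetical latent reward values for three arms.
latents = [-3.0, 0.0, 2.0]
k = kappa(latents)
r1, r2 = bound_rates(d_eff=10.0, kappa_val=k, horizon=10_000)
print(f"kappa = {k:.2f}, rate 1 = {r1:.1f}, rate 2 = {r2:.1f}")
```

Under these assumptions the ratio of the two rates is exactly $\kappa$, so the gap between them widens as the least informative arm's reward variance shrinks.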



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions

    cs.LG 2026-04 unverdicted novelty 8.0

    SupSplitLog achieves Õ(√(dT)) regret for logistic bandits without context diversity assumptions by splitting samples for an initial estimator and Newton correction, and can adapt to data-dependent bounds.

  2. CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

    cs.AI 2026-05 unverdicted novelty 6.0

    CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.