Combinatorial multi-armed bandit with general reward functions
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Learning to Sparsify Stochastic Linear Bandits
Phased exploration-exploitation algorithms for sparse stochastic linear bandits achieve Õ(d√T) regret for Euclidean balls and α-regret bounds of Õ(d√T) or Õ(d T^{2/3}) for general convex sets using greedy approximation.