pith. machine review for the scientific record. sign in

arxiv: 1302.1611 · v2 · submitted 2013-02-06 · 🧮 math.ST · cs.LG· stat.ML· stat.TH

Recognition: unknown

Bounded regret in stochastic multi-armed bandits

Authors on Pith no claims yet
classification 🧮 math.ST cs.LGstat.MLstat.TH
keywords boundedregretdeltaknowslowermulti-armedonlypositive
0
0 comments X
read the original abstract

We study the stochastic multi-armed bandit problem when one knows the value $\mu^{(\star)}$ of an optimal arm, as a well as a positive lower bound on the smallest positive gap $\Delta$. We propose a new randomized policy that attains a regret {\em uniformly bounded over time} in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $\Delta$, and bounded regret of order $1/\Delta$ is not possible if one only knows $\mu^{(\star)}$

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.