lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

· 2013 · stat.ML · arXiv 1312.7308

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.

representative citing papers

Anytime-valid Optimal Policy Identification

stat.ME · 2026-06-16 · unverdicted · novelty 6.0

Constructs a time-indexed set S_t retaining the true optimal policy uniformly over time with high probability, enabling early stopping with sample complexity O((log |Π| + log log(1/Δ_min))/Δ_min²) when the optimum is unique.

citing papers explorer

Showing 1 of 1 citing paper.

Anytime-valid Optimal Policy Identification stat.ME · 2026-06-16 · unverdicted · none · ref 8 · internal anchor
Constructs a time-indexed set S_t retaining the true optimal policy uniformly over time with high probability, enabling early stopping with sample complexity O((log |Π| + log log(1/Δ_min))/Δ_min²) when the optimum is unique.

lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

fields

years

verdicts

representative citing papers

citing papers explorer