SIAM Journal on Computing

The nonstochastic multiarmed bandit problem · 2002

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Online Learning-to-Defer with Varying Experts

stat.ML · 2026-05-12 · unverdicted · novelty 8.0

Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.

Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions

cs.LG · 2026-05-10 · unverdicted · novelty 8.0

With opponent-action feedback in zero-sum games, an efficient algorithm achieves near-optimal t^{-1/2} last-iterate convergence in duality gap with high probability.

Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

A robust variant of binary search achieves regret O(C + log T) for dynamic pricing with known corruption C and O(C + log² T) when unknown.

Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

The paper establishes the first Õ(ε^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.

On Characterizing Learnability for Adversarial Noisy Bandits

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

Learnability of adversarial noisy bandits is characterized by the convexified generalized maximin volume for oblivious adversaries and for adaptive adversaries when the arm space is countable.

citing papers explorer

Showing 5 of 5 citing papers.

Online Learning-to-Defer with Varying Experts · stat.ML · 2026-05-12 · unverdicted · none · ref 145
Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions · cs.LG · 2026-05-10 · unverdicted · none · ref 168
Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time · cs.LG · 2026-05-08 · unverdicted · none · ref 73
Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability · cs.LG · 2026-05-09 · unverdicted · none · ref 64
On Characterizing Learnability for Adversarial Noisy Bandits · cs.LG · 2026-05-09 · unverdicted · none · ref 6
