Multiclass Online Learnability under Bandit Feedback

Ambuj Tewari; Ananth Raman; Idan Mehalel; Unique Subedi; Vinod Raman

arxiv: 2308.04620 · v3 · pith:UTSVXWULnew · submitted 2023-08-08 · 💻 cs.LG · stat.ML

Ananth Raman, Vinod Raman, Unique Subedi, Idan Mehalel, Ambuj Tewari

keywords: bandit, online, learnability, multiclass, dimension, even, feedback, full-information

We study online multiclass classification under bandit feedback. We extend the results of Daniely and Helbertal [2013] by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online learnability even when the label space is unbounded. Moreover, we show that, unlike the full-information setting, sequential uniform convergence is necessary but not sufficient for bandit online learnability. Our result complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023] who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting even when the label space is unbounded.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Sample Complexity of Multiclass and Sparse Contextual Bandits
cs.LG · 2026-05 · unverdicted · novelty 8.0

Algorithms and matching lower bounds for s-sparse contextual bandits yield Õ((s/ε² + |A|/ε) log |Π|/δ) samples to output an ε-optimal policy.