A Survey on Practical Applications of Multi-Armed and Contextual Bandits

Djallel Bouneffouf; Irina Rish

arxiv: 1904.10040 · v1 · pith:AWBNUBAHnew · submitted 2019-04-02 · 💻 cs.LG · stat.ML


classification 💻 cs.LG stat.ML

keywords applications, bandit, multi-armed, field, practical, problem, recent, various

In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar performance combined with certain attractive properties, such as learning from less feedback. The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize the state of the art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this exciting and fast-growing field.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Active Context Selection Improves Simple Regret in Contextual Bandits
cs.LG 2026-05 accept novelty 7.0

Active sampling with allocation q_j proportional to p_j to the 2/3 achieves tight regret sqrt(n/T) times norm of p to the 2/3 for known context distribution p, with improvement up to Theta(k to the 1/4) over passive sampling.

Beyond Static Bias: Adaptive Multi-Fidelity Bandits with Improving Proxies
cs.LG 2026-05 unverdicted novelty 7.0

TACC algorithm for adaptive multi-fidelity bandits with improving proxies achieves instance-dependent regret by replacing logarithmic high-fidelity pulls with bounded low-fidelity continuation for intermediate arms.

A Review of Causal Decision Making
stat.ML 2025-02 unverdicted novelty 2.0

A review that organizes causal decision making into three stages and consolidates methods into an open Python collection.