The paper establishes the first tilde O(epsilon^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.
arXiv preprint arXiv:2312.16730 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.
citing papers explorer
-
Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability
The paper establishes the first tilde O(epsilon^{-1}) upper bounds and matching lower bounds for forward-KL-regularized offline contextual bandits under single-policy concentrability in both tabular and general function approximation settings.
-
Constrained Contextual Bandits with Adversarial Contexts
A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.