3 Pith papers cite this work.
citing papers explorer
- Regret Tail Characterization of Optimal Bandit Algorithms with Generic Rewards
  KLinf-UCB is extended to nonparametric rewards with asymptotic expected-regret optimality and a tight upper bound on regret tail probability that recovers and matches prior results for bounded and finite-support cases.
- An Online Meta-Level Adaptive Design Framework with Targeted Learning Inference: Applications to Evaluating and Utilizing Surrogate Outcomes in Adaptive Designs
  A framework defining new causal estimands for adaptive designs and using TMLE to enable online selection among designs, including surrogate-guided ones, while handling data dependence.
- Learning Under Moral Hazard with Instrumental Regression and Generalized Method of Moments
  Applies instrumental regression and GMM to learn contracts under moral hazard in multitasking principal-agent problems and characterizes uniformity of optimal contract shapes.