Doubly robust off-policy evaluation with shrinkage

Akshay Krishnamurthy; Maria Dimakopoulou; Miroslav Dudík; Yi Su

arXiv: 1907.09623 · v2 · pith:74OSH2NSnew · submitted 2019-07-22 · cs.LG · stat.ML


Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudík


keywords: estimator, estimators, combinatorial, doubly, evaluation, framework, off-policy, robust


We propose a new framework for designing estimators for off-policy evaluation in contextual bandits. Our approach is based on the asymptotically optimal doubly robust estimator, but we shrink the importance weights to minimize a bound on the mean squared error, which results in a better bias-variance tradeoff in finite samples. We use this optimization-based framework to obtain three estimators: (a) a weight-clipping estimator, (b) a new weight-shrinkage estimator, and (c) the first shrinkage-based estimator for combinatorial action sets. Extensive experiments in both standard and combinatorial bandit benchmark problems show that our estimators are highly adaptive and typically outperform state-of-the-art methods.
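As a rough illustration of the idea in the abstract, the sketch below plugs modified importance weights into a doubly robust estimate for a contextual bandit. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the parameter `lam` is taken as given rather than chosen by minimizing a bound on the mean squared error, and the smooth shrinkage form `lam * w / (w**2 + lam)` is an assumed form that the abstract itself does not state. Clipping is shown as capping each weight at `lam`.

```python
from statistics import fmean


def clip_weights(weights, lam):
    """Weight clipping: cap each importance weight at lam."""
    return [min(w, lam) for w in weights]


def shrink_weights(weights, lam):
    """Smooth shrinkage (assumed form, not given in the abstract):
    w -> lam * w / (w**2 + lam). Requires lam >= 0 and w**2 + lam > 0."""
    return [lam * w / (w * w + lam) for w in weights]


def dr_estimate(rewards, weights, q_logged, q_target):
    """Doubly robust value estimate with (possibly shrunk) weights.

    rewards  : observed reward r_i for each logged action
    weights  : importance weight for each logged action
    q_logged : reward-model prediction for the logged action
    q_target : reward-model prediction averaged over the target policy
    """
    terms = [
        qt + w * (r - ql)
        for r, w, ql, qt in zip(rewards, weights, q_logged, q_target)
    ]
    return fmean(terms)


# Hypothetical toy data, for illustration only.
rewards = [1.0, 0.0, 1.0, 0.0]
raw_weights = [0.5, 2.0, 10.0, 1.0]
q_logged = [0.8, 0.3, 0.6, 0.2]
q_target = [0.7, 0.4, 0.5, 0.3]

for name, w_hat in [
    ("unmodified", raw_weights),
    ("clipped", clip_weights(raw_weights, 3.0)),
    ("shrunk", shrink_weights(raw_weights, 3.0)),
]:
    print(name, dr_estimate(rewards, w_hat, q_logged, q_target))
```

Two limiting cases show what the modified weights interpolate between: with all weights set to zero the estimate reduces to the average of `q_target` (the reward model alone), and with the reward model set to zero and unmodified weights it reduces to plain inverse propensity scoring.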


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Review of Causal Decision Making
stat.ML · 2025-02 · unverdicted · novelty 2.0

A review that organizes causal decision making into three stages and consolidates methods into an open Python collection.