Blending Proxy Metrics with a North Star

Winston Chou

read the original abstract

Proxy metrics are widely used to improve the precision and velocity of online experimentation (aka A/B testing). Although proxies are often motivated by long-term outcomes that the experimenter does not observe, in many settings they are used alongside a contemporaneous but statistically insensitive north star. This can lead to a practical dilemma: when should experimenters trust the proxy metric, and when should they trust the north star? In this paper, I propose an optimal blending approach that smoothly guides decision-making towards the north star as the power of the experiment increases and away from the north star as the quality of the proxy metric improves. I study the implications of this decision-making framework for the design of experiments and of experimentation programs. Equipped with better (worse) proxy metrics, experimenters should run smaller and more (larger and fewer) experiments. I show how to leverage past experiments to estimate optimal blending weights and experiment sizes. Lastly, I describe the real-world application of the methodology to an experimentation program at Netflix.

Blending Proxy Metrics with a North Star

discussion (0)