Measuring Average Treatment Effect from Heavy-tailed Data
Heavy-tailed metrics are common and often critical to product evaluation in the online world. Even when samples are large enough for the Central Limit Theorem to apply, experimentation remains challenging because the confidence intervals of the estimates are wide. We demonstrate the difficulty by running A/A simulations with customer spending data from a large-scale e-commerce site, and then explore solutions. On one front, we address the heavy tail directly and highlight the often-ignored nuances of winsorization; in particular, the validity of the false positive rate could be at risk. Inspired by robust statistics, we introduce Huber regression as a better way to measure the treatment effect. On another front, we exploit covariates from the pre-experiment period. Although they are independent of assignment and can potentially explain much of the variation in the response, a concern is that models are trained to minimize prediction error rather than the bias of the parameter of interest. We find the framework of orthogonal learning useful: it matches not raw observations but residuals from two predictions, one of the response and the other of the assignment. Robust regression integrates readily, together with cross-fitting. The final design proves highly effective at driving down variance while controlling bias. It supports our daily practice and, we hope, can benefit other applications in industry.
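The residual-matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`ols_fit_predict`, `huber_slope`, `orthogonal_ate`), the choice of ordinary least squares for the two predictions, the Huber threshold of 1.345, the MAD-based scale, and the simulated data are all assumptions made for illustration. The sketch predicts the response and the assignment from pre-experiment covariates with cross-fitting, then runs a Huber regression of one set of residuals on the other.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    # Least-squares fit with an intercept, then predict on held-out rows.
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef = np.linalg.lstsq(A, y_train, rcond=None)[0]
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def huber_slope(x, y, delta=1.345, n_iter=50):
    # Huber regression of y on x (with intercept) via iteratively
    # reweighted least squares; returns the slope only.
    A = np.column_stack([np.ones(len(x)), x])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - A @ beta
        # Robust residual scale from the median absolute deviation (assumed choice).
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        w = delta / np.maximum(u, delta)  # weight 1 inside the threshold, delta/u outside
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return beta[1]

def orthogonal_ate(X, t, y, n_folds=2, seed=0):
    # Cross-fitting: each fold's residuals come from models trained on the other folds.
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    y_res = np.empty(len(y))
    t_res = np.empty(len(y))
    for k in range(n_folds):
        test = folds == k
        train = ~test
        y_res[test] = y[test] - ols_fit_predict(X[train], y[train], X[test])
        t_res[test] = t[test] - ols_fit_predict(X[train], t[train], X[test])
    # Final stage: robust regression of response residuals on assignment residuals.
    return huber_slope(t_res, y_res)

# Hypothetical simulated experiment: additive effect of 2.0, heavy-tailed noise.
rng = np.random.default_rng(42)
n = 20000
X = rng.normal(size=(n, 2))                    # pre-experiment covariates
t = rng.integers(0, 2, size=n).astype(float)   # randomized assignment
y = 2.0 * t + 3.0 * X[:, 0] + rng.standard_t(df=3, size=n)
est = orthogonal_ate(X, t, y)
print(f"estimated treatment effect: {est:.3f}")
```

The simulated noise here is symmetric, so the Huber estimate targets the same quantity as the mean difference. With skewed metrics such as spending, the two can differ, which is the kind of bias concern the abstract raises.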