Choosing Online Experiment Designs under Interference in Ads, Recommendations, and Member-Experience Systems
Pith reviewed 2026-06-29 23:26 UTC · model grok-4.3
The pith
A selector ranks experiment designs by their worst-case planning risk when interference mechanisms remain unknown.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a finite catalog of six implementable designs, the selector compares each design by worst-case planning risk over an ambiguity set. The risk combines exposure bias, assignment-unit variance, minimum detectable effect, contamination or carryover, operational cost, and estimand mismatch. Design bias is bounded by Wasserstein distance to the launch exposure distribution, and this penalty is minimax tight under Lipschitz exposure response. The paper also proves finite-catalog approximation and a robust selector theorem with excess-risk control, exact recovery under separation, and certified shortlists when the risk surface is flat.
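Two of the risk components, assignment-unit variance and minimum detectable effect, interact in a familiar way: randomizing coarser units (clusters, time blocks) inflates variance and therefore the MDE. The sketch below uses the textbook normal-approximation MDE for a two-arm difference in means, inflated by a Kish design effect $1 + (m-1)\rho$. Equal cluster sizes, a known intraclass correlation $\rho$, the function names, and the parameter values are all assumptions for illustration. This is not the paper's risk formula.

```python
from statistics import NormalDist


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect 1 + (m - 1) * rho for equal-size clusters."""
    return 1.0 + (cluster_size - 1.0) * icc


def mde(sigma: float, n_per_arm: float, alpha: float = 0.05,
        power: float = 0.8, deff: float = 1.0) -> float:
    """Two-sided MDE for a difference in means between two equal arms.

    Normal approximation: (z_{1-alpha/2} + z_{power}) * SE, where
    SE = sqrt(2 * sigma^2 * deff / n_per_arm) counts individual units
    and deff inflates the variance for clustered assignment.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * (2 * sigma ** 2 * deff / n_per_arm) ** 0.5


if __name__ == "__main__":
    user_level = mde(sigma=1.0, n_per_arm=10_000)
    clustered = mde(sigma=1.0, n_per_arm=10_000, deff=design_effect(50, 0.02))
    print(round(user_level, 4), round(clustered, 4))
```

With $\sigma = 1$ and 10,000 units per arm, the user-level MDE is about 0.040. Clustering into groups of 50 with $\rho = 0.02$ multiplies it by $\sqrt{1.98} \approx 1.41$, to about 0.056.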
What carries the argument
The robust design selector that evaluates each candidate by its worst-case planning risk over an ambiguity set of exposure mechanisms, with a geometry-aware bound via Wasserstein distance.
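The selection rule itself is a minimax over a finite table. The sketch below assumes that the ambiguity set is a finite list of candidate mechanisms and that each design's planning risk under each mechanism has already been collapsed into one dimensionless number. The paper's weighting of the six risk components is not reproduced here. The design names, mechanism names, risk values, and the fixed `tolerance` used to form a shortlist are hypothetical stand-ins for the paper's certified-shortlist rule.

```python
from typing import Dict, List, Tuple

# risk[design][mechanism]: dimensionless planning risk of running `design`
# if `mechanism` is the true interference channel (hypothetical layout).
RiskTable = Dict[str, Dict[str, float]]


def worst_case_risk(risk: RiskTable) -> Dict[str, float]:
    """Worst-case planning risk of each design over the (finite) ambiguity set."""
    return {design: max(by_mech.values()) for design, by_mech in risk.items()}


def robust_select(risk: RiskTable, tolerance: float = 0.0) -> Tuple[str, List[str]]:
    """Pick the minimax design and list every design within `tolerance` of it.

    A one-element shortlist means the winner is separated from the rest;
    a longer one signals a flat risk surface, where the honest output is
    the shortlist rather than a single design.
    """
    worst = worst_case_risk(risk)
    best = min(worst, key=worst.__getitem__)
    shortlist = sorted(d for d, r in worst.items() if r <= worst[best] + tolerance)
    return best, shortlist


# Hypothetical numbers for three of the catalog's designs and three mechanisms.
EXAMPLE_RISK: RiskTable = {
    "user": {"budget": 1.2, "graph": 2.6, "carryover": 1.1},
    "cluster": {"budget": 1.9, "graph": 1.4, "carryover": 1.6},
    "switchback": {"budget": 1.5, "graph": 2.0, "carryover": 2.2},
}

if __name__ == "__main__":
    print(robust_select(EXAMPLE_RISK))                  # separated: one design
    print(robust_select(EXAMPLE_RISK, tolerance=0.35))  # flatter: a shortlist
```

User randomization has the lowest risk under two of the three mechanisms but the worst case under graph spillover. The minimax rule therefore prefers cluster randomization, whose worst case is smallest.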
If this is right
- Design bias is bounded by Wasserstein distance to the launch exposure distribution.
- The bound is minimax tight under Lipschitz exposure response.
- The selector achieves excess-risk control and exact recovery under separation.
- Certified shortlists are produced when the risk surface is flat.
- The selector picks different designs on samples from the Criteo, Open Bandit, and KuaiRand datasets (user randomization, switchbacks, and cluster randomization, respectively).
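The first two bullets follow the standard Kantorovich-Rubinstein argument. A minimal version is shown below. The notation is assumed here and is not the paper's: $P$ is the launch exposure distribution, $Q_d$ is the exposure distribution induced by design $d$, and $\mu$ is an exposure response with Lipschitz constant $L$.

```latex
\left| \mathbb{E}_{e \sim Q_d}[\mu(e)] - \mathbb{E}_{e \sim P}[\mu(e)] \right|
  \le L \, W_1(Q_d, P),
\qquad
\sup_{\mu :\, \mathrm{Lip}(\mu) \le L}
  \left| \mathbb{E}_{Q_d}[\mu] - \mathbb{E}_{P}[\mu] \right| = L \, W_1(Q_d, P).
```

The inequality is the bias bound. The equality is Kantorovich-Rubinstein duality scaled by $L$, valid when both distributions have finite first moments. It expresses the sense in which the penalty is tight: the supremum over all $L$-Lipschitz responses equals the penalty, so no smaller design-agnostic penalty holds over the whole class. The paper's own statement may carry additional terms or constants.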
Where Pith is reading between the lines
- Historical logging data could be used to refine the ambiguity set and produce sharper design rankings.
- The selector could be extended to sequential re-selection as exposure observations accumulate during the experiment.
- Analogous robust selection may apply to policy experiments in networked economic or supply-chain settings.
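- If exposure responses in practice satisfy the Lipschitz condition, the Wasserstein penalty would directly cap the bias of the selected design, and its minimax tightness means no smaller penalty is valid for every Lipschitz response.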
Load-bearing premise
The true exposure mechanism at launch lies inside the ambiguity set used to compute the worst-case planning risk for each design.
What would settle it
Observe a chosen design's realized bias when the actual launch exposure distribution lies outside the ambiguity set and check whether the observed bias exceeds the Wasserstein-derived bound.
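The falsification test above can be prototyped on one-dimensional exposures. The sketch makes four assumptions: equal-size empirical samples, a known Lipschitz constant, "design bias" meaning the gap in mean response between the design's exposure distribution and the launch distribution, and a planning-time bound computed against the launch distribution assumed when the design was chosen. Realized bias is computed against the launch exposures actually observed. All function names and numbers are illustrative.

```python
from typing import Callable, Sequence, Tuple


def wasserstein1_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """W1 between two equal-size 1-D empirical distributions.

    With uniform weights on n points each, the optimal coupling pairs order
    statistics, so W1 is the mean absolute gap between the sorted samples.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mean_response(response: Callable[[float], float], exposures: Sequence[float]) -> float:
    """Average of the exposure response over an empirical exposure sample."""
    return sum(response(e) for e in exposures) / len(exposures)


def check_bias_bound(
    response: Callable[[float], float],
    lipschitz: float,
    design_exposure: Sequence[float],
    planned_launch: Sequence[float],
    actual_launch: Sequence[float],
    tol: float = 1e-9,
) -> Tuple[float, float, bool]:
    """Compare realized design bias with the planning-time bound L * W1.

    A violation (False) signals either a launch distribution the plan did
    not cover or a response steeper than the assumed Lipschitz constant.
    """
    bias = abs(mean_response(response, design_exposure)
               - mean_response(response, actual_launch))
    bound = lipschitz * wasserstein1_1d(design_exposure, planned_launch)
    return bias, bound, bias <= bound + tol


def toy_response(e: float) -> float:
    """Hypothetical 0.5-Lipschitz exposure response, saturating at e = 1."""
    return 0.5 * min(e, 1.0)


if __name__ == "__main__":
    design = [0.1, 0.2, 0.3, 0.4]
    planned = [0.2, 0.3, 0.4, 0.5]
    print(check_bias_bound(toy_response, 0.5, design, planned, planned))  # holds
    print(check_bias_bound(toy_response, 0.5, design, planned,
                           [0.3, 0.5, 0.7, 0.9]))  # launch drifted: violated
```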
Original abstract
Online experiments in ads, recommendation, and member-experience systems are often planned before the dominant interference mechanism is known. A treatment may propagate through budgets, inventory, producer exposure, graph spillovers, or temporal carryover, making the randomization design itself a statistical decision. We formulate this problem as robust design selection over uncertain exposure mechanisms. Given a finite catalog of six implementable designs, the selector compares each design by worst-case planning risk over an ambiguity set. The risk combines exposure bias, assignment-unit variance, minimum detectable effect, contamination or carryover, operational cost, and estimand mismatch. For theoretical justification, the paper develops a geometry-aware guarantee, stating that design bias is bounded by Wasserstein distance to the launch exposure distribution, and this penalty is minimax tight under Lipschitz exposure response. We also prove finite-catalog approximation and a robust selector theorem with excess-risk control, exact recovery under separation, and certified shortlists when the risk surface is flat. Empirically, the same selector gives different recommendations across samples from public datasets. It selects user-randomization on Criteo ads with dimensionless robust risk 1.295, switchbacks on Open Bandit-bts/men with risk 2.105, and cluster-randomization on KuaiRand with risk 2.240. The Open Bandit case stresses known but uneven logging support, with propensities from 0.00006 to 0.594 and a 5.17% IPS effective-sample share. Overall, the paper contributes an interference-aware experiment design framework based on mechanism-robust design decisions, where the output is either a justified design choice or an uncertainty shortlist.
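The abstract reports a 5.17% IPS effective-sample share for Open Bandit but does not give its formula. One common definition, assumed below, is Kish's effective sample size of the importance weights divided by the number of logged rows. The sketch shows how a few tiny logging propensities (the abstract cites values as small as 0.00006) drive the share down. The function name and the toy propensities are hypothetical.

```python
from typing import Sequence


def ips_effective_sample_share(target_probs: Sequence[float],
                               logging_propensities: Sequence[float]) -> float:
    """Kish effective-sample share of IPS weights w_i = pi(a_i|x_i) / p_i.

    ESS = (sum w)^2 / sum w^2 and the share is ESS / n: it equals 1 when all
    weights are equal and collapses when a few low-propensity rows dominate.
    """
    if len(target_probs) != len(logging_propensities) or len(target_probs) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    if any(p <= 0 for p in logging_propensities):
        raise ValueError("logging propensities must be positive")
    weights = [t / p for t, p in zip(target_probs, logging_propensities)]
    ess = sum(weights) ** 2 / sum(w * w for w in weights)
    return ess / len(weights)


if __name__ == "__main__":
    # Hypothetical rows: three well-supported logged actions and one rare one.
    print(ips_effective_sample_share([1.0] * 4, [0.5, 0.5, 0.5, 0.001]))
```

In this toy case the single row with propensity 0.001 carries almost all the weight. Four logged rows then behave like roughly one, giving a share of about 25%.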
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formulates online experiment design selection under uncertain interference as a robust optimization problem over an ambiguity set of exposure mechanisms. Given a finite catalog of six designs, a selector ranks them by worst-case planning risk that aggregates exposure bias, assignment variance, minimum detectable effect, contamination/carryover, operational cost, and estimand mismatch. Theoretical results include a Wasserstein-distance bound on design bias that is minimax-tight under Lipschitz exposure response, plus a robust selector theorem establishing finite-catalog approximation, excess-risk control, exact recovery under separation, and certified shortlists on flat risk surfaces. Empirical application to Criteo, Open Bandit, and KuaiRand datasets yields design-specific recommendations (user randomization on Criteo with robust risk 1.295; switchbacks on Open Bandit with risk 2.105; cluster randomization on KuaiRand with risk 2.240).
Significance. If the stated guarantees hold inside the user-specified ambiguity set, the framework supplies a principled, geometry-aware method for choosing among implementable designs when the dominant interference channel is unknown at planning time. The combination of Wasserstein bias bounds, minimax tightness, and excess-risk control is a clear technical contribution; the empirical selector outputs on public datasets with realistic propensity ranges further illustrate practical utility.
minor comments (3)
- The abstract states that design bias is bounded by Wasserstein distance to the launch exposure distribution and that the penalty is minimax tight under Lipschitz response; the main text should explicitly locate these statements (theorem or proposition number) and confirm that the Lipschitz constant is treated as known or estimated.
- Clarify the precise definition of the six-design catalog and how each design maps to the components of the planning-risk objective (especially estimand mismatch and carryover terms).
- The Open Bandit example reports propensities ranging from 0.00006 to 0.594 and a 5.17% IPS effective-sample share; state whether these quantities are used directly in the ambiguity-set construction or only for post-selection diagnostics.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work, the clear summary of the robust design selector, and the recommendation for minor revision. The report correctly identifies the core contributions: the Wasserstein bias bound that is minimax-tight under Lipschitz exposure response, the robust selector theorem with its finite-catalog, excess-risk, and exact-recovery guarantees, and the dataset-specific design recommendations. No major comments requiring clarification or correction were raised.
Circularity Check
No significant circularity
The derivation chain consists of a standard robust-optimization formulation: an ambiguity set is user-specified, worst-case risk is computed over it, and all stated guarantees (Wasserstein bias bound, minimax tightness under Lipschitz response, excess-risk control, exact recovery under separation) are explicitly conditional on the true launch mechanism belonging to that set. This is an external modeling assumption rather than a self-referential definition or fitted quantity renamed as a prediction. No self-citation is invoked as a load-bearing uniqueness theorem, no ansatz is smuggled, and the finite-catalog selector and empirical results on public datasets are presented as separate evaluations. The central claims therefore remain independent of their own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the exposure response function is Lipschitz continuous.
Forward citations
Cited by 1 Pith paper
- Privacy-Robust Incrementality Measurement for Advertising Systems under Signal Loss: formulates privacy-constrained advertising measurement as a robust causal decision problem under signal loss and derives a sharp decision frontier separating certifiable from unresolved incrementality claims.