pith. machine review for the scientific record.

arxiv: 2505.18118 · v2 · submitted 2025-05-23 · 📊 stat.ML · cs.LG

Recognition: unknown

Scalable Policy Maximization Under Network Interference

Authors on Pith: no claims yet
classification: 📊 stat.ML · cs.LG
keywords: interference, network, policy, under, algorithm, algorithms, assumptions, bandit
Original abstract

Many interventions, such as vaccines in clinical trials or coupons in online marketplaces, must be assigned sequentially without full knowledge of their effects. Multi-armed bandit algorithms have proven successful in such settings. However, standard independence assumptions fail when the treatment status of one individual impacts the outcomes of others, a phenomenon known as interference. We study optimal-policy learning under interference on a dynamic network. Existing approaches to this problem require repeated observations of the same fixed network and struggle to scale in sample size beyond as few as fifteen connected units -- both limit applications. We show that under common assumptions on the structure of interference, rewards become linear. This enables us to develop a scalable Thompson sampling algorithm that maximizes policy impact when a new $n$-node network is observed each round. We prove a Bayesian regret bound that is sublinear in $n$ and the number of rounds. Simulation experiments show that our algorithm learns quickly and outperforms existing methods. The results close a key scalability gap between causal inference methods for interference and practical bandit algorithms, enabling policy optimization in large-scale networked systems.
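The abstract's key step is that, under its interference assumptions, per-round rewards become linear in a feature representation, which makes standard linear Thompson sampling applicable. As a rough illustration of that algorithmic pattern (not the paper's actual method), here is a minimal linear Thompson sampling loop with a Gaussian posterior; the feature map, prior, and noise scale are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_thompson_step(A, b, features, sigma2=1.0):
    """One round of linear Thompson sampling.

    A : (d, d) posterior precision matrix (prior + observed features)
    b : (d,) accumulated feature-weighted rewards
    features : (k, d) candidate-action feature vectors
    Returns the index of the chosen action.
    """
    cov = np.linalg.inv(A)
    mean = cov @ b
    theta = rng.multivariate_normal(mean, sigma2 * cov)  # sample from posterior
    return int(np.argmax(features @ theta))              # act greedily on the sample

# Toy usage: d = 3 features, k = 4 candidate policies, fixed true parameter.
d, k = 3, 4
theta_true = np.array([1.0, -0.5, 0.2])
A = np.eye(d)      # prior precision (standard-normal prior on theta)
b = np.zeros(d)
for t in range(200):
    X = rng.normal(size=(k, d))              # illustrative per-round features
    i = linear_thompson_step(A, b, X)
    x = X[i]
    r = x @ theta_true + 0.1 * rng.normal()  # noisy linear reward
    A += np.outer(x, x)                      # Bayesian linear-regression update
    b += r * x
```

In the paper's setting the per-round features would instead encode each $n$-node network's treatment-exposure structure; the scalability claim is that this posterior update stays tractable as $n$ grows, which the rank-one updates above suggest but do not prove.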

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Dynamic Treatment on Networks

    stat.ML · 2026-05 · unverdicted · novelty 7.0

    Q-Ising integrates Bayesian dynamic Ising modeling with offline RL to enable adaptive network treatment policies that outperform static centrality benchmarks under spillovers.