The Wasserstein cost of Importance Sampling

Michael Goldman; Simon Coste

read the original abstract

Importance sampling (IS) consists in biasing samples from a distribution $f$ towards another distribution $g$. Concretely, given samples $X_i$ from $f$, the IS measure is $$\hat{g}_n = \frac{1}{Z_n}\sum_{i=1}^n \frac{g(X_i)}{f(X_i)} \delta_{X_i},$$ with $Z_n = \sum_{i=1}^n \frac{g(X_i)}{f(X_i)}$. The random measure $\hat{g}_n$ approximates $g$, and is used in many contexts ranging from Monte Carlo integration to Bayesian inference. We show that, in high dimension ($d \geqslant 3$), the Wasserstein cost $W_p^p(\hat{g}_n, g)$ has order $n^{-p/d}$ in expectation, i.e. $$\beta^{\mathrm{low}}_{p,d}\int gf^{-p/d}\leqslant \liminf_{n \to \infty} n^{p/d} \mathbb{E}[W_p^p(\hat{g}_n, g)] \leqslant \limsup_{n \to \infty} n^{p/d} \mathbb{E}[W_p^p(\hat{g}_n, g)] \leqslant\beta_{p,d} \int g f^{-p/d}$$ where $0<\beta^{\mathrm{low}}_{p,d}\leqslant \beta_{p,d}$ are constants depending only on $p$ and $d$, which are equal for $p=2$ and conjectured to be equal for any $p\geqslant 1$. Our results are valid for all $p\geqslant 1$ and $d\geqslant 3$. In the case where $\beta^{\mathrm{low}}_{p,d} = \beta_{p,d}$, we show that the asymptotically optimal sampling distribution $f^*$ for importance sampling is not equal to $g$ but to a tempered version of $g$, namely $f^* \propto g^{d/(p+d)}$, which is reminiscent of Zador's theorem in the domain of measure quantization.

The Wasserstein cost of Importance Sampling

discussion (0)