Amortized Monte Carlo Integration

Adam Goliński; Frank Wood; Tom Rainforth

arXiv: 1907.08082 · v1 · pith:3HMACM7Dnew · submitted 2019-07-18 · stat.ML · cs.LG · stat.CO

Pith reviewed 2026-05-24 19:35 UTC · model grok-4.3

classification stat.ML · cs.LG · stat.CO

keywords amortized Monte Carlo integration · importance sampling · Bayesian inference · expectation estimation · amortized inference

The pith

AMCI amortizes Monte Carlo integration directly and can, in theory, achieve arbitrarily small error with a single sample from each of three proposals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AMCI to amortize the full process of computing expectations rather than approximating only the posterior distribution. It trains three separate proposals, each matched to a distinct part of the Monte Carlo estimator, so that samples drawn from them at runtime can be combined into an overall estimate. The central result is that this construction can theoretically drive the error to zero for any integrable target function, even when using just one sample from each proposal. Existing amortized inference pipelines remain limited by the quality of the posterior approximation, whereas AMCI removes that bottleneck when the target function is known in advance and also permits amortization over families of target functions.

Core claim

AMCI trains three distinct amortized proposals, each tailored to a different component of the overall expectation calculation. At runtime, samples are drawn separately from each proposal and combined into a single estimate. In theory, the error can be made arbitrarily small for any integrable target function with as little as one sample per proposal, and the method supports amortization over target functions in addition to datasets.

What carries the argument

Three amortized proposals, each approximating one optimal component of the Monte Carlo estimator for the target expectation.
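This page does not spell out what the three components are. One decomposition consistent with the description, written here as a sketch under assumed notation (latent x, data y, target function f, joint density p(x, y), proposals q; none of these symbols appear on this page), splits the posterior expectation into a ratio whose numerator is separated into positive and negative parts:

```latex
\begin{align}
\mu(y) &= \mathbb{E}_{p(x \mid y)}[f(x)]
        = \frac{\int f(x)\, p(x, y)\, dx}{\int p(x, y)\, dx}
        = \frac{E_1^{+} - E_1^{-}}{E_2}, \\
E_1^{\pm} &= \int f^{\pm}(x)\, p(x, y)\, dx
        = \mathbb{E}_{q_1^{\pm}}\!\left[ \frac{f^{\pm}(x)\, p(x, y)}{q_1^{\pm}(x)} \right],
        \qquad f^{+} = \max(f, 0),\quad f^{-} = \max(-f, 0), \\
E_2 &= \int p(x, y)\, dx
        = \mathbb{E}_{q_2}\!\left[ \frac{p(x, y)}{q_2(x)} \right], \\
q_1^{\pm\,*}(x) &= \frac{f^{\pm}(x)\, p(x, y)}{E_1^{\pm}}
        \;\Longrightarrow\;
        \frac{f^{\pm}(x)\, p(x, y)}{q_1^{\pm\,*}(x)} = E_1^{\pm}
        \quad \text{for every } x \text{ with } q_1^{\pm\,*}(x) > 0, \\
q_2^{*}(x) &= \frac{p(x, y)}{E_2} = p(x \mid y)
        \;\Longrightarrow\;
        \frac{p(x, y)}{q_2^{*}(x)} = E_2 .
\end{align}
```

Under this reading, each of the three importance-sampling estimators has a non-negative integrand, so each admits a proposal under which the importance weight is constant. A single draw from each optimal proposal then returns the three terms exactly, which is where a zero-error single-sample estimate would come from.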

If this is right

AMCI can reach higher accuracy than posterior-only amortization when the target function is known upfront.
The method supports amortization over target functions as well as over datasets.
On several example problems the approach empirically beats the theoretically optimal self-normalized importance sampler.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same three-proposal structure might be applied to repeated integration tasks outside Bayesian settings where the integrand varies but remains known in advance.
Training the proposals jointly could be extended to cases where the target function itself is drawn from a distribution at runtime.
The separation into three components suggests possible hybrids with other variance-reduction techniques that also decompose the estimator.

Load-bearing premise

The three amortized proposals can be trained to sufficiently approximate the optimal components of the Monte Carlo estimator so that the combined single-sample estimate achieves the claimed accuracy.

What would settle it

An experiment on an integrable target function in which the error of the single-sample AMCI estimator remains bounded away from zero no matter how extensively the three proposals are trained.
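A check of this kind could be prototyped on a toy problem where the optimal proposals are available in closed form. The sketch below is illustrative only and is not the paper's experiment. It assumes a six-state discrete space, an unnormalized density `GAMMA`, a target function `F`, and a decomposition of the expectation into (E1+ - E1-) / E2 with the numerator split into positive and negative parts of f. All names and values are hypothetical. Trained proposals are imitated by mixing each optimal proposal with a uniform distribution, with `eps` controlling the distance from the optimum.

```python
import numpy as np

# Toy discrete problem; every value here is an illustrative assumption.
GAMMA = np.array([0.5, 1.0, 2.0, 1.5, 0.7, 0.3])   # unnormalized p(x, y) over 6 states
F = np.array([-2.0, -1.0, 0.5, 1.0, 2.0, 3.0])     # target function f(x)
MU_TRUE = float((F * GAMMA).sum() / GAMMA.sum())   # exact posterior expectation


def normalize(w):
    """Scale non-negative weights so they sum to one."""
    return w / w.sum()


def proposals(eps):
    """Optimal proposals mixed with a uniform; eps = 0 recovers the optimum."""
    f_pos, f_neg = np.maximum(F, 0.0), np.maximum(-F, 0.0)
    uniform = np.full(len(GAMMA), 1.0 / len(GAMMA))
    optimal = [normalize(f_pos * GAMMA), normalize(f_neg * GAMMA), normalize(GAMMA)]
    return [(1.0 - eps) * q + eps * uniform for q in optimal]


def single_sample_estimates(eps, n_rep, rng):
    """n_rep independent estimates, each using one draw per proposal."""
    q1p, q1m, q2 = proposals(eps)
    f_pos, f_neg = np.maximum(F, 0.0), np.maximum(-F, 0.0)
    states = np.arange(len(GAMMA))
    xp = rng.choice(states, size=n_rep, p=q1p)
    xm = rng.choice(states, size=n_rep, p=q1m)
    xd = rng.choice(states, size=n_rep, p=q2)
    e1_pos = f_pos[xp] * GAMMA[xp] / q1p[xp]   # estimates sum_x f+(x) gamma(x)
    e1_neg = f_neg[xm] * GAMMA[xm] / q1m[xm]   # estimates sum_x f-(x) gamma(x)
    e2 = GAMMA[xd] / q2[xd]                    # estimates sum_x gamma(x)
    return (e1_pos - e1_neg) / e2


def rmse(eps, n_rep=20000, seed=0):
    """Root-mean-square error of the single-sample estimate at a given eps."""
    rng = np.random.default_rng(seed)
    est = single_sample_estimates(eps, n_rep, rng)
    return float(np.sqrt(np.mean((est - MU_TRUE) ** 2)))


if __name__ == "__main__":
    for eps in (0.3, 0.1, 0.01, 0.0):
        print(f"eps={eps:<5} single-sample RMSE={rmse(eps):.3e}")
```

With `eps = 0` every replicate returns the same value up to floating-point rounding, because each importance weight is constant on the support of its proposal. The settling experiment described above would look for a target where no amount of training moves the learned proposals close enough to that regime.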

Original abstract

Current approaches to amortizing Bayesian inference focus solely on approximating the posterior distribution. Typically, this approximation is, in turn, used to calculate expectations for one or more target functions - a computational pipeline which is inefficient when the target function(s) are known upfront. In this paper, we address this inefficiency by introducing AMCI, a method for amortizing Monte Carlo integration directly. AMCI operates similarly to amortized inference but produces three distinct amortized proposals, each tailored to a different component of the overall expectation calculation. At runtime, samples are produced separately from each amortized proposal, before being combined to an overall estimate of the expectation. We show that while existing approaches are fundamentally limited in the level of accuracy they can achieve, AMCI can theoretically produce arbitrarily small errors for any integrable target function using only a single sample from each proposal at runtime. We further show that it is able to empirically outperform the theoretically optimal self-normalized importance sampler on a number of example problems. Furthermore, AMCI allows not only for amortizing over datasets but also amortizing over target functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

AMCI tries to amortize the integration step itself with three proposals instead of just the posterior, but the single-sample arbitrary-accuracy claim rests on an assumption that training will hit exact optimality for any target.

The main thing here is a shift from amortizing only the posterior to amortizing the Monte Carlo estimator directly. They train three separate proposals that target different pieces of the expectation calculation, then combine one draw from each at test time. That setup lets them amortize over target functions as well as data, which is a clear departure from the usual pipeline where you amortize inference and then run separate Monte Carlo on top.

If the training works, it could cut the cost of repeated queries inside larger models when the integrands are known in advance. The empirical claim that it beats the optimal self-normalized importance sampler on some examples is worth checking once the full experiments are in front of us.

The soft spot is the theoretical guarantee. The paper says AMCI can drive error to zero for any integrable function with one sample per proposal. That only follows if the three amortized proposals converge to the exact optimal components (importance density, control variate, etc.). Training over a distribution of targets does not automatically deliver that for an arbitrary new function; the parametric family or the optimization may simply not reach the required precision. The stress-test note is right on this point. Without seeing the derivations and the precise training objectives, it is hard to tell how much of the guarantee survives the approximation.

The citation pattern looks standard for the area and there is no obvious circularity. This is for people already working on amortized Bayesian workflows who need faster repeated integration. It is worth sending to referees so the math and the experiments can be examined in detail; the core idea is coherent even if the strongest claim needs tightening.

Referee Report

2 major / 2 minor

Summary. The paper introduces AMCI, which amortizes Monte Carlo integration by training three distinct parametric proposals (one for each component of the expectation estimator) over a distribution of targets. At test time a single sample is drawn from each proposal and combined into an estimate; the central claims are that this yields arbitrarily small error for any integrable target (unlike posterior-amortization pipelines) and empirically outperforms the optimal self-normalized importance sampler, while also permitting amortization over target functions themselves.

Significance. If the theoretical guarantee can be made rigorous under explicit conditions on the function class and optimization, the work would be significant: it shifts amortization from posterior approximation to direct expectation estimation when the integrand is known in advance, and the ability to amortize over targets is a useful extension. The empirical comparison to an optimal baseline is a positive feature.

major comments (2)

[Abstract / §3] Abstract and §3 (theoretical analysis): the claim that AMCI produces arbitrarily small errors for any integrable target with one sample per proposal requires the three learned proposals to converge exactly to the optimal importance density, control variate, and normalization terms. The training objective is defined over a distribution of targets; without a universal-approximation theorem plus a convergence guarantee that holds for arbitrary integrable functions (rather than only those in the training support), the single-sample estimator cannot be guaranteed to become exact. This is load-bearing for the central theoretical claim.
[§4] §4 (experiments): the reported outperformance over the theoretically optimal self-normalized importance sampler must be accompanied by an explicit statement of how the optimal SNIS proposal is obtained at test time for each target; if the comparison uses an oracle that is unavailable in the amortized setting, the empirical advantage is not yet demonstrated.

minor comments (2)

[§2] Notation for the three proposals should be introduced with a single table or diagram that makes their roles in the combined estimator immediately clear.
[§3] The training objective (Eq. (X)) should state whether it is an unbiased estimator of the integrated squared error or another quantity; the current description leaves the precise loss ambiguous.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. Below we respond point-by-point to the major comments, proposing revisions to address the concerns while preserving the core contributions of the work.

Referee: [Abstract / §3] Abstract and §3 (theoretical analysis): the claim that AMCI produces arbitrarily small errors for any integrable target with one sample per proposal requires the three learned proposals to converge exactly to the optimal importance density, control variate, and normalization terms. The training objective is defined over a distribution of targets; without a universal-approximation theorem plus a convergence guarantee that holds for arbitrary integrable functions (rather than only those in the training support), the single-sample estimator cannot be guaranteed to become exact. This is load-bearing for the central theoretical claim.

Authors: We agree that the claim of arbitrarily small error with one sample per proposal holds exactly only when the three proposals match the optimal importance density, control variate, and normalization term. The manuscript's theoretical argument is that the estimator variance vanishes under these conditions (unlike posterior amortization, which retains irreducible error), and that amortization over a distribution of targets enables learning approximations to these optima. However, we acknowledge that a fully rigorous guarantee for arbitrary integrable functions would require explicit universal-approximation results for the chosen parametric families together with optimization convergence outside the training support. We will revise the abstract and §3 to state these conditions explicitly, clarifying that arbitrary accuracy is achieved in the limit of perfect approximation and optimization rather than unconditionally for every integrable target.

revision: yes

Referee: [§4] §4 (experiments): the reported outperformance over the theoretically optimal self-normalized importance sampler must be accompanied by an explicit statement of how the optimal SNIS proposal is obtained at test time for each target; if the comparison uses an oracle that is unavailable in the amortized setting, the empirical advantage is not yet demonstrated.

Authors: In the experiments the optimal SNIS proposal for each target is obtained by numerically optimizing the proposal parameters (using the known integrand and target density) to minimize the variance of the self-normalized estimator; this per-target optimization is described in §4 but will be expanded with implementation details. The comparison is deliberately against this non-amortized oracle to show that the single amortized AMCI model can still outperform even the best possible per-target SNIS. We will add an explicit statement of the optimization procedure and a short discussion distinguishing the amortized versus per-target regimes so that the empirical results are presented with full transparency.

revision: yes
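To make the per-target SNIS baseline concrete, the sketch below runs a self-normalized importance sampler on a hypothetical six-state toy problem. It swaps the numerical optimization described in the response for the standard closed-form proposal that minimizes the asymptotic SNIS variance, q*(x) proportional to p(x | y) |f(x) - mu|. Whether the paper's baseline uses this form is not stated on this page, and `GAMMA`, `F`, and the function names are assumptions made for illustration. Note that this proposal requires the true expectation mu, so it is an oracle.

```python
import numpy as np

# Toy discrete problem; every value here is an illustrative assumption.
GAMMA = np.array([0.5, 1.0, 2.0, 1.5, 0.7, 0.3])   # unnormalized p(x, y) over 6 states
F = np.array([-2.0, -1.0, 0.5, 1.0, 2.0, 3.0])     # target function f(x)
MU_TRUE = float((F * GAMMA).sum() / GAMMA.sum())   # exact posterior expectation

# Oracle proposal minimizing the asymptotic SNIS variance:
# q*(x) proportional to p(x | y) * |f(x) - mu|.
Q_SNIS = GAMMA * np.abs(F - MU_TRUE)
Q_SNIS = Q_SNIS / Q_SNIS.sum()


def snis_estimates(q, n_samples, n_rep, rng):
    """n_rep independent SNIS estimates, each built from n_samples draws of q."""
    states = np.arange(len(GAMMA))
    x = rng.choice(states, size=(n_rep, n_samples), p=q)
    w = GAMMA[x] / q[x]                              # unnormalized importance weights
    return (w * F[x]).sum(axis=1) / w.sum(axis=1)    # self-normalized ratio


def snis_rmse(q, n_samples, n_rep=20000, seed=0):
    """Root-mean-square error of the SNIS estimate for a given sample budget."""
    rng = np.random.default_rng(seed)
    est = snis_estimates(q, n_samples, n_rep, rng)
    return float(np.sqrt(np.mean((est - MU_TRUE) ** 2)))


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"n_samples={n:<4} SNIS RMSE={snis_rmse(Q_SNIS, n):.3e}")
```

With `n_samples = 1` the weight cancels and the estimate collapses to f(x_1), so the error can never fall below the smallest gap between f and mu on the support of the proposal. That is the sense in which a single-sample SNIS estimate cannot be exact, however good the proposal, and it is the regime a three-proposal estimator is claimed to escape.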

Circularity Check

0 steps flagged

No significant circularity; theoretical claim is self-contained

The paper derives its central result—that AMCI can theoretically achieve arbitrarily small errors for any integrable target with one sample per proposal—via a mathematical argument on the components of the Monte Carlo estimator and amortization over targets, without any quoted reduction of the output to a fitted parameter or self-citation chain. No load-bearing self-citations, ansatzes smuggled via prior work, or renaming of known results appear in the provided abstract or description. The empirical comparisons are presented separately from the theoretical guarantee, leaving the derivation independent of its inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Based on the abstract only: the method rests on standard Monte Carlo assumptions plus the existence of trainable proposal networks whose parameters are fitted to data.

free parameters (1)

parameters of the three amortized proposals
Neural network weights for the proposals are learned from data; their specific form is not stated in the abstract.

axioms (1)

domain assumption: Target function is integrable
Invoked to support the claim of arbitrarily small error for any integrable function.

