Amortized Monte Carlo Integration
Pith reviewed 2026-05-24 19:35 UTC · model grok-4.3
The pith
AMCI amortizes Monte Carlo integration directly and can, in theory, achieve arbitrarily small error using one sample from each of three learned proposals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AMCI trains three distinct amortized proposals, each tailored to a different component of the overall expectation calculation. At runtime, one sample is drawn from each proposal, and the three samples are combined into a single estimate. In theory, the error can be made arbitrarily small for any integrable target function, provided the learned proposals approach their optimal forms. The method also supports amortization over target functions in addition to datasets.
What carries the argument
Three amortized proposals, each approximating one optimal component of the Monte Carlo estimator for the target expectation.
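One decomposition consistent with this description can be written out as follows. This is a sketch under stated assumptions: the split of the target function into positive and negative parts, and the symbols E_1^+, E_1^-, E_2, q_1^+, q_1^-, and q_2, are not defined anywhere on this page and are introduced here only for illustration.

```latex
% Assumed notation: p(x, y) is the joint density, f the target function.
\begin{align*}
\mathbb{E}_{p(x \mid y)}[f(x)]
  &= \frac{\int f(x)\, p(x, y)\, dx}{\int p(x, y)\, dx}
   = \frac{E_1^{+} - E_1^{-}}{E_2}, \\
E_1^{\pm} &= \int f^{\pm}(x)\, p(x, y)\, dx, \qquad
E_2 = \int p(x, y)\, dx, \\
f^{+}(x) &= \max(f(x), 0), \qquad f^{-}(x) = \max(-f(x), 0).
\end{align*}
% With proposals proportional to their integrands,
%   q_1^{\pm}(x) = f^{\pm}(x) p(x, y) / E_1^{\pm},  q_2(x) = p(x, y) / E_2,
% every importance weight is constant on the proposal's support:
\begin{align*}
\frac{f^{\pm}(x)\, p(x, y)}{q_1^{\pm}(x)} = E_1^{\pm}, \qquad
\frac{p(x, y)}{q_2(x)} = E_2 .
\end{align*}
```

Under these assumptions each of the three one-sample importance estimates has zero variance, which is the sense in which a single draw per proposal could suffice. Any mismatch between a learned proposal and its optimum reintroduces variance.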
If this is right
- AMCI can reach higher accuracy than posterior-only amortization when the target function is known upfront.
- The method supports amortization over target functions as well as over datasets.
- On several example problems the approach empirically beats the theoretically optimal self-normalized importance sampler.
Where Pith is reading between the lines
- The same three-proposal structure might be applied to repeated integration tasks outside Bayesian settings where the integrand varies but remains known in advance.
- Training the proposals jointly could be extended to cases where the target function itself is drawn from a distribution at runtime.
- The separation into three components suggests possible hybrids with other variance-reduction techniques that also decompose the estimator.
Load-bearing premise
The three amortized proposals can be trained to sufficiently approximate the optimal components of the Monte Carlo estimator so that the combined single-sample estimate achieves the claimed accuracy.
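The premise can be checked numerically on a toy problem. The sketch below is illustrative only: the five-state problem, the function names, and the decomposition into a positive part, a negative part, and a normalizer are assumptions made here, not details taken from the paper. The proposals are computed in closed form rather than learned, so it shows the idealized limit, not the training procedure.

```python
import random

def normalize(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def one_sample_estimate(integrand, proposal, rng):
    """Importance-sampling estimate of sum(integrand) from a single draw."""
    # Draw only from states the proposal can actually produce.
    support = [i for i, q in enumerate(proposal) if q > 0.0]
    x = rng.choices(support, weights=[proposal[i] for i in support], k=1)[0]
    return integrand[x] / proposal[x]

def three_proposal_estimate(f, gamma, q1_plus, q1_minus, q2, rng):
    """Combine one draw per proposal into (E1+ - E1-) / E2."""
    pos = [max(v, 0.0) * g for v, g in zip(f, gamma)]   # f+(x) * gamma(x)
    neg = [max(-v, 0.0) * g for v, g in zip(f, gamma)]  # f-(x) * gamma(x)
    e1_plus = one_sample_estimate(pos, q1_plus, rng)
    e1_minus = one_sample_estimate(neg, q1_minus, rng)
    e2 = one_sample_estimate(gamma, q2, rng)
    return (e1_plus - e1_minus) / e2

# Hypothetical toy problem: five states with unnormalized weights gamma and a
# target function f that takes both signs, so both parts are non-empty.
gamma = [0.5, 1.5, 3.0, 1.0, 0.2]
f = [-2.0, -0.5, 1.0, 2.5, 4.0]
truth = sum(v * g for v, g in zip(f, gamma)) / sum(gamma)

# Idealized proposals: each proportional to the integrand it is paired with.
q1_plus = normalize([max(v, 0.0) * g for v, g in zip(f, gamma)])
q1_minus = normalize([max(-v, 0.0) * g for v, g in zip(f, gamma)])
q2 = normalize(gamma)

rng = random.Random(0)
exact = [three_proposal_estimate(f, gamma, q1_plus, q1_minus, q2, rng)
         for _ in range(5)]

# Mismatched (uniform) proposals: each part is still unbiased, but the
# combined ratio now varies from draw to draw.
uniform = [0.2] * 5
noisy = [three_proposal_estimate(f, gamma, uniform, uniform, uniform, rng)
         for _ in range(5)]
```

With the idealized proposals every entry of `exact` matches `truth` up to floating-point rounding, whichever states are drawn. The entries of `noisy` show what remains when the proposals miss their optima, which is the gap the premise asks training to close.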
What would settle it
An experiment on an integrable target function in which the error of the single-sample AMCI estimator remains bounded away from zero no matter how extensively the three proposals are trained.
Original abstract
Current approaches to amortizing Bayesian inference focus solely on approximating the posterior distribution. Typically, this approximation is, in turn, used to calculate expectations for one or more target functions - a computational pipeline which is inefficient when the target function(s) are known upfront. In this paper, we address this inefficiency by introducing AMCI, a method for amortizing Monte Carlo integration directly. AMCI operates similarly to amortized inference but produces three distinct amortized proposals, each tailored to a different component of the overall expectation calculation. At runtime, samples are produced separately from each amortized proposal, before being combined to an overall estimate of the expectation. We show that while existing approaches are fundamentally limited in the level of accuracy they can achieve, AMCI can theoretically produce arbitrarily small errors for any integrable target function using only a single sample from each proposal at runtime. We further show that it is able to empirically outperform the theoretically optimal self-normalized importance sampler on a number of example problems. Furthermore, AMCI allows not only for amortizing over datasets but also amortizing over target functions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AMCI, which amortizes Monte Carlo integration by training three distinct parametric proposals (one for each component of the expectation estimator) over a distribution of targets. At test time a single sample is drawn from each proposal and combined into an estimate; the central claims are that this yields arbitrarily small error for any integrable target (unlike posterior-amortization pipelines) and empirically outperforms the optimal self-normalized importance sampler, while also permitting amortization over target functions themselves.
Significance. If the theoretical guarantee can be made rigorous under explicit conditions on the function class and optimization, the work would be significant: it shifts amortization from posterior approximation to direct expectation estimation when the integrand is known in advance, and the ability to amortize over targets is a useful extension. The empirical comparison to an optimal baseline is a positive feature.
major comments (2)
- [Abstract / §3] Abstract and §3 (theoretical analysis): the claim that AMCI produces arbitrarily small errors for any integrable target with one sample per proposal requires the three learned proposals to converge exactly to the optimal importance density, control variate, and normalization terms. The training objective is defined over a distribution of targets; without a universal-approximation theorem plus a convergence guarantee that holds for arbitrary integrable functions (rather than only those in the training support), the single-sample estimator cannot be guaranteed to become exact. This is load-bearing for the central theoretical claim.
- [§4] §4 (experiments): the reported outperformance over the theoretically optimal self-normalized importance sampler must be accompanied by an explicit statement of how the optimal SNIS proposal is obtained at test time for each target; if the comparison uses an oracle that is unavailable in the amortized setting, the empirical advantage is not yet demonstrated.
minor comments (2)
- [§2] Notation for the three proposals should be introduced with a single table or diagram that makes their roles in the combined estimator immediately clear.
- [§3] The training objective (Eq. (X)) should state whether it is an unbiased estimator of the integrated squared error or another quantity; the current description leaves the precise loss ambiguous.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. Below we respond point-by-point to the major comments, proposing revisions to address the concerns while preserving the core contributions of the work.
- Referee: [Abstract / §3] Abstract and §3 (theoretical analysis): the claim that AMCI produces arbitrarily small errors for any integrable target with one sample per proposal requires the three learned proposals to converge exactly to the optimal importance density, control variate, and normalization terms. The training objective is defined over a distribution of targets; without a universal-approximation theorem plus a convergence guarantee that holds for arbitrary integrable functions (rather than only those in the training support), the single-sample estimator cannot be guaranteed to become exact. This is load-bearing for the central theoretical claim.
Authors: We agree that the claim of arbitrarily small error with one sample per proposal holds exactly only when the three proposals match the optimal importance density, control variate, and normalization term. The manuscript's theoretical argument is that the estimator variance vanishes under these conditions (unlike posterior amortization, which retains irreducible error), and that amortization over a distribution of targets enables learning approximations to these optima. However, we acknowledge that a fully rigorous guarantee for arbitrary integrable functions would require explicit universal-approximation results for the chosen parametric families together with optimization convergence outside the training support. We will revise the abstract and §3 to state these conditions explicitly, clarifying that arbitrary accuracy is achieved in the limit of perfect approximation and optimization rather than unconditionally for every integrable target. revision: yes
- Referee: [§4] §4 (experiments): the reported outperformance over the theoretically optimal self-normalized importance sampler must be accompanied by an explicit statement of how the optimal SNIS proposal is obtained at test time for each target; if the comparison uses an oracle that is unavailable in the amortized setting, the empirical advantage is not yet demonstrated.
Authors: In the experiments the optimal SNIS proposal for each target is obtained by numerically optimizing the proposal parameters (using the known integrand and target density) to minimize the variance of the self-normalized estimator; this per-target optimization is described in §4 but will be expanded with implementation details. The comparison is deliberately against this non-amortized oracle to show that the single amortized AMCI model can still outperform even the best possible per-target SNIS. We will add an explicit statement of the optimization procedure and a short discussion distinguishing the amortized versus per-target regimes so that the empirical results are presented with full transparency. revision: yes
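For readers unfamiliar with the baseline, a self-normalized importance sampler can be sketched on a small discrete problem. Everything here is a hypothetical illustration: the five-state problem and the names `snis_estimate` and `q_star` are assumptions. The closed-form proposal proportional to gamma(x) * |f(x) - mu| is the standard variance-minimizing choice for SNIS and stands in for the numerical per-target optimization described in the response above. It is not the authors' procedure.

```python
import random

def snis_estimate(f, gamma, proposal, n_samples, rng):
    """Self-normalized importance sampling estimate of sum(f*gamma)/sum(gamma)."""
    draws = rng.choices(range(len(proposal)), weights=proposal, k=n_samples)
    weights = [gamma[x] / proposal[x] for x in draws]  # unnormalized weights
    return sum(w * f[x] for w, x in zip(weights, draws)) / sum(weights)

# Hypothetical toy problem (same shape as an unnormalized posterior).
gamma = [0.5, 1.5, 3.0, 1.0, 0.2]
f = [-2.0, -0.5, 1.0, 2.5, 4.0]
truth = sum(v * g for v, g in zip(f, gamma)) / sum(gamma)

# Oracle proposal: proportional to gamma(x) * |f(x) - truth|. It needs the
# true expectation, so it is unavailable in practice without extra work.
raw = [g * abs(v - truth) for v, g in zip(f, gamma)]
q_star = [r / sum(raw) for r in raw]

rng = random.Random(0)
# With one draw the weight cancels and the estimate is just f(x), so the
# error cannot fall below the smallest |f(x) - truth| over the states.
one_draw = snis_estimate(f, gamma, q_star, 1, rng)
# With many draws the estimate concentrates around the true expectation.
many_draws = snis_estimate(f, gamma, q_star, 20000, rng)
```

The single-draw case makes the contrast concrete: even with the oracle proposal, one-sample SNIS returns a raw function value, so its error is bounded away from zero whenever f is not constant.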
Circularity Check
No significant circularity; theoretical claim is self-contained
The paper derives its central result—that AMCI can theoretically achieve arbitrarily small errors for any integrable target with one sample per proposal—via a mathematical argument on the components of the Monte Carlo estimator and amortization over targets, without any quoted reduction of the output to a fitted parameter or self-citation chain. No load-bearing self-citations, ansatzes smuggled via prior work, or renaming of known results appear in the provided abstract or description. The empirical comparisons are presented separately from the theoretical guarantee, leaving the derivation independent of its inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- parameters of the three amortized proposals
axioms (1)
- domain assumption: the target function is integrable