Corrected Integrated Laplace Approximation for Bayesian Inference in Latent Gaussian Models

Charles C. Margossian; Daniel R. Sheldon; Jinlin Lai

arxiv: 2605.20345 · v1 · pith:OOIQUZVJnew · submitted 2026-05-19 · 📊 stat.ML · cs.LG


Pith reviewed 2026-05-21 07:18 UTC · model grok-4.3


keywords: latent Gaussian models · integrated Laplace approximation · importance sampling · Bayesian inference · quasi-Monte Carlo · Hamiltonian Monte Carlo · Gaussian processes


The pith

Importance sampling corrects the integrated Laplace approximation so the posterior converges to the true one in latent Gaussian models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Latent Gaussian models require marginalizing out latent variables for Bayesian inference, but non-Gaussian likelihoods make exact marginalization impossible. The integrated Laplace approximation offers an efficient but sometimes inaccurate substitute that can produce posteriors differing noticeably from the correct distribution. This paper introduces an importance sampling scheme that removes the approximation error, with the corrected posterior approaching the true posterior as the number of samples increases. The scheme incorporates pseudo-marginalization and quasi-Monte Carlo techniques and is implemented in an automatic differentiation framework so that gradient-based methods such as Hamiltonian Monte Carlo remain available for hyperparameter inference.

Core claim

The authors propose an importance sampling scheme that corrects the error introduced by the integrated Laplace approximation (ILA). As the number of importance samples grows, the ILA-based posterior converges to the correct posterior. The scheme is realized through pseudo-marginalization, quasi-Monte Carlo, and randomized quasi-Monte Carlo. The methods are implemented in an automatic differentiation framework so that gradient-based algorithms, specifically Hamiltonian Monte Carlo, can be used for inference on the hyperparameters.

What carries the argument

The importance sampling correction to the integrated Laplace approximation (ILA), which adjusts the marginal likelihood estimate so that the resulting posterior converges to the exact posterior as sample size grows.
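The correction can be written down in generic importance-sampling notation. This is a sketch rather than the paper's own statement: π(θ, y) follows the excerpt of Proposition 1 quoted further down this page, while the proposal q, the draws z^{(i)}, and the sample size n are symbols assumed here for illustration.

```latex
\pi(\theta, y)
  = \int \pi(\theta, z, y)\,\mathrm{d}z
  = \mathbb{E}_{z \sim q(\cdot \mid \theta, y)}
    \left[ \frac{\pi(\theta, z, y)}{q(z \mid \theta, y)} \right],
\qquad
\hat{\pi}_n(\theta, y)
  = \frac{1}{n} \sum_{i=1}^{n}
    \frac{\pi(\theta, z^{(i)}, y)}{q(z^{(i)} \mid \theta, y)},
\quad z^{(i)} \overset{\text{iid}}{\sim} q(\cdot \mid \theta, y).
```

Provided q(z | θ, y) > 0 wherever π(θ, z, y) > 0, the estimator satisfies E[π̂_n(θ, y)] = π(θ, y) for every n, and it converges to π(θ, y) as n grows by the law of large numbers. How quickly it converges depends on the variance of the ratio π/q, which is where the choice of proposal matters.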

If this is right

The corrected posterior can be used directly in downstream tasks with substantially lower error than plain ILA.
Hyperparameter inference proceeds with standard gradient-based algorithms such as Hamiltonian Monte Carlo.
The same correction framework applies to Gaussian processes, spatial models, and mixed-effect models.
Convergence to the true posterior is obtained simply by increasing the importance sample size without altering the base Laplace approximation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The sampling correction could be paired with other marginalization schemes to handle hierarchical models outside the current latent-Gaussian class.
Adaptive or learned proposal distributions might reduce the sample count needed to reach a target accuracy level.
The automatic-differentiation implementation suggests straightforward extension to larger-scale problems where gradient information is already available.

Load-bearing premise

A suitable proposal distribution exists for the importance sampler such that the correction is both effective and computationally tractable across the range of latent Gaussian models considered.

What would settle it

Compute the exact posterior for a simple latent Gaussian model with non-Gaussian likelihood and verify whether the discrepancy with the importance-sampling-corrected ILA posterior shrinks toward zero as the number of importance samples is increased.
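A minimal sketch of such a check, assuming a one-dimensional toy model that is not taken from the paper: z ~ N(0, σ²) with a single Bernoulli observation y = 1 under a logit link. By symmetry the exact marginal likelihood is 0.5 for any σ, so no reference sampler is needed. The check is run on the marginal likelihood at a fixed hyperparameter value, which is the quantity the Laplace step approximates. All function names and the `inflate` factor are hypothetical. The proposal is the Laplace Gaussian with its standard deviation widened by 1.5, an assumption made here because the unwidened Laplace proposal has lighter tails than the prior in this toy case and can give importance weights with infinite variance.

```python
import math
import random


def log_joint(z, sigma):
    """log p(y=1, z | sigma) for z ~ N(0, sigma^2), y | z ~ Bernoulli(sigmoid(z))."""
    # Numerically stable log sigmoid(z).
    log_lik = -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))
    log_prior = -0.5 * (z / sigma) ** 2 - 0.5 * math.log(2 * math.pi * sigma ** 2)
    return log_lik + log_prior


def laplace_fit(sigma, iters=50):
    """Newton iterations for the mode of log_joint; returns (mode, precision)."""
    z = 0.0
    for _ in range(iters):
        s = 1.0 / (1.0 + math.exp(-z))
        grad = (1.0 - s) - z / sigma ** 2
        hess = -s * (1.0 - s) - 1.0 / sigma ** 2  # always negative (concave)
        z -= grad / hess
    s = 1.0 / (1.0 + math.exp(-z))
    return z, s * (1.0 - s) + 1.0 / sigma ** 2


def laplace_marginal(sigma):
    """Plain Laplace estimate of p(y=1 | sigma)."""
    z_hat, prec = laplace_fit(sigma)
    return math.exp(log_joint(z_hat, sigma) + 0.5 * math.log(2 * math.pi / prec))


def is_marginal(sigma, n, rng, inflate=1.5):
    """Importance-sampling estimate with a widened Laplace Gaussian proposal.

    `inflate` is a hypothetical tuning knob, not something from the paper.
    """
    z_hat, prec = laplace_fit(sigma)
    scale = inflate / math.sqrt(prec)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(z_hat, scale)
        log_q = (-0.5 * ((z - z_hat) / scale) ** 2
                 - math.log(scale) - 0.5 * math.log(2 * math.pi))
        total += math.exp(log_joint(z, sigma) - log_q)
    return total / n


sigma = 3.0
exact = 0.5  # holds by symmetry of the prior and the logistic link
print("Laplace error:", abs(laplace_marginal(sigma) - exact))
rng = random.Random(0)
for n in (10, 100, 1000, 10000):
    print(n, "IS error:", abs(is_marginal(sigma, n, rng) - exact))
```

The Laplace error stays fixed while the importance-sampling error is random and should shrink on average as n grows. A single run at small n can still be worse than plain Laplace, so the comparison is more informative when averaged over several seeds.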

Figures

Figures reproduced from arXiv: 2605.20345 by Charles C. Margossian, Daniel R. Sheldon, Jinlin Lai.

**Figure 2.** Running time to collect 100,000 samples and average ESS/min of … (caption truncated in source)

**Figure 3.** Error of estimating the means of parameters as a function of time in seconds for the sparse … (caption truncated in source)

**Figure 4.** Comparing the sampled posterior against the ground-truth from NUTS. We demonstrate … (caption truncated in source)

**Figure 5.** Error of estimating E[Lbase] and E[log σβ] as a function of time in seconds for the Epil2 model (Lbase is transformed from L in the unconstrained space). Results are averaged from 5 independent runs. Ground-truth is estimated from NUTS with non-centered parameterization.

**Figure 6.** Error of estimating the means of parameters as a function of time in seconds for the … (caption truncated in source)

**Figure 7.** Error of estimating E[f80] and E[log α] as a function of time in seconds for the synthesized Gaussian process with Poisson likelihood. Results are averaged from 5 independent runs.

**Figure 8.** Error of estimating E[log caux] and E[log χ] as a function of time in seconds for the sparse kernel interaction model. Results are averaged from 5 independent runs.

**Figure 9.** Error of estimating E[β0] and E[log T1] as a function of time in seconds for the mixed-effects model. Results are averaged from 5 independent runs.

**Figure 10.** Distribution of π̂_RQMC(U|θ, y) with different θ and n for the simple example model, using 5,000 evaluation points in [0, 1]. The distribution is more uniform as we increase n. Also, π̂_RQMC(U|θ, y) is not continuous.

Original abstract

Latent Gaussian models (LGMs) are a popular class of Bayesian hierarchical models that include Gaussian processes, as well as certain spatial models and mixed-effect models. Efficient Bayesian inference of LGMs often requires marginalizing out the latent variables. For LGMs with a non-Gaussian likelihood, exact marginalization is not possible and a popular approach is to do approximate marginalization with an integrated Laplace approximation (ILA). Using ILA produces an approximate posterior which, in some settings, can differ significantly from the correct posterior, which impacts downstream applications. We propose an importance sampling scheme to correct the error introduced by ILA. By increasing the number of samples in importance sampling, the posterior with ILA converges to the correct posterior. This idea is realized with various techniques, including pseudo-marginalization, quasi-Monte Carlo and randomized quasi-Monte Carlo. We implement our methods in an automatic differentiation framework to support gradient-based algorithms when doing inference on the hyperparameters. For the latter, we specifically consider the use of Hamiltonian Monte Carlo. We demonstrate the benefits of reduced error in various applied models.
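To make the randomized quasi-Monte Carlo ingredient concrete, here is a minimal one-dimensional sketch. It is an assumption-laden illustration, not the paper's construction: it uses a base-2 van der Corput sequence with a single uniform random shift (a Cranley-Patterson rotation), maps the points to Gaussian draws through the inverse CDF, and integrates a logistic likelihood against a N(0, σ²) prior, for which the exact answer is 0.5 by symmetry. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def van_der_corput(i, base=2):
    """Radical inverse of the integer i in the given base (a point in [0, 1))."""
    result, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        result += digit / denom
    return result


def rqmc_standard_normals(n, rng):
    """n standard-normal draws from a randomly shifted van der Corput sequence."""
    shift = rng.random()  # one shared shift keeps the points evenly spread
    std_normal = NormalDist()
    eps = 1e-12  # keep u strictly inside (0, 1) so inv_cdf is defined
    draws = []
    for i in range(n):
        u = (van_der_corput(i) + shift) % 1.0
        u = min(max(u, eps), 1.0 - eps)
        draws.append(std_normal.inv_cdf(u))
    return draws


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rqmc_estimate(n, rng, sigma=3.0):
    """RQMC estimate of E[sigmoid(sigma * Z)] for Z ~ N(0, 1); exact value is 0.5."""
    zs = rqmc_standard_normals(n, rng)
    return sum(sigmoid(sigma * z) for z in zs) / n


# Independent random shifts give independent replicates of the estimate.
rng = random.Random(0)
for _ in range(3):
    print(rqmc_estimate(256, rng))
```

Because every point is uniformly distributed after the shift, each replicate is an unbiased estimate (up to the negligible clamping), and the spread across replicates gives an empirical handle on its variance.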

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a direct importance-sampling correction to integrated Laplace approximation that converges to the exact posterior as samples increase, but the proposal construction will decide whether it stays practical in high-dimensional LGMs.


The paper introduces an importance sampling correction to the integrated Laplace approximation for Bayesian inference in latent Gaussian models. The central idea is that by using importance sampling to adjust for the approximation error, the posterior on the hyperparameters converges to the correct one as the number of samples increases. They realize this with pseudo-marginal, quasi-Monte Carlo, and randomized quasi-Monte Carlo methods, all wrapped in an automatic differentiation framework that supports Hamiltonian Monte Carlo for the hyperparameters.

What stands out is the direct attack on a practical problem. Integrated Laplace approximation is common for marginalizing latents in models with non-Gaussian likelihoods, but it can produce noticeable bias in the hyperparameter posterior. Showing a way to correct that bias without abandoning the Laplace step is useful for applied work in spatial statistics and Gaussian processes.

The potential weakness lies in the choice of proposal distribution for the importance sampler. In high-dimensional settings the target posterior can be quite different from Gaussian, so a proposal derived from the Laplace approximation may suffer from high variance or insufficient overlap. The paper lists the sampling techniques but leaves open exactly how the proposal is built and whether it remains efficient when the latent dimension grows. If that detail holds up, the method is a clear improvement; otherwise the convergence result stays mostly asymptotic.

This work is for researchers and practitioners who already rely on integrated Laplace approximations inside larger Bayesian pipelines. Anyone who cares about accurate uncertainty in predictions from these models will find the correction relevant. It deserves a serious referee because the claim is straightforward to test and the implementation choices are concrete enough to review.

Referee Report

2 major / 2 minor

Summary. The paper proposes an importance sampling scheme to correct errors from the integrated Laplace approximation (ILA) when marginalizing latent variables in latent Gaussian models (LGMs) with non-Gaussian likelihoods. The central claim is that the corrected posterior converges to the exact posterior as the number of importance samples increases. Realizations include pseudo-marginal, quasi-Monte Carlo (QMC), and randomized QMC methods, with implementation in an automatic differentiation framework to enable gradient-based hyperparameter inference via Hamiltonian Monte Carlo (HMC). Benefits are demonstrated on applied models including Gaussian processes and spatial models.

Significance. If a low-variance proposal can be constructed that remains tractable, the correction would meaningfully reduce ILA-induced bias in LGMs where approximation error affects downstream tasks. The AD/HMC integration and QMC variants are practical strengths that could improve efficiency over naive sampling corrections.

major comments (2)

[Abstract] Abstract: The convergence claim ('by increasing the number of samples in importance sampling, the posterior with ILA converges to the correct posterior') is load-bearing but rests on the unstated assumption that a proposal distribution q exists whose support contains the target and whose importance weights have controlled variance. In high-dimensional LGMs the posterior is typically far from Gaussian; any proposal derived from the Laplace approximation will have poor overlap exactly where ILA error is largest, undermining both unbiasedness (for pseudo-marginal HMC) and computational feasibility.
[Methods] Methods (proposal construction): The manuscript lists pseudo-marginal, QMC and RQMC realizations but does not specify how the importance proposal is built or adapted to the hyperparameters. Without this detail it is impossible to verify that the scheme remains both unbiased and tractable across the range of latent dimensions and likelihoods considered.

minor comments (2)

[Methods] Notation for the corrected posterior and the importance weights should be introduced with a single consistent equation early in the methods to avoid later ambiguity.
[Experiments] The empirical demonstrations would benefit from explicit reporting of effective sample size or variance of the importance weights to substantiate the practical convergence rate.
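The effective sample size requested in the last comment is commonly computed with the Kish formula, ESS = (Σ w_i)² / Σ w_i². A minimal sketch follows, assuming the importance weights are available as log-weights; the function name is hypothetical and nothing here is taken from the paper's code.

```python
import math


def effective_sample_size(log_weights):
    """Kish effective sample size from unnormalized log importance weights."""
    # Subtract the maximum before exponentiating to avoid overflow;
    # the ratio below is invariant to this rescaling.
    m = max(log_weights)
    weights = [math.exp(lw - m) for lw in log_weights]
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


print(effective_sample_size([0.0, 0.0, 0.0, 0.0]))  # equal weights
print(effective_sample_size([0.0, -1000.0]))        # one weight dominates
```

The value ranges from 1, when a single draw carries all the weight, to n, when all weights are equal. An ESS far below n at a given hyperparameter value signals poor overlap between proposal and target.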

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major point below, clarifying the assumptions underlying our convergence claim and providing additional details on proposal construction. We have revised the manuscript accordingly to strengthen the presentation.


Referee: [Abstract] Abstract: The convergence claim ('by increasing the number of samples in importance sampling, the posterior with ILA converges to the correct posterior') is load-bearing but rests on the unstated assumption that a proposal distribution q exists whose support contains the target and whose importance weights have controlled variance. In high-dimensional LGMs the posterior is typically far from Gaussian; any proposal derived from the Laplace approximation will have poor overlap exactly where ILA error is largest, undermining both unbiasedness (for pseudo-marginal HMC) and computational feasibility.

Authors: We agree that the stated convergence relies on standard importance sampling assumptions: the proposal q must have support containing that of the target posterior and the importance weights must have finite variance for practical convergence. These conditions were implicit but will now be stated explicitly in the revised abstract and methods. Regarding high-dimensional LGMs, we acknowledge that a Laplace-derived proposal can have limited overlap where the ILA error is largest, which may lead to high variance and affect computational feasibility for pseudo-marginal HMC. However, the estimator remains unbiased (and thus the corrected posterior converges) as the number of samples tends to infinity whenever the support condition holds, independent of dimension. We will add a new paragraph discussing proposal quality, variance control strategies (including the QMC variants already in the paper), and practical limitations in high dimensions. revision: yes
Referee: [Methods] Methods (proposal construction): The manuscript lists pseudo-marginal, QMC and RQMC realizations but does not specify how the importance proposal is built or adapted to the hyperparameters. Without this detail it is impossible to verify that the scheme remains both unbiased and tractable across the range of latent dimensions and likelihoods considered.

Authors: We thank the referee for highlighting this omission. The original manuscript emphasized the general framework and its realizations but did not provide a self-contained description of proposal construction. In the revised manuscript we will insert a dedicated subsection that (i) specifies the default proposal as the Gaussian approximation obtained from the Laplace step, (ii) describes how the proposal is re-centered and re-scaled when hyperparameters change during outer-loop inference, and (iii) outlines simple adaptation heuristics (e.g., moment matching or low-rank updates) that preserve unbiasedness while remaining tractable. These additions will allow readers to verify the conditions for unbiasedness and computational feasibility across the latent dimensions and likelihoods considered in the experiments. revision: yes

Circularity Check

0 steps flagged

No circularity: importance sampling correction is a standard, independently motivated technique


The paper proposes using importance sampling (including pseudo-marginal, QMC, and RQMC variants) to correct the known approximation error of ILA in LGMs, with the claim that the corrected posterior converges to the exact one as the number of samples grows. This follows directly from the standard properties of importance sampling and does not reduce to any self-definition, fitted parameter renamed as prediction, or load-bearing self-citation chain. The derivation chain is self-contained and relies on external statistical results rather than circular internal constructions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard properties of Laplace approximation and importance sampling in Bayesian hierarchical models; no new free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: Integrated Laplace approximation introduces a correctable error for non-Gaussian likelihoods in latent Gaussian models.
This premise underpins the need for and the design of the importance sampling correction.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "We propose an importance sampling scheme to correct the error introduced by ILA. By increasing the number of samples in importance sampling, the posterior with ILA converges to the correct posterior."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "Proposition 1. ... π̂_z(θ, y) is an unbiased estimator of π(θ, y)."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 2 internal anchors

[1]

The kernel interaction trick: Fast Bayesian discovery of pairwise interactions in high dimensions

Raj Agrawal, Brian Trippe, Jonathan Huggins, and Tamara Broderick. The kernel interaction trick: Fast Bayesian discovery of pairwise interactions in high dimensions. InInternational Conference on Machine Learning, pages 141–150. PMLR, 2019

work page 2019
[2]

Pseudo-marginal Hamiltonian Monte Carlo.Journal of Machine Learning Research, 22(141):1–45, 2021

Johan Alenlöv, Arnoud Doucet, and Fredrik Lindsten. Pseudo-marginal Hamiltonian Monte Carlo.Journal of Machine Learning Research, 22(141):1–45, 2021

work page 2021
[3]

Christophe Andrieu and Gareth O. Roberts. The pseudo-marginal approach for efficient Monte Carlo computations.The Annals of Statistics, 37(2), April 2009. ISSN 0090-5364

work page 2009
[4]

Survival Regression with Accelerated Failure Time Model in XGBoost

Martin Outzen Berild, Sara Martino, Virgilio Gómez-Rubio, and Håvard Rue. Importance sampling with the integrated nested Laplace approximation.Journal of Computational and Graphical Statistics, 31(4):1225–1237, 2022. doi: 10.1080/10618600.2022.2067551

work page doi:10.1080/10618600.2022.2067551 2022
[5]

Hamiltonian Monte Carlo for hierarchical models

Michael Betancourt and Mark Girolami. Hamiltonian Monte Carlo for hierarchical models. In Current Trends in Bayesian Methodology with Applications, page 24. Chapman and Hall/CRC,

work page
[6]

doi: 10.1201/b18502-5

work page doi:10.1201/b18502-5
[7]

Pyro: Deep universal probabilistic programming.Journal of machine learning research, 20(28):1–6, 2019

Eli Bingham, Jonathan P Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D Goodman. Pyro: Deep universal probabilistic programming.Journal of machine learning research, 20(28):1–6, 2019

work page 2019
[8]

Negative binomial loglinear mixed models.Statistical Modelling, 3(3):179–191, 2003

James G Booth, George Casella, Herwig Friedl, and James P Hobert. Negative binomial loglinear mixed models.Statistical Modelling, 3(3):179–191, 2003

work page 2003
[9]

JAX: composable transformations of Python+NumPy programs, 2018

James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/jax-ml/jax

work page 2018
[10]

and Kristensen, Kasper and van Benthem, Koen J

Mollie E. Brooks, Kasper Kristensen, Koen J. van Benthem, Arni Magnusson, Casper W. Berg, Anders Nielsen, Hans J. Skaug, Martin Maechler, and Benjamin M. Bolker. glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling.The R Journal, 9(2):378–400, 2017. doi: 10.32614/RJ-2017-066

work page doi:10.32614/rj-2017-066 2017
[11]

Sample average approximation for black- box variational inference

Javier Burroni, Justin Domke, and Daniel Sheldon. Sample average approximation for black- box variational inference. In Negar Kiyavash and Joris M. Mooij, editors,Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, volume 244 ofProceedings of Machine Learning Research, pages 471–498. PMLR, 15–19 Jul 2024

work page 2024
[12]

BlackJAX: composable Bayesian inference in JAX.arXiv preprint arXiv:2402.10797, 2024

Alberto Cabezas, Adrien Corenflos, Junpeng Lao, Rémi Louf, Antoine Carnec, Kaustubh Chaudhari, Reuben Cohn-Gordon, Jeremie Coullon, Wei Deng, Sam Duffield, et al. BlackJAX: composable Bayesian inference in JAX.arXiv preprint arXiv:2402.10797, 2024. 10

work page arXiv 2024
[13]

Monte Carlo and quasi-Monte Carlo methods.Acta numerica, 7:1–49, 1998

Russel E Caflisch. Monte Carlo and quasi-Monte Carlo methods.Acta numerica, 7:1–49, 1998

work page 1998
[14]

Stan: A probabilistic programming language.Journal of statistical software, 76:1–32, 2017

Bob Carpenter, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A probabilistic programming language.Journal of statistical software, 76:1–32, 2017

work page 2017
[15]

An extended simplified Laplace strategy for approximate Bayesian inference of latent Gaussian models using R-INLA

Cristian Chiuchiolo, Janet van Niekerk, and Håvard Rue. An extended simplified Laplace strategy for approximate Bayesian inference of latent Gaussian models using R-INLA. arXiv:2203.14304, 2022

work page arXiv 2022
[16]

Importance weighting and variational inference.Advances in neural information processing systems, 31, 2018

Justin Domke and Daniel R Sheldon. Importance weighting and variational inference.Advances in neural information processing systems, 31, 2018

work page 2018
[17]

Divide and couple: Using Monte Carlo variational objectives for posterior approximation.Advances in neural information processing systems, 32, 2019

Justin Domke and Daniel R Sheldon. Divide and couple: Using Monte Carlo variational objectives for posterior approximation.Advances in neural information processing systems, 32, 2019

work page 2019
[18]

Hybrid Monte Carlo.Physics letters B, 195(2):216–222, 1987

Simon Duane, Anthony D Kennedy, Brian J Pendleton, and Duncan Roweth. Hybrid Monte Carlo.Physics letters B, 195(2):216–222, 1987

work page 1987
[19]

Improving the INLA approach for approximate Bayesian inference for latent Gaussian models.Electronic Journal of Statistics, 9(2):2706–2731, 2015

Egil Ferkingstad and Håvard Rue. Improving the INLA approach for approximate Bayesian inference for latent Gaussian models.Electronic Journal of Statistics, 9(2):2706–2731, 2015. doi: 10.1214/15-EJS1092

work page doi:10.1214/15-ejs1092 2015
[20]

Adaptive rejection Metropolis sampling within Gibbs sampling.Journal of the Royal Statistical Society Series C: Applied Statistics, 44 (4):455–472, 1995

Wally R Gilks, Nicky G Best, and Keith KC Tan. Adaptive rejection Metropolis sampling within Gibbs sampling.Journal of the Royal Statistical Society Series C: Applied Statistics, 44 (4):455–472, 1995

work page 1995
[21]

Black box variational inference with a deterministic objective: Faster, more accurate, and even more black box.Journal of Machine Learning Research, 25(18):1–39, 2024

Ryan Giordano, Martin Ingram, and Tamara Broderick. Black box variational inference with a deterministic objective: Faster, more accurate, and even more black box.Journal of Machine Learning Research, 25(18):1–39, 2024

work page 2024
[22]

Automatic reparameterisation of proba- bilistic programs

Maria Gorinova, Dave Moore, and Matthew Hoffman. Automatic reparameterisation of proba- bilistic programs. InInternational Conference on Machine Learning, pages 3648–3657. PMLR, 2020

work page 2020
[23]

NeuTra-lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport

Matthew Hoffman, Pavel Sountsov, Joshua V Dillon, Ian Langmore, Dustin Tran, and Srinivas Vasudevan. NeuTra-lizing bad geometry in Hamiltonian Monte Carlo using neural transport. arXiv preprint arXiv:1903.03704, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1903
[24]

The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo.J

Matthew D Hoffman, Andrew Gelman, et al. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo.J. Mach. Learn. Res., 15(1):1593–1623, 2014

work page 2014
[25]

Model-informed flows for Bayesian inference

Joohwan Ko and Justin Domke. Model-informed flows for Bayesian inference. InThe Thirty- ninth Annual Conference on Neural Information Processing Systems, 2025

work page 2025
[26]

TMB: automatic differentiation and Laplace approximation.Journal of statistical software, 70:1–21, 2016

Kasper Kristensen, Anders Nielsen, Casper W Berg, Hans Skaug, and Bradley M Bell. TMB: automatic differentiation and Laplace approximation.Journal of statistical software, 70:1–21, 2016

work page 2016
[27]

Hamiltonian Monte Carlo inference of marginalized linear mixed-effects models.Advances in Neural Information Processing Systems, 37:29435–29463, 2024

Jinlin Lai, Justin Domke, and Daniel R Sheldon. Hamiltonian Monte Carlo inference of marginalized linear mixed-effects models.Advances in Neural Information Processing Systems, 37:29435–29463, 2024

work page 2024
[28]

Charles Margossian, Aki Vehtari, Daniel Simpson, and Raj Agrawal. Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond.Advances in neural information processing systems, 33:9086–9097, 2020

work page 2020
[29]

General adjoint-differentiated Laplace approximation.arXiv preprint arXiv:2306.14976, 2023

Charles C Margossian. General adjoint-differentiated Laplace approximation.arXiv preprint arXiv:2306.14976, 2023

work page arXiv 2023
[30]

Margossian and Lawrence K

Charles C. Margossian and Lawrence K. Saul. Generalized guarantees for variational inference in the presence of even and elliptical symmetry.arXiv:2511.01064, 2025. 11

work page arXiv 2025
[31]

No-U-turn sampling for fast Bayesian inference in ADMB and TMB: Introducing the adnuts and tmbstan R packages.PloS one, 13(5):e0197954, 2018

Cole C Monnahan and Kasper Kristensen. No-U-turn sampling for fast Bayesian inference in ADMB and TMB: Introducing the adnuts and tmbstan R packages.PloS one, 13(5):e0197954, 2018

work page 2018
[32]

Radford M. Neal. MCMC using Hamiltonian dynamics. InHandbook of Markov Chain Monte Carlo. Chapman & Hall / CRC Press, 2012

work page 2012
[33]

Generalized linear models.Journal of the Royal Statistical Society Series A: Statistics in Society, 135(3):370–384, 1972

John Ashworth Nelder and Robert WM Wedderburn. Generalized linear models.Journal of the Royal Statistical Society Series A: Statistics in Society, 135(3):370–384, 1972

work page 1972
[34]

Monte Carlo, quasi-Monte Carlo, and randomized quasi-Monte Carlo

Art B Owen. Monte Carlo, quasi-Monte Carlo, and randomized quasi-Monte Carlo. In Monte-Carlo and Quasi-Monte Carlo Methods 1998: Proceedings of a Conference held at the Claremont Graduate University, Claremont, California, USA, June 22–26, 1998, pages 86–97. Springer, 2000

work page 1998
[35]

A general framework for the parametrization of hierarchical models.Statistical Science, pages 59–73, 2007

Omiros Papaspiliopoulos, Gareth O Roberts, and Martin Sköld. A general framework for the parametrization of hierarchical models.Statistical Science, pages 59–73, 2007

work page 2007
[36]

Transport map accelerated Markov chain Monte Carlo.SIAM/ASA Journal on Uncertainty Quantification, 6(2):645–682, 2018

Matthew D Parno and Youssef M Marzouk. Transport map accelerated Markov chain Monte Carlo.SIAM/ASA Journal on Uncertainty Quantification, 6(2):645–682, 2018

work page 2018
[37]

Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro

Du Phan, Neeraj Pradhan, and Martin Jankowiak. Composable effects for flexible and acceler- ated probabilistic programming in NumPyro.arXiv preprint arXiv:1912.11554, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1912
[38]

Håvard Rue, Sara Martino, and Nicolas Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.Journal of the Royal Statistical Society Series B: Statistical Methodology, 71(2):319–392, 2009

work page 2009
[39]

Bayesian computing with INLA: A review.Annual Review of Statistics and its Application, 4: 395 – 421, 2017

Havard Rue, Andrea Riebler, Sigrunn Sorbye, Janine Illian, Daniel Simson, and Finn Lindgren. Bayesian computing with INLA: A review.Annual Review of Statistics and its Application, 4: 395 – 421, 2017. doi: https://doi.org/10.1146/annurev-statistics-060116-054045

work page doi:10.1146/annurev-statistics-060116-054045 2017
[40]

Shun and P

Z. Shun and P. McCullagh. Laplace approximation of high dimensional integrals.Journal of the Royal Statistical Society: Series B, 57(4):749–760, 1995

work page 1995
[41]

Accurate approximations for posterior moments and marginal densities.Journal of the american statistical association, 81(393):82–86, 1986

Luke Tierney and Joseph B Kadane. Accurate approximations for posterior moments and marginal densities.Journal of the american statistical association, 81(393):82–86, 1986

work page 1986
[42]

Gaussian process regression with a student-t likelihood.Advances in Neural Information Processing Systems, 22:1910–1918, 2009

Jarno Vanhatalo, Pasi Jylänki, and Aki Vehtari. Gaussian process regression with a student-t likelihood.Advances in Neural Information Processing Systems, 22:1910–1918, 2009

work page 1910
[43]

Approximate inference for disease mapping with sparse Gaussian processes.Statistics in medicine, 29(15):1580–1607, 2010

Jarno Vanhatalo, Ville Pietiläinen, and Aki Vehtari. Approximate inference for disease mapping with sparse Gaussian processes.Statistics in medicine, 29(15):1580–1607, 2010

[44]

Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari. GPstuff: Bayesian modeling with Gaussian processes. The Journal of Machine Learning Research, 14(1):1175–1179, 2013

[45]

Christopher KI Williams and Carl Edward Rasmussen. Gaussian processes for machine learning, volume 2. MIT Press, Cambridge, MA, 2006

[46]

Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman. Yes, but did it work?: Evaluating variational inference. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5581–5590. PMLR, 2018.

