arxiv: 2602.20549 · v2 · submitted 2026-02-24 · 💻 cs.LG · cs.CV · stat.ME

Recognition: 1 theorem link

· Lean Theorem

Sample-efficient evidence estimation of score based priors for model selection

Frederic Wang , Katherine L. Bouman


Pith reviewed 2026-05-15 20:24 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ME

keywords model evidence · diffusion priors · Bayesian model selection · inverse problems · score-based models · posterior sampling · DiME estimator


The pith

DiME estimates the model evidence for diffusion priors by integrating time-marginals from reverse diffusion posterior sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DiME to compute the otherwise intractable model evidence p(y | M) for score-based diffusion priors used in Bayesian inverse problems. It does so by accumulating the time-marginals that arise naturally during any reverse diffusion posterior sampling run, turning the many intermediate states generated by existing samplers into an evidence estimate. With as few as 20 posterior samples the method matches analytically known evidence values and correctly ranks diffusion priors on highly ill-conditioned nonlinear imaging tasks, including real black-hole imaging data.

Core claim

DiME estimates the model evidence of a diffusion prior by integrating over the time-marginals of posterior sampling methods, achieving accurate estimation with only a handful of posterior samples such as 20, and enabling correct model selection and prior misfit diagnosis in highly ill-conditioned nonlinear inverse problems.

What carries the argument

DiME estimator that integrates the time-marginals produced by reverse diffusion posterior sampling to approximate the marginal likelihood p(y | M).
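The decomposition behind this estimator, as quoted in the Lean-theorem passage and the appendix excerpts further down this page, can be written out as follows. This restates those excerpts rather than deriving anything new. The shorthand $w(t)$ is introduced here for readability and is an assumption of this note, not the paper's notation; the second line also assumes the forward process has removed all dependence on $y$ by time $T$.

```latex
\begin{aligned}
\log p(y \mid M)
  &= \mathbb{E}_{x_0 \sim p(x_0 \mid y)}\big[\log p(y \mid x_0)\big]
     - D_{\mathrm{KL}}\big(p(x_0 \mid y)\,\|\,p(x_0)\big), \\
D_{\mathrm{KL}}\big(p(x_0 \mid y)\,\|\,p(x_0)\big)
  &= \int_0^T w(t)\,
     \mathbb{E}_{x_t \sim p(x_t \mid y)}
     \big\|\nabla_{x_t} \log p(y \mid x_t)\big\|^2 \, dt,
\qquad
w(t) = \sigma_t' \sigma_t - \sigma_t^2 \frac{a_t'}{a_t}.
\end{aligned}
```

Here $x_t = a_t x_0 + \sigma_t z_t$ with $z_t \sim \mathcal{N}(0, I)$. The first line is the usual split of the log evidence into a data-fit term and a prior-to-posterior KL term; the second is where the intermediate states of the sampler enter.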

If this is right

The estimator can be run in tandem with any recent diffusion posterior sampling algorithm without extra forward-model evaluations.
It selects the diffusion prior whose evidence is highest among a set of candidates on the given measurements.
It flags prior-data mismatch when the estimated evidence is low for all candidate models.
It reproduces known analytical evidence values on problems where the marginal likelihood can be computed directly.
It succeeds on real, highly ill-conditioned nonlinear problems such as black-hole imaging.
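The selection and mismatch checks in the list above can be sketched in a few lines. This is an illustration only: the function names, the uniform prior over candidate models, and the numbers in the usage note are assumptions made here, and the z-score check simply mirrors the comparison described in the Figure 5 caption.

```python
import numpy as np


def rank_priors(log_evidence):
    """Pick the candidate with the highest log-evidence.

    log_evidence maps a model name to an estimate of log p(y | M).
    Also returns posterior model probabilities under a uniform prior
    over the candidates (an assumption of this sketch).
    """
    names = list(log_evidence)
    vals = np.array([log_evidence[n] for n in names], dtype=float)
    shifted = vals - vals.max()          # stabilise the exponentials
    probs = np.exp(shifted) / np.exp(shifted).sum()
    best = names[int(np.argmax(vals))]
    return best, dict(zip(names, probs))


def misfit_zscore(log_ev_observed, log_ev_reference):
    """z-score of an observed log-evidence against reference values.

    log_ev_reference holds log-evidence estimates for measurements known
    to be in-distribution for the prior; a strongly negative z-score
    suggests the prior does not explain the observation.
    """
    ref = np.asarray(log_ev_reference, dtype=float)
    return (log_ev_observed - ref.mean()) / ref.std(ddof=1)
```

For example, `rank_priors({"A": -10.0, "B": -12.0, "C": -30.0})` selects `"A"`; the values are made up and stand in for estimates produced by an evidence estimator.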

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same marginal-integration idea could be tested on other score-based or flow-based generative priors that admit reverse sampling trajectories.
Because the estimator re-uses samples already produced by the sampler, it could be inserted into existing imaging pipelines with negligible extra cost.
Repeated application across a sequence of measurements might allow online adaptation of the prior while data are being collected.

Load-bearing premise

The time-marginals generated during reverse diffusion posterior sampling are sufficient to integrate to an accurate value of the model evidence.

What would settle it

A controlled experiment in which DiME's numerical estimate is compared against the exact, analytically computable model evidence for a simple diffusion prior and linear forward model; large systematic discrepancy would falsify the method.
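A minimal sketch of such a check, under strong simplifying assumptions: a scalar Gaussian prior, an identity forward model, a variance-exploding schedule $a_t = 1$, $\sigma_t = t$, and exact posterior samples standing in for a diffusion sampler. It plugs the closed-form likelihood score into the KL time integral quoted in the appendix excerpts on this page. All function names and default values are illustrative and are not the authors' code.

```python
import numpy as np


def log_normal_pdf(x, mean, var):
    """Log-density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def exact_log_evidence(y, s0, sy):
    """Closed form for y = x0 + noise, x0 ~ N(0, s0^2), noise ~ N(0, sy^2)."""
    return log_normal_pdf(y, 0.0, s0**2 + sy**2)


def time_marginal_log_evidence(y, s0=1.0, sy=0.5, n_samples=20,
                               n_steps=400, t_min=1e-3, t_max=100.0, seed=0):
    """Estimate log p(y) as E[log p(y|x0)] minus a time-integrated KL term."""
    rng = np.random.default_rng(seed)

    # Exact posterior p(x0 | y); a stand-in for a diffusion posterior sampler.
    var_post = 1.0 / (1.0 / s0**2 + 1.0 / sy**2)
    mean_post = var_post * y / sy**2
    x0 = mean_post + np.sqrt(var_post) * rng.standard_normal(n_samples)

    # Data-fit term: Monte Carlo average of log p(y | x0) over the posterior.
    data_fit = np.mean(log_normal_pdf(y, x0, sy**2))

    # Assumed schedule a_t = 1, sigma_t = t, so the weight
    # sigma'_t sigma_t - sigma_t^2 a'_t / a_t reduces to t.
    ts = np.geomspace(t_min, t_max, n_steps)
    integrand = np.empty(n_steps)
    for i, t in enumerate(ts):
        xt = x0 + t * rng.standard_normal(n_samples)  # samples of x_t given y
        k = s0**2 / (s0**2 + t**2)                    # E[x0 | x_t] = k * x_t
        var_y_given_xt = k * t**2 + sy**2             # Var[y | x_t]
        score = k * (y - k * xt) / var_y_given_xt     # d/dx_t log p(y | x_t)
        integrand[i] = t * np.mean(score**2)

    # Trapezoid rule over the time grid approximates the KL integral.
    kl = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(ts))
    return data_fit - kl


if __name__ == "__main__":
    y_obs = 1.0
    print("exact      :", exact_log_evidence(y_obs, 1.0, 0.5))
    print("20 samples :", time_marginal_log_evidence(y_obs, n_samples=20))
```

In this toy setting the only error sources are Monte Carlo noise and the time quadrature, so it says nothing about score-approximation error or sampler bias, which are the concerns a real diffusion prior would raise.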

Figures

Figures reproduced from arXiv: 2602.20549 by Frederic Wang, Katherine L. Bouman.

**Figure 2.** Model evidence confusion matrix for Gaussian phase retrieval (…

**Figure 3.** Model evidence estimates on real M87* observations across 5 different priors using exact…

**Figure 4.** M87* reconstructions using the 5 priors using exact DAPS (…

**Figure 5.** Left: GRMHD model validation results on M87* observations by comparing to evidence of in-distribution measurements y. Our method shows that the evidence of M87* observations have a z-score of about -0.81 compared to the evidence distribution of GRMHD measurements, indicating that M87* is statistically in-distribution of GRMHD. The evidence of simulated measurements of out-of-distribution images are also s…

**Figure 6.** Model evidence confusion matrix using sequential Monte Carlo (SMC) for Gaussian phase…

**Figure 7.** Ground truth: GRMHD. Panel labels: Gaussian approximation, mean image, path evidences (min/mean/max); priors RIAF, MNIST, Faces, GRMHD, SpaceNet.

**Figure 8.** Ground truth: SpaceNet.

**Figure 9.** Ground truth: CelebA. Panel labels: Gaussian approximation, mean image, path evidences (min/mean/max); priors MNIST, RIAF, GRMHD, SpaceNet, Faces.

**Figure 10.** Ground truth: MNIST. Panel labels: Gaussian approximation, mean image, path evidences (min/mean/max), shown per prior.

**Figure 11.** Ground truth: RIAF.

Original abstract

The choice of prior is central to solving ill-posed imaging inverse problems, making it essential to select one consistent with the measurements $y$ to avoid severe bias. In Bayesian inverse problems, this could be achieved by evaluating the model evidence $p(y \mid M)$ under different models $M$ that specify the prior and then selecting the one with the highest value. Diffusion models are the state-of-the-art approach to solving inverse problems with a data-driven prior; however, directly computing the model evidence with respect to a diffusion prior is intractable. Furthermore, most existing model evidence estimators require either many pointwise evaluations of the unnormalized prior density or an accurate clean prior score. We propose DiME, an estimator of the model evidence of a diffusion prior by integrating over the time-marginals of posterior sampling methods. Our method leverages the large amount of intermediate samples naturally obtained during the reverse diffusion sampling process to obtain an accurate estimation of the model evidence using only a handful of posterior samples (e.g., 20). We also demonstrate how to implement our estimator in tandem with recent diffusion posterior sampling methods. Empirically, our estimator matches the model evidence when it can be computed analytically, and it is able to both select the correct diffusion model prior and diagnose prior misfit under different highly ill-conditioned, non-linear inverse problems, including a real-world black hole imaging problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes DiME, an estimator for the intractable model evidence p(y|M) under score-based diffusion priors in Bayesian inverse problems. It computes the evidence by integrating time-marginals obtained as intermediate outputs during reverse-diffusion posterior sampling, claiming that this yields accurate estimates from as few as 20 posterior samples. The method is shown to match analytically computable evidence values, correctly rank diffusion priors, and diagnose prior-data mismatch on highly ill-conditioned nonlinear inverse problems, including a real-world black-hole imaging task.

Significance. If the integration identity holds without systematic bias from approximate sampling, DiME would provide a practical, sample-efficient route to prior selection for diffusion-based imaging priors where standard evidence estimators are intractable. This addresses a key bottleneck in applying data-driven priors to ill-posed problems and could improve reliability in applications such as astronomical imaging.

major comments (3)

[§3] (method derivation): The central identity that integrates the time-marginals p(x_t | y, M) over the diffusion schedule to recover p(y | M) is stated without an explicit error analysis showing that discretization bias, score approximation error, or mode-collapse in the posterior sampler integrate to zero rather than to a systematic offset in the evidence estimate.
[§4–5] (empirical validation): The claim that 20 posterior samples suffice for reliable evidence ranking is supported only by qualitative success on selected tasks; no variance bounds, convergence diagnostics, or ablation on quadrature accuracy for the integrated estimator are reported, leaving open whether the observed prior selection is robust to the known biases of diffusion posterior samplers in ill-conditioned regimes.
[Table 1] (black-hole experiment): While the method selects the correct prior, the absence of quantitative metrics (e.g., estimated evidence values with standard errors, or comparison against a gold-standard evidence estimator) makes it impossible to assess whether the ranking is driven by true evidence differences or by residual sampling bias.

minor comments (2)

[§3] Notation for the diffusion schedule and the precise quadrature rule used to integrate the time-marginals should be stated explicitly (e.g., as an equation) rather than described only in prose.
[§2] Related work on evidence estimation for score-based models (e.g., recent diffusion likelihood estimators) is referenced only briefly; a short comparison paragraph would clarify the novelty of the marginal-integration approach.
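For reference, the quadrature rule that the first minor comment asks for can be read off the appendix excerpt reproduced in the reference graph on this page. The version below is a transcription of that excerpt; the definition $\Delta t_i = t_i - t_{i-1}$ on a grid $0 = t_0 < \cdots < t_N = T$ is an assumption made here, since the excerpt does not spell it out.

```latex
D_{\mathrm{KL}}\big(p(x_0 \mid y)\,\|\,p(x_0)\big)
\;\approx\;
\sum_{i=1}^{N} \Delta t_i
\left(\sigma_{t_i}' \sigma_{t_i} - \sigma_{t_i}^2 \frac{a_{t_i}'}{a_{t_i}}\right)
\mathbb{E}_{x_{t_i} \sim p(x_{t_i} \mid y)}
\big\|\nabla_{x_{t_i}} \log p(y \mid x_{t_i})\big\|^2,
\qquad
\Delta t_i = t_i - t_{i-1}.
```

Written this way, the rule is a right-endpoint Riemann sum over the sampler's own time grid, which makes the referee's question about quadrature accuracy concrete: the error depends on how finely the grid resolves the times where the weighted score norm is largest.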

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment point-by-point below, indicating the revisions we will make to strengthen the manuscript.


Referee: [§3] (method derivation): The central identity that integrates the time-marginals p(x_t | y, M) over the diffusion schedule to recover p(y | M) is stated without an explicit error analysis showing that discretization bias, score approximation error, or mode-collapse in the posterior sampler integrate to zero rather than to a systematic offset in the evidence estimate.

Authors: The central identity is obtained by marginalizing the joint diffusion process over time, which follows directly from the definition of the evidence as the integral of the likelihood times the prior and the fact that the reverse process marginals encode the necessary information. We acknowledge that the original submission did not contain a dedicated error-propagation analysis. In the revision we will expand §3 with an explicit derivation of the identity together with a discussion of the leading error terms (discretization, score approximation, and finite-sample effects) and the conditions under which they integrate to a negligible bias, supported by the analytic test cases already present in the paper.
Revision: partial

Referee: [§4–5] (empirical validation): The claim that 20 posterior samples suffice for reliable evidence ranking is supported only by qualitative success on selected tasks; no variance bounds, convergence diagnostics, or ablation on quadrature accuracy for the integrated estimator are reported, leaving open whether the observed prior selection is robust to the known biases of diffusion posterior samplers in ill-conditioned regimes.

Authors: We agree that additional quantitative diagnostics are required. In the revised §§4–5 we will report (i) standard errors of the evidence estimates obtained from repeated independent sampling runs, (ii) convergence curves showing stabilization of the estimate with increasing sample count, and (iii) an ablation on the quadrature rule used for the time integral. These additions will demonstrate that the ranking remains stable at 20 samples even in the ill-conditioned regimes considered.
Revision: yes

Referee: [Table 1] (black-hole experiment): While the method selects the correct prior, the absence of quantitative metrics (e.g., estimated evidence values with standard errors, or comparison against a gold-standard evidence estimator) makes it impossible to assess whether the ranking is driven by true evidence differences or by residual sampling bias.

Authors: We will revise Table 1 to include the numerical evidence values together with standard errors computed across multiple independent posterior-sampling runs. Although no gold-standard estimator exists for the real black-hole data, we will add a controlled synthetic experiment that mimics the black-hole imaging geometry and noise level, where high-fidelity evidence references can be obtained, to confirm that residual sampling bias does not alter the ranking.
Revision: yes
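The run-to-run diagnostics promised in these responses amount to simple summary statistics. The sketch below shows one way they might be computed; the helper names and the two-sigma separation rule are assumptions of this note rather than anything the paper or the rebuttal specifies.

```python
import numpy as np


def summarize_runs(estimates):
    """Mean, sample standard deviation, and standard error of the mean
    for log-evidence estimates from independent sampling runs."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    std = est.std(ddof=1)
    sem = std / np.sqrt(est.size)
    return mean, std, sem


def ranking_is_separated(runs_a, runs_b, n_sigma=2.0):
    """True if the gap between two models' mean log-evidence exceeds
    n_sigma combined standard errors (a heuristic, not a formal test)."""
    mean_a, _, sem_a = summarize_runs(runs_a)
    mean_b, _, sem_b = summarize_runs(runs_b)
    gap = abs(mean_a - mean_b)
    return bool(gap > n_sigma * np.hypot(sem_a, sem_b))
```

With five hypothetical runs per model, `ranking_is_separated([-182, -180, -184, -181, -183], [-212, -210, -214, -211, -213])` compares a gap of 30 against combined standard errors of about 1, so the ranking would count as separated under this rule.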

Circularity Check

0 steps flagged

DiME evidence estimator constructed from diffusion posterior sampling intermediates without definitional reduction or self-referential fit


The paper introduces DiME as an estimator that integrates time-marginals produced by existing reverse-diffusion posterior samplers to approximate p(y|M). No step in the provided derivation reduces the target evidence quantity to a fitted parameter, a self-citation chain, or an ansatz smuggled from prior work by the same authors. The central identity is presented as following from the marginalization properties of the diffusion process itself, with empirical checks against analytically computable cases. No load-bearing uniqueness theorem or renaming of known results is invoked. The method therefore remains self-contained against external benchmarks and receives a non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on standard diffusion model and Bayesian evidence definitions; no new free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: The time-marginals of posterior samples generated by reverse diffusion can be integrated to estimate model evidence p(y | M).
This is the core mechanism of the DiME estimator.

pith-pipeline@v0.9.0 · 5547 in / 1307 out tokens · 37418 ms · 2026-05-15T20:24:24.901756+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: DiME estimates $\log p(y)$ via $\mathbb{E}[\log p(y \mid x_0)] - D_{\mathrm{KL}}(p(x_0 \mid y)\,\|\,p(x_0))$, with the KL term computed as an integral of $\|\nabla \log p(y \mid x_t)\|^2$ weighted by diffusion schedule coefficients.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Optimizing Diffusion Priors in Image Reconstruction from a Single Observation
cs.CV 2026-04 unverdicted novelty 6.0

Combining diffusion priors as a product-of-experts and optimizing exponents via Bayesian evidence maximization enables prior tuning from one observation in inverse imaging problems.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1]

Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, and O Deniz Akyildiz. Tweedie moment projected diffusions for inverse problems. arXiv preprint arXiv:2310.06721.

[2]

Gabriel Cardoso, Yazid Janati El Idrissi, Sylvain Le Corff, and Eric Moulines. Monte Carlo guided diffusion for Bayesian linear inverse problems. arXiv preprint arXiv:2308.07983.

[3]

Junhua Chen, Lorenz Richter, Julius Berner, Denis Blessing, Gerhard Neumann, and Anima Anandkumar. Sequential controlled Langevin diffusions. arXiv preprint arXiv:2412.07081.

[4]

Hyungjin Chung, Jeongsol Kim, Michael T Mccann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. arXiv preprint arXiv:2209.14687.

[5]

Arnaud Doucet, Will Grathwohl, Alexander G Matthews, and Heiko Strathmann. Score-based diffusion meets annealed importance sampling. Advances in Neural Information Processing Systems, 35:21482–21494.

(Page header of the source paper: Published as a conference paper at ICLR 2026.)

[6]

Wei Guo, Molei Tao, and Yongxin Chen. Complexity analysis of normalizing constant estimation: from Jarzynski equality to annealed importance sampling and beyond. arXiv preprint arXiv:2502.04575.

[7]

Xunpeng Huang, Hanze Dong, Yifan Hao, Yi-An Ma, and Tong Zhang. Reverse diffusion Monte Carlo. arXiv preprint arXiv:2307.02037.

[8]

Xiang Li, Soo Min Kwon, Shijun Liang, Ismail R Alkhouri, Saiprasad Ravishankar, and Qing Qu. Decoupled data consistency with diffusion purification for image restoration. arXiv preprint arXiv:2403.06054.

[9]

Julia Linhart, Gabriel Victorino Cardoso, Alexandre Gramfort, Sylvain Le Corff, and Pedro LC Rodrigues. Diffusion posterior sampling for simulation-based inference in tall data settings. arXiv preprint arXiv:2404.07593.

[10]

Morteza Mardani, Jiaming Song, Jan Kautz, and Arash Vahdat. A variational perspective on solving inverse problems with diffusion models. arXiv preprint arXiv:2305.04391.

[11]

Jason D McEwen, Christopher GR Wallis, Matthew A Price, and Alessio Spurio Mancini. Machine learning assisted Bayesian model comparison: learnt harmonic mean estimator. arXiv preprint arXiv:2111.12720.

[12]

Badr Moufad, Yazid Janati, Lisa Bedin, Alain Durmus, Randal Douc, Eric Moulines, and Jimmy Olsson. Variational diffusion posterior sampling with midpoint guidance. arXiv preprint arXiv:2410.09945.

[13]

Xinyu Peng, Ziyang Zheng, Wenrui Dai, Nuoqian Xiao, Chenglin Li, Junni Zou, and Hongkai Xiong. Improving diffusion models for inverse problems using optimal posterior covariance. arXiv preprint arXiv:2402.02149.

[14]

Marta Skreta, Lazar Atanackovic, Avishek Joey Bose, Alexander Tong, and Kirill Neklyudov. The superposition of diffusion models using the Itô density estimator. arXiv preprint arXiv:2412.17762.

[15]

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.

[16]

Yang Song, Conor Durkan, Iain Murray, and Stefano Ermon. Maximum likelihood training of score-based diffusion models. Advances in Neural Information Processing Systems, 34:1415–1428, 2021a.

Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. Solving inverse problems in medical imaging with score-based generative models. arXiv preprint arXiv:2111.08005, 202...

[17]

Zihui Wu, Yu Sun, Yifan Chen, Bingliang Zhang, Yisong Yue, and Katherine Bouman. Principled probabilistic imaging using diffusion models as plug-and-play priors. Advances in Neural Information Processing Systems, 37:118389–118427.

[18]

Hongkai Zheng, Wenda Chu, Bingliang Zhang, Zihui Wu, Austin Wang, Berthy T Feng, Caifeng Zou, Yu Sun, Nikola Kovachki, Zachary E Ross, et al. Inversebench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences. arXiv preprint arXiv:2503.11043.

[19]

A PROOFS

Although the main text uses the diffusion SDE formulation, we introduce the Probability Flow ODE (PF-ODE) for the proofs and for Appendix B. This ODE has identical marginals as the SDE under ideal conditions. Given diffusion process $x_t = a_t x_0 + \sigma_t z_t$, $z_t \sim \mathcal{N}(0, I)$, the PF-ODE is: $dx_t = \big[\, \tfrac{a_t'}{a_t}\, x_t -$ ...

[20]

$-D_{\mathrm{KL}}(p(x_0 \mid y)\,\|\,p(x_0))$.

Footnote 3: For likelihoods with Gaussian noise, if the posterior and prior have enough overlap such that posterior samples have a data fit of $\chi^2 \approx 1$, this term essentially becomes constant under the same likelihood function.

At the final diffusion step, the forward process has destroyed all inf...

[21]

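Therefore, we have:

$$
\begin{aligned}
D_{\mathrm{KL}}\big(p(x_0 \mid y)\,\|\,p(x_0)\big)
&= -\int_0^T \frac{d}{dt}\, D_{\mathrm{KL}}\big(p(x_t \mid y)\,\|\,p(x_t)\big)\, dt \\
&\overset{(i)}{=} -\int_0^T \mathbb{E}_{x_t \sim p(x_t \mid y)}\Big[\big(v_p(x_t \mid y) - v_p(x_t)\big)^\top \nabla_{x_t} \log p(y \mid x_t)\Big]\, dt \\
&\overset{(ii)}{=} \int_0^T \Big(\sigma_t' \sigma_t - \sigma_t^2 \frac{a_t'}{a_t}\Big)\, \mathbb{E}_{x_t \sim p(x_t \mid y)} \big\|\nabla_{x_t} \log p(y \mid x_t)\big\|^2\, dt \\
&\approx \sum_{i=1}^{N} \Delta t_i \Big(\sigma_{t_i}' \sigma_{t_i} - \sigma_{t_i}^2 \frac{a_{t_i}'}{a_{t_i}}\Big)\, \mathbb{E}_{x_{t_i} \sim p(x_{t_i} \mid y)} \big\|\nabla_{x_{t_i}} \log p(y \mid x_{t_i})\big\|^2,
\end{aligned}
$$

where (i) uses Lemma 3 but with $\pi(x_t \mid y) = p(x_t \mid y)$ and (ii) u...

[22]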

Then $\|\Sigma_{x_0 \mid x_t}\| = \dfrac{\sigma_t^2}{\sigma_t^2 + a_t^2}$.

Proof. Plugging in the covariance from Lemma 1 gives:

$$
\|\Sigma_{x_0 \mid x_t}\|
= \lambda_{\max}\!\left(\Big(\Sigma_0^{-1} + \frac{a_t^2}{\sigma_t^2} I\Big)^{-1}\right)
= \frac{1}{\lambda_{\min}\!\Big(\Sigma_0^{-1} + \frac{a_t^2}{\sigma_t^2} I\Big)}
= \frac{1}{1 + \frac{a_t^2}{\sigma_t^2}}
= \frac{\sigma_t^2}{\sigma_t^2 + a_t^2}.
$$

Lemma 2. Suppose we have diffusion process $x_t = a_t x_0 + \sigma_t z_t$, $z_t \sim \mathcal{N}(0, I)$ and sample $\tilde{x}_0 \sim p(x_0 \mid x_t, y)$. Under the assumption that $p(x_0 \mid x_t) \approx \mathcal{N}(\mathbb{E}[x_0 \mid x_t], \Sigma_{x_0 \mid x_t})$, b...

[23]

$\le$ [bound garbled in extraction; its factors include $\|A\|^4$, $\sigma_y^4$, $\sigma_t^2$, $a_t^2$, and $(\sigma_t^2 + a_t^2)^3$]. Unlike $\Theta_{\text{high}}$, the variance of this estimator goes to $0$ as $\sigma_t \to 0$, but for the rest of diffusion process the constant factor makes it a higher variance estimator than $\Theta_{\text{high}}$ in practice.

B GENERALIZING DIME TO ARBITRARY POSTERIOR MARGINALS

While we only present DiME along the st...

[24]

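Proposition 2 (DiME-PnPDM: model evidence estimation along the PnP-DM marginals). Given diffusion process $x_t = a_t x_0 + \sigma_t z_t$, $z_t \sim \mathcal{N}(0, I)$, PnP-DM marginals $q(x_t \mid y) \propto p(x_t)\, q(y \mid x_t)$, and timesteps $0 = t_0 < \cdots < t_N = T$, the model evidence can be estimated via:

$$
\log p(y) \approx C(q, y) - \sum_{i=1}^{N} \Delta t_i\, \mathbb{E}_{x_{t_i} \sim q(x_{t_i} \mid y)}\Big[ v_p(x_{t_i})^\top \nabla_{x_{t_i}} \log q(y \mid x_{t_i}) \Big] \tag{17}
$$

where $C(q, y) = \log \mathbb{E}_{x \sim \mathcal{N}(0, I)}[\,q(y \mid x$...

[25]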

We can also do a similar derivation for the marginals used for the Twisted Diffusion Sampler (Wu et al., 2023), which follow $\pi(x_t \mid y) \propto p(x_t)\, p(y \mid \mathbb{E}[x_0 \mid x_t])$; note that this evidence gradient can be computed before resampling using SMC weights or after resampling without weights.

Proposition 3 (DiME-TDS: model evidence estimation along the TDS marginals). Given ...

[26]

Integrating over all $t$ gives the estimator:

$$
\begin{aligned}
\log p(y) &= \log \pi(y \mid t = T) - \int_0^T \frac{d}{dt} \log \pi(y \mid t)\, dt \\
&\approx \log \pi(y \mid t = T) - \sum_{i=1}^{N} \Delta t_i\, \mathbb{E}_{x_{t_i} \sim \pi(x_{t_i} \mid y)}\Big[ v_p(x_{t_i})^\top \nabla_{x_{t_i}} \log p(y \mid \mu(x_{t_i})) + \frac{d}{dt} \log p(y \mid \mu(x_{t_i})) \Big]
\end{aligned}
$$

where $\log \pi(y \mid T) \approx \log \mathbb{E}_{x \sim \mathcal{N}(0, \sigma_T^2)}[\,p(y \mid \mu(x_T))\,]$.

C BASELINE METHODS

In Section 4.1, we compared to five baseline methods, described here...

[27]

Computing the precision matrix $\Sigma_0^{-1}$: for some datasets, the covariance of specific pixel locations is nearly 0 (such as the corner regions of MNIST), so we added a jitter of 1e-2 for stability.

D.2 BASELINES

We use TI, AIS, and SMC as baseline methods for the mixture of Gaussians experiment (Section 4.1) and SMC for the MNIST experiment (Section 4.2; Ap...

[28]

Mean and standard deviation over 5 runs is shown here. As can be seen, lowering the noise level only further increases overfitting to training data, causing the estimated evidence to drop.

| SMC schedule | σmin = 0.1, GT 8 / Model 6 | σmin = 0.1, GT 8 / Model 7 | σmin = 0.1, GT 8 / Model 8 | σmin = 0.05, GT 8 / Model 6 | σmin = 0.05, GT 8 / Model 7 | σmin = 0.05, GT 8 / Model 8 |
| --- | --- | --- | --- | --- | --- | --- |
| Linear | −182 ± 9 | −212 ± 9 | −210 ± 11 | −279 ± 26 | −352 ± 3 | −426 ± 40 |
| Expone... | | | | | | |