Sample-efficient evidence estimation of score based priors for model selection
Pith reviewed 2026-05-15 20:24 UTC · model grok-4.3
The pith
DiME estimates the model evidence for diffusion priors by integrating time-marginals from reverse diffusion posterior sampling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DiME estimates the model evidence of a diffusion prior by integrating over the time-marginals of posterior sampling methods. It achieves accurate estimates with only a handful of posterior samples (e.g., 20) and enables correct model selection and prior-misfit diagnosis in highly ill-conditioned nonlinear inverse problems.
What carries the argument
The DiME estimator integrates the time-marginals produced by reverse diffusion posterior sampling to approximate the marginal likelihood p(y | M).
If this is right
- The estimator can be run in tandem with any recent diffusion posterior sampling algorithm without extra forward-model evaluations.
- It selects the diffusion prior whose evidence is highest among a set of candidates on the given measurements.
- It flags prior-data mismatch when the estimated evidence is low for all candidate models.
- It reproduces known analytical evidence values on problems where the marginal likelihood can be computed directly.
- It succeeds on real, highly ill-conditioned nonlinear problems such as black-hole imaging.
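The selection and misfit rules in the list above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names, the model labels, and the `misfit_threshold` cutoff are hypothetical, and the threshold in particular would have to be calibrated per problem. The second helper converts log evidences into posterior model probabilities under the added assumption of a uniform prior over candidate models.

```python
import math


def select_prior(log_evidence, misfit_threshold):
    """Return (best_model, all_low) from estimated log evidences.

    log_evidence: dict mapping a model label to an estimate of log p(y | M).
    misfit_threshold: hypothetical, problem-specific cutoff (an assumption);
        if every candidate falls below it, prior-data mismatch is flagged.
    """
    best = max(log_evidence, key=log_evidence.get)
    all_low = all(value < misfit_threshold for value in log_evidence.values())
    return best, all_low


def posterior_model_probs(log_evidence):
    """p(M | y) under a uniform prior over models, via a stable softmax."""
    top = max(log_evidence.values())
    weights = {k: math.exp(v - top) for k, v in log_evidence.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

For example, `select_prior({"prior_a": -10.0, "prior_b": -12.0}, misfit_threshold=-50.0)` picks `"prior_a"` and does not flag a mismatch; the numbers are made up for illustration.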
Where Pith is reading between the lines
- The same marginal-integration idea could be tested on other score-based or flow-based generative priors that admit reverse sampling trajectories.
- Because the estimator re-uses samples already produced by the sampler, it could be inserted into existing imaging pipelines with negligible extra cost.
- Repeated application across a sequence of measurements might allow online adaptation of the prior while data are being collected.
Load-bearing premise
The time-marginals generated during reverse diffusion posterior sampling are sufficient to integrate to an accurate value of the model evidence.
What would settle it
A controlled experiment in which DiME's numerical estimate is compared against the exact, analytically computable model evidence for a simple diffusion prior and linear forward model; large systematic discrepancy would falsify the method.
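A reference value for such a controlled experiment is easy to produce in the simplest case. The sketch below assumes a scalar linear-Gaussian model, y = a·x + e with x ~ N(mu0, var0) and e ~ N(0, noise_var), for which the evidence is the Gaussian N(y; a·mu0, a²·var0 + noise_var). It compares that closed form against brute-force quadrature of ∫ p(y | x) p(x) dx. All function and parameter names are illustrative, and DiME itself is not implemented here; the point is only to show what the "exact" side of the comparison looks like.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def exact_log_evidence(y, a, mu0, var0, noise_var):
    """Closed form: y ~ N(a * mu0, a^2 * var0 + noise_var)."""
    return log_normal_pdf(y, a * mu0, a * a * var0 + noise_var)


def quadrature_log_evidence(y, a, mu0, var0, noise_var, n=20001, width=10.0):
    """Trapezoid rule for p(y) = integral of p(y | x) p(x) dx.

    Integrates over mu0 +/- width prior standard deviations.
    """
    sd = math.sqrt(var0)
    lo, hi = mu0 - width * sd, mu0 + width * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        f = math.exp(log_normal_pdf(y, a * x, noise_var) + log_normal_pdf(x, mu0, var0))
        total += f if 0 < i < n - 1 else 0.5 * f  # half weight at the two endpoints
    return math.log(total * h)
```

An evidence estimator under test would be compared against `exact_log_evidence`; a gap that persists as the estimator's sample count and time resolution grow would indicate systematic bias rather than noise.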
Original abstract
The choice of prior is central to solving ill-posed imaging inverse problems, making it essential to select one consistent with the measurements $y$ to avoid severe bias. In Bayesian inverse problems, this could be achieved by evaluating the model evidence $p(y \mid M)$ under different models $M$ that specify the prior and then selecting the one with the highest value. Diffusion models are the state-of-the-art approach to solving inverse problems with a data-driven prior; however, directly computing the model evidence with respect to a diffusion prior is intractable. Furthermore, most existing model evidence estimators require either many pointwise evaluations of the unnormalized prior density or an accurate clean prior score. We propose DiME, an estimator of the model evidence of a diffusion prior by integrating over the time-marginals of posterior sampling methods. Our method leverages the large amount of intermediate samples naturally obtained during the reverse diffusion sampling process to obtain an accurate estimation of the model evidence using only a handful of posterior samples (e.g., 20). We also demonstrate how to implement our estimator in tandem with recent diffusion posterior sampling methods. Empirically, our estimator matches the model evidence when it can be computed analytically, and it is able to both select the correct diffusion model prior and diagnose prior misfit under different highly ill-conditioned, non-linear inverse problems, including a real-world black hole imaging problem.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DiME, an estimator for the intractable model evidence p(y|M) under score-based diffusion priors in Bayesian inverse problems. It computes the evidence by integrating time-marginals obtained as intermediate outputs during reverse-diffusion posterior sampling, claiming that this yields accurate estimates from as few as 20 posterior samples. The method is shown to match analytically computable evidence values, correctly rank diffusion priors, and diagnose prior-data mismatch on highly ill-conditioned nonlinear inverse problems, including a real-world black-hole imaging task.
Significance. If the integration identity holds without systematic bias from approximate sampling, DiME would provide a practical, sample-efficient route to prior selection for diffusion-based imaging priors where standard evidence estimators are intractable. This addresses a key bottleneck in applying data-driven priors to ill-posed problems and could improve reliability in applications such as astronomical imaging.
major comments (3)
- [§3] §3 (method derivation): The central identity that integrates the time-marginals p(x_t | y, M) over the diffusion schedule to recover p(y | M) is stated without an explicit error analysis showing that discretization bias, score approximation error, or mode-collapse in the posterior sampler integrate to zero rather than to a systematic offset in the evidence estimate.
- [§4–5] §4–5 (empirical validation): The claim that 20 posterior samples suffice for reliable evidence ranking is supported only by qualitative success on selected tasks; no variance bounds, convergence diagnostics, or ablation on quadrature accuracy for the integrated estimator are reported, leaving open whether the observed prior selection is robust to the known biases of diffusion posterior samplers in ill-conditioned regimes.
- [Table 1] Table 1 / black-hole experiment: While the method selects the correct prior, the absence of quantitative metrics (e.g., estimated evidence values with standard errors, or comparison against a gold-standard evidence estimator) makes it impossible to assess whether the ranking is driven by true evidence differences or by residual sampling bias.
minor comments (2)
- [§3] Notation for the diffusion schedule and the precise quadrature rule used to integrate the time-marginals should be stated explicitly (e.g., as an equation) rather than described only in prose.
- [§2] Related work on evidence estimation for score-based models (e.g., recent diffusion likelihood estimators) is referenced only briefly; a short comparison paragraph would clarify the novelty of the marginal-integration approach.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment point-by-point below, indicating the revisions we will make to strengthen the manuscript.
-
Referee: [§3] §3 (method derivation): The central identity that integrates the time-marginals p(x_t | y, M) over the diffusion schedule to recover p(y | M) is stated without an explicit error analysis showing that discretization bias, score approximation error, or mode-collapse in the posterior sampler integrate to zero rather than to a systematic offset in the evidence estimate.
Authors: The central identity is obtained by marginalizing the joint diffusion process over time, which follows directly from the definition of the evidence as the integral of the likelihood times the prior and the fact that the reverse process marginals encode the necessary information. We acknowledge that the original submission did not contain a dedicated error-propagation analysis. In the revision we will expand §3 with an explicit derivation of the identity together with a discussion of the leading error terms (discretization, score approximation, and finite-sample effects) and the conditions under which they integrate to a negligible bias, supported by the analytic test cases already present in the paper. revision: partial
-
Referee: [§4–5] §4–5 (empirical validation): The claim that 20 posterior samples suffice for reliable evidence ranking is supported only by qualitative success on selected tasks; no variance bounds, convergence diagnostics, or ablation on quadrature accuracy for the integrated estimator are reported, leaving open whether the observed prior selection is robust to the known biases of diffusion posterior samplers in ill-conditioned regimes.
Authors: We agree that additional quantitative diagnostics are required. In the revised §§4–5 we will report (i) standard errors of the evidence estimates obtained from repeated independent sampling runs, (ii) convergence curves showing stabilization of the estimate with increasing sample count, and (iii) an ablation on the quadrature rule used for the time integral. These additions will demonstrate that the ranking remains stable at 20 samples even in the ill-conditioned regimes considered. revision: yes
-
Referee: [Table 1] Table 1 / black-hole experiment: While the method selects the correct prior, the absence of quantitative metrics (e.g., estimated evidence values with standard errors, or comparison against a gold-standard evidence estimator) makes it impossible to assess whether the ranking is driven by true evidence differences or by residual sampling bias.
Authors: We will revise Table 1 to include the numerical evidence values together with standard errors computed across multiple independent posterior-sampling runs. Although no gold-standard estimator exists for the real black-hole data, we will add a controlled synthetic experiment that mimics the black-hole imaging geometry and noise level, where high-fidelity evidence references can be obtained, to confirm that residual sampling bias does not alter the ranking. revision: yes
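The diagnostics promised in these responses (standard errors over repeated runs and a check that the ranking is not an artifact of run-to-run noise) could be computed along the following lines. This is a sketch under stated assumptions: each list holds log-evidence estimates from independent sampling runs, the function names are hypothetical, and the `z = 2.0` separation rule is a common heuristic rather than anything specified by the authors.

```python
import math
import statistics


def summarize_runs(estimates):
    """Mean and standard error of log-evidence estimates from independent runs."""
    mean = statistics.fmean(estimates)
    std_err = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return mean, std_err


def ranking_is_separated(runs_a, runs_b, z=2.0):
    """True if the gap between mean estimates exceeds z combined standard errors.

    The factor z is a heuristic choice (an assumption), not a formal test.
    """
    mean_a, se_a = summarize_runs(runs_a)
    mean_b, se_b = summarize_runs(runs_b)
    gap = abs(mean_a - mean_b)
    return gap > z * math.sqrt(se_a ** 2 + se_b ** 2)
```

Reporting the output of `summarize_runs` per candidate prior would supply the "evidence values with standard errors" that the referee asks for in Table 1.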
Circularity Check
DiME evidence estimator constructed from diffusion posterior sampling intermediates without definitional reduction or self-referential fit
full rationale
The paper introduces DiME as an estimator that integrates time-marginals produced by existing reverse-diffusion posterior samplers to approximate p(y|M). No step in the provided derivation reduces the target evidence quantity to a fitted parameter, a self-citation chain, or an ansatz smuggled from prior work by the same authors. The central identity is presented as following from the marginalization properties of the diffusion process itself, with empirical checks against analytically computable cases. No load-bearing uniqueness theorem or renaming of known results is invoked. The method therefore remains self-contained against external benchmarks and receives a non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The time-marginals of posterior samples generated by reverse diffusion can be integrated to estimate model evidence p(y | M).
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: DiME estimates log p(y) via E[log p(y|x0)] - D_KL(p(x0|y)||p(x0)), with the KL term computed as an integral of ||∇ log p(y|xt)||² weighted by diffusion schedule coefficients.
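The decomposition quoted in this passage follows from Bayes' rule alone. A short derivation is sketched here in LaTeX, using the passage's symbols ($x_0$ for the clean signal, $y$ for the measurements) and suppressing the model index $M$. It says nothing about how accurately the KL term can be evaluated in practice, which is the part the referee questions.

```latex
\begin{aligned}
\log p(y)
  &= \mathbb{E}_{p(x_0 \mid y)}\big[\log p(y)\big]
     && \text{($\log p(y)$ does not depend on $x_0$)} \\
  &= \mathbb{E}_{p(x_0 \mid y)}\!\left[\log \frac{p(y \mid x_0)\, p(x_0)}{p(x_0 \mid y)}\right]
     && \text{(Bayes' rule)} \\
  &= \mathbb{E}_{p(x_0 \mid y)}\big[\log p(y \mid x_0)\big]
     - D_{\mathrm{KL}}\big(p(x_0 \mid y) \,\|\, p(x_0)\big).
\end{aligned}
```

The first term is an average data fit over posterior samples; the second penalizes posteriors that sit far from the prior.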
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Optimizing Diffusion Priors in Image Reconstruction from a Single Observation
Combining diffusion priors as a product-of-experts and optimizing exponents via Bayesian evidence maximization enables prior tuning from one observation in inverse imaging problems.
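Read literally, that summary suggests an objective of roughly the following shape. This is a hedged reconstruction from the one-sentence description above, not the citing paper's notation: the component priors $p_k$, the exponents $\lambda_k$, and the number of experts $K$ are placeholder symbols introduced here.

```latex
p_\lambda(x) \;\propto\; \prod_{k=1}^{K} p_k(x)^{\lambda_k},
\qquad
p(y \mid \lambda) = \int p(y \mid x)\, p_\lambda(x)\, dx,
\qquad
\lambda^\star = \arg\max_{\lambda}\; \log p(y \mid \lambda).
```

Under this reading, an evidence estimator such as DiME would supply the quantity being maximized.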
Reference graph
Works this paper leans on
-
[1]
Tweedie moment projected diffusions for inverse problems
Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, and O Deniz Akyildiz. Tweedie moment projected diffusions for inverse problems. arXiv preprint arXiv:2310.06721,
-
[2]
Monte Carlo guided diffusion for Bayesian linear inverse problems
Gabriel Cardoso, Yazid Janati El Idrissi, Sylvain Le Corff, and Eric Moulines. Monte Carlo guided diffusion for Bayesian linear inverse problems. arXiv preprint arXiv:2308.07983,
-
[3]
Sequential controlled Langevin diffusions
Junhua Chen, Lorenz Richter, Julius Berner, Denis Blessing, Gerhard Neumann, and Anima Anandkumar. Sequential controlled Langevin diffusions. arXiv preprint arXiv:2412.07081,
-
[4]
Diffusion Posterior Sampling for General Noisy Inverse Problems
Hyungjin Chung, Jeongsol Kim, Michael T Mccann, Marc L Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. arXiv preprint arXiv:2209.14687,
-
[5]
(Page header: Published as a conference paper at ICLR 2026.) Arnaud Doucet, Will Grathwohl, Alexander G Matthews, and Heiko Strathmann. Score-based diffusion meets annealed importance sampling. Advances in Neural Information Processing Systems, 35:21482–21494,
-
[6]
Wei Guo, Molei Tao, and Yongxin Chen. Complexity analysis of normalizing constant estimation: from Jarzynski equality to annealed importance sampling and beyond. arXiv preprint arXiv:2502.04575,
-
[7]
Reverse diffusion Monte Carlo
Xunpeng Huang, Hanze Dong, Yifan Hao, Yi-An Ma, and Tong Zhang. Reverse diffusion Monte Carlo. arXiv preprint arXiv:2307.02037,
-
[8]
Xiang Li, Soo Min Kwon, Shijun Liang, Ismail R Alkhouri, Saiprasad Ravishankar, and Qing Qu. Decoupled data consistency with diffusion purification for image restoration. arXiv preprint arXiv:2403.06054,
-
[9]
Julia Linhart, Gabriel Victorino Cardoso, Alexandre Gramfort, Sylvain Le Corff, and Pedro LC Rodrigues. Diffusion posterior sampling for simulation-based inference in tall data settings. arXiv preprint arXiv:2404.07593,
-
[10]
Morteza Mardani, Jiaming Song, Jan Kautz, and Arash Vahdat. A variational perspective on solving inverse problems with diffusion models. arXiv preprint arXiv:2305.04391,
-
[11]
Jason D McEwen, Christopher GR Wallis, Matthew A Price, and Alessio Spurio Mancini. Machine learning assisted Bayesian model comparison: learnt harmonic mean estimator. arXiv preprint arXiv:2111.12720,
[12]
Variational diffusion posterior sampling with midpoint guidance
Badr Moufad, Yazid Janati, Lisa Bedin, Alain Durmus, Randal Douc, Eric Moulines, and Jimmy Olsson. Variational diffusion posterior sampling with midpoint guidance. arXiv preprint arXiv:2410.09945,
-
[13]
Improving diffusion models for inverse problems using optimal posterior covariance
Xinyu Peng, Ziyang Zheng, Wenrui Dai, Nuoqian Xiao, Chenglin Li, Junni Zou, and Hongkai Xiong. Improving diffusion models for inverse problems using optimal posterior covariance. arXiv preprint arXiv:2402.02149,
-
[14]
Marta Skreta, Lazar Atanackovic, Avishek Joey Bose, Alexander Tong, and Kirill Neklyudov. The superposition of diffusion models using the Itô density estimator. arXiv preprint arXiv:2412.17762,
-
[15]
Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456,
-
[16]
Yang Song, Conor Durkan, Iain Murray, and Stefano Ermon. Maximum likelihood training of score-based diffusion models. Advances in Neural Information Processing Systems, 34:1415–1428, 2021a.
Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. Solving inverse problems in medical imaging with score-based generative models. arXiv preprint arXiv:2111.08005, 202...
-
[17]
Zihui Wu, Yu Sun, Yifan Chen, Bingliang Zhang, Yisong Yue, and Katherine Bouman. Principled probabilistic imaging using diffusion models as plug-and-play priors. Advances in Neural Information Processing Systems, 37:118389–118427,
-
[18]
Hongkai Zheng, Wenda Chu, Bingliang Zhang, Zihui Wu, Austin Wang, Berthy T Feng, Caifeng Zou, Yu Sun, Nikola Kovachki, Zachary E Ross, et al. InverseBench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences. arXiv preprint arXiv:2503.11043,
-
[19]
This ODE has the same marginals as the SDE under ideal conditions
A PROOFS. Although the main text uses the diffusion SDE formulation, we introduce the Probability Flow ODE (PF-ODE) for the proofs and for Appendix B. This ODE has the same marginals as the SDE under ideal conditions. Given the diffusion process $x_t = a_t x_0 + \sigma_t z_t$, $z_t \sim \mathcal{N}(0, I)$, the PF-ODE is: $dx_t = \big[\tfrac{a_t'}{a_t} x_t - {}$ ...
-
[20]
$-D_{\mathrm{KL}}(p(x_0 \mid y) \,\|\, p(x_0))$. (Footnote 3) For likelihoods with Gaussian noise, if the posterior and prior have enough overlap that posterior samples have a data fit of $\chi^2 \approx 1$, this term essentially becomes constant under the same likelihood function. At the final diffusion step, the forward process has destroyed all inf...
-
[21]
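$\approx \sum_{i=1}^{N} \Delta t_i \left( \sigma_{t_i}' \sigma_{t_i} - \sigma_{t_i}^2 \frac{a_{t_i}'}{a_{t_i}} \right) \mathbb{E}_{x_{t_i} \sim p(x_{t_i} \mid y)} \|\nabla_{x_{t_i}} \log p(y \mid x_{t_i})\|^2$
Therefore, we have:
$$D_{\mathrm{KL}}(p(x_0 \mid y) \,\|\, p(x_0)) = -\int_0^T \frac{d}{dt} D_{\mathrm{KL}}(p(x_t \mid y) \,\|\, p(x_t))\, dt$$
$$\overset{(i)}{=} -\int_0^T \mathbb{E}_{x_t \sim p(x_t \mid y)} \Big[ \big( v_p(x_t \mid y) - v_p(x_t) \big)^\top \nabla_{x_t} \log p(y \mid x_t) \Big]\, dt$$
$$\overset{(ii)}{=} \int_0^T \left( \sigma_t' \sigma_t - \sigma_t^2 \frac{a_t'}{a_t} \right) \mathbb{E}_{x_t \sim p(x_t \mid y)} \|\nabla_{x_t} \log p(y \mid x_t)\|^2\, dt$$
$$\approx \sum_{i=1}^{N} \Delta t_i \left( \sigma_{t_i}' \sigma_{t_i} - \sigma_{t_i}^2 \frac{a_{t_i}'}{a_{t_i}} \right) \mathbb{E}_{x_{t_i} \sim p(x_{t_i} \mid y)} \|\nabla_{x_{t_i}} \log p(y \mid x_{t_i})\|^2,$$
where (i) uses Lemma 3 but with $\pi(x_t \mid y) = p(x_t \mid y)$ and (ii) u...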
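The discretized sum in this derivation can be checked numerically on a toy problem where everything is Gaussian. The sketch below is an illustration under loud assumptions, not the authors' implementation: it uses a scalar prior x0 ~ N(0, 1), a measurement y = x0 + e with e ~ N(0, 0.25), and the schedule a_t = 1, sigma_t = t, for which the schedule coefficient reduces to t. Posterior samples are drawn exactly (no diffusion sampler is involved), and fresh samples are drawn at every time step. Because the integral over [0, T] equals the KL at t = 0 minus the small residual KL at t = T, the closed-form reference subtracts that residual.

```python
import math
import random


def dime_kl_sum(ts, coeff, grad_sq_mean):
    """Right-endpoint sum: sum_i dt_i * coeff(t_i) * E||grad log p(y | x_{t_i})||^2."""
    total = 0.0
    for t_prev, t in zip(ts[:-1], ts[1:]):
        total += (t - t_prev) * coeff(t) * grad_sq_mean(t)
    return total


# Toy setup (all values are assumptions chosen for illustration).
Y, NOISE_VAR = 1.0, 0.25
POST_VAR = 1.0 / (1.0 + 1.0 / NOISE_VAR)   # Var[x0 | y] for a N(0, 1) prior
POST_MEAN = POST_VAR * Y / NOISE_VAR       # E[x0 | y]


def make_grad_sq_mean(n_samples, rng):
    """Monte Carlo estimate of E_{x_t ~ p(x_t | y)} |d/dx_t log p(y | x_t)|^2."""
    def grad_sq_mean(t):
        m = 1.0 / (1.0 + t * t)        # E[x0 | x_t] = m * x_t
        w = t * t / (1.0 + t * t)      # Var[x0 | x_t]
        acc = 0.0
        for _ in range(n_samples):
            x0 = rng.gauss(POST_MEAN, math.sqrt(POST_VAR))  # exact posterior draw
            xt = x0 + t * rng.gauss(0.0, 1.0)               # noised to time t
            g = m * (Y - m * xt) / (NOISE_VAR + w)          # y | x_t ~ N(m * x_t, NOISE_VAR + w)
            acc += g * g
        return acc / n_samples
    return grad_sq_mean


def gaussian_kl(mean_p, var_p, mean_q, var_q):
    """KL(N(mean_p, var_p) || N(mean_q, var_q)) for scalar Gaussians."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mean_p - mean_q) ** 2) / var_q - 1.0)


T, N = 20.0, 400
ts = [T * i / N for i in range(N + 1)]
estimate = dime_kl_sum(ts, coeff=lambda t: t,
                       grad_sq_mean=make_grad_sq_mean(1000, random.Random(0)))
exact = (gaussian_kl(POST_MEAN, POST_VAR, 0.0, 1.0)
         - gaussian_kl(POST_MEAN, POST_VAR + T * T, 0.0, 1.0 + T * T))
print(f"discretized sum: {estimate:.4f}, closed form: {exact:.4f}")
```

A uniform time grid is used for simplicity; the source does not say which grid the authors use, and a grid refined near t = 0 may be preferable when the integrand is concentrated there.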
-
[22]
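Then $\|\Sigma_{x_0 \mid x_t}\| = \frac{\sigma_t^2}{\sigma_t^2 + a_t^2}$. Proof. Plugging in the covariance from Lemma 1 gives: $\|\Sigma_{x_0 \mid x_t}\| = \lambda_{\max}\!\left( \left( \Sigma_0^{-1} + \frac{a_t^2}{\sigma_t^2} I \right)^{-1} \right) = \frac{1}{\lambda_{\min}\!\left( \Sigma_0^{-1} + \frac{a_t^2}{\sigma_t^2} I \right)} = \frac{1}{1 + \frac{a_t^2}{\sigma_t^2}} = \frac{\sigma_t^2}{\sigma_t^2 + a_t^2}$. Lemma 2. Suppose we have the diffusion process $x_t = a_t x_0 + \sigma_t z_t$, $z_t \sim \mathcal{N}(0, I)$, and sample $\tilde{x}_0 \sim p(x_0 \mid x_t, y)$. Under the assumption that $p(x_0 \mid x_t) \approx \mathcal{N}(\mathbb{E}[x_0 \mid x_t], \Sigma_{x_0 \mid x_t})$, b...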
-
[23]
$\leq \frac{\|A\|^4}{\sigma_y^4} \frac{\sigma_t^2 a_t^2}{(\sigma_t^2 + a_t^2)^3}$. Unlike $\Theta_{\mathrm{high}}$, the variance of this estimator goes to 0 as $\sigma_t \to 0$, but for the rest of the diffusion process the constant factor makes it a higher-variance estimator than $\Theta_{\mathrm{high}}$ in practice. B GENERALIZING DIME TO ARBITRARY POSTERIOR MARGINALS. While we only present DiME along the st...
-
[24]
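Proposition 2 (DiME-PnPDM: model evidence estimation along the PnP-DM marginals). Given the diffusion process $x_t = a_t x_0 + \sigma_t z_t$, $z_t \sim \mathcal{N}(0, I)$, PnP-DM marginals $q(x_t \mid y) \propto p(x_t)\, q(y \mid x_t)$, and timesteps $0 = t_0 < \cdots < t_N = T$, the model evidence can be estimated via: $\log p(y) \approx C(q, y) - \sum_{i=1}^{N} \Delta t_i \, \mathbb{E}_{x \sim q(x_{t_i} \mid y)} \big[ v_p(x_{t_i})^\top \nabla_{x_{t_i}} \log q(y \mid x_{t_i}) \big]$ (17), where $C(q, y) = \log \mathbb{E}_{x \sim \mathcal{N}(0, I)} [q(y \mid x$...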
-
[25]
We can also do a similar derivation for the marginals used for the Twisted Diffusion Sampler (Wu et al., 2023), which follow $\pi(x_t \mid y) \propto p(x_t)\, p(y \mid \mathbb{E}[x_0 \mid x_t])$; note that this evidence gradient can be computed before resampling using SMC weights or after resampling without weights. Proposition 3 (DiME-TDS: model evidence estimation along the TDS marginals). Given ...
-
[26]
Integrating over all $t$ gives the estimator: $\log p(y) = \log \pi(y \mid t = T) - \int_0^T \frac{d}{dt} \log \pi(y \mid t)\, dt \approx \log \pi(y \mid t = T) - \sum_{i=1}^{N} \Delta t_i \, \mathbb{E}_{x_{t_i} \sim \pi(x_{t_i} \mid y)} \Big[ v_p(x_{t_i})^\top \nabla_{x_{t_i}} \log p(y \mid \mu(x_{t_i})) + \frac{d}{dt} \log p(y \mid \mu(x_{t_i})) \Big]$, where $\log \pi(y \mid T) \approx \log \mathbb{E}_{x \sim \mathcal{N}(0, \sigma_T^2)} [p(y \mid \mu(x_T))]$. C BASELINE METHODS. In Section 4....
-
[27]
Computing the precision matrix $\Sigma_0^{-1}$: for some datasets, the covariance of specific pixel locations is nearly 0 (such as the corner regions of MNIST), so we added a jitter of 1e-2 for stability. D.2 BASELINES. We use TI, AIS, and SMC as baseline methods for the mixture of Gaussians experiment (Section 4.1) and SMC for the MNIST experiment (Section 4.2; Ap...
-
[28]
Mean and standard deviation over 5 runs are shown here. As can be seen, lowering the noise level only further increases overfitting to training data, causing the estimated evidence to drop.
Columns for each σmin, in order: (GT 8, Model 6), (GT 8, Model 7), (GT 8, Model 8).
SMC schedule Linear, σmin = 0.1: −182 ± 9, −212 ± 9, −210 ± 11
SMC schedule Linear, σmin = 0.05: −279 ± 26, −352 ± 3, −426 ± 40
SMC schedule Expone...