Filtered Posterior Mean Collections: A Unified Framework for Analytical Models of Diffusion Generalization

Berend Zwartsenberg; Frank Wood; Matthew Niedoba

REVIEW · 2 major objections · 2 minor · 34 references

Filtered Posterior Mean Collections unify analytical models of diffusion model generalization from training patches.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric.

T0 review · grok-4.3

2026-06-30 16:19 UTC pith:SY5N2IVH

Load-bearing objection: FPMCs give a clean unification of posterior-mean models for diffusion denoisers plus some sample gains, but the paper skips any direct check on how closely the construction matches real network outputs.

arXiv 2605.24192 v1 · pith:SY5N2IVH · submitted 2026-05-22 · cs.LG, cs.AI, cs.CV

Matthew Niedoba, Berend Zwartsenberg, Frank Wood

keywords: diffusion models, generalization, posterior means, analytical models, denoising networks, unified framework, image generation

verification ladder: T0 review · T1 audit · T2 compute · T3 formal · T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper consolidates multiple existing methods that explain the generalization behavior of denoising networks in diffusion models as weighted averages drawn from training data patches. It defines a single model class called Filtered Posterior Mean Collections using three axes: query precision vectors, response weights, and source distributions. Specific settings on these axes recover prior methods as special cases. The authors then vary the axes, finding that soft relaxations and source augmentations yield better generated samples on image datasets.

Core claim

We consolidate these approaches into a unified model class which we call Filtered Posterior Mean Collections (FPMCs). We define this model class using query precision vectors, response weights, and source distributions, and illustrate that existing methods are recoverable with specific choices of these design axes. Investigating each axis in turn, we find that FPMC performance can be improved with soft relaxations of prior patch-based methods, and through augmentations of source distributions. Applying these findings to an existing FPMC, we demonstrate consistent sample improvement across three natural image datasets.

What carries the argument

Filtered Posterior Mean Collections (FPMCs), a model class defined by choices of query precision vectors, response weights, and source distributions that recovers prior analytical models of denoising generalization as special cases.

Load-bearing premise

The outputs of neural-network denoising functions can be modeled as posterior weighted averages of training dataset patches.
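
To make the premise concrete, here is a minimal NumPy sketch of a posterior-weighted average over a set of source samples. It is an illustration under stated assumptions, not the paper's definition: the function names, the way the query precision vector `q` enters the distance, and the way the response weights `r` filter the output are all hypothetical readings of the three design axes. With `q` and `r` set to all ones, it reduces to the standard empirical posterior mean under Gaussian noise.

```python
import numpy as np

def filtered_posterior_mean(z, source, q, r, sigma):
    """Hypothetical single-filter estimator (not taken from the paper).

    z:      (d,) noisy query
    source: (n, d) source samples, e.g. flattened training images
    q:      (d,) non-negative query precision vector (assumed role: weights
            each coordinate when comparing the query to the source samples)
    r:      (d,) response weights (assumed role: selects which coordinates
            of the weighted average are returned)
    sigma:  noise standard deviation
    """
    # Precision-weighted squared distance from the query to each source sample.
    sq_dist = ((z[None, :] - source) ** 2 * q[None, :]).sum(axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    # Numerically stable softmax gives posterior weights over source samples.
    logits = logits - logits.max()
    w = np.exp(logits)
    w = w / w.sum()
    # Posterior-weighted average, filtered by the response weights.
    return r * (w @ source)

def collection(z, source, filters, sigma):
    """Sum the outputs of several (q, r) filters into one estimate (assumed aggregation)."""
    return sum(filtered_posterior_mean(z, source, q, r, sigma) for q, r in filters)
```

In this sketch a binary `q` restricts the comparison to a subset of coordinates, while a real-valued `q` grades their influence, which is one way to picture the "soft relaxations" the paper reports.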

What would settle it

A counterexample denoising network whose outputs cannot be expressed as any Filtered Posterior Mean Collection for any choice of the three design axes would falsify the claimed unification.

If this is right

Soft relaxations of prior patch-based methods improve FPMC performance.
Augmentations of source distributions improve FPMC performance.
The improved FPMC choices produce consistent sample quality gains on three natural image datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The three-axis parameterization could be used to search for previously untested combinations that further improve generalization.
If the unification holds, it offers a systematic way to compare why different diffusion architectures exhibit similar generalization patterns.
The same design-axis approach might extend to analytical modeling of other generative processes beyond diffusion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

FPMCs give a clean unification of posterior-mean models for diffusion denoisers plus some sample gains, but the paper skips any direct check on how closely the construction matches real network outputs.

The main thing here is that they collect several lines of work that treat diffusion denoisers as weighted averages of training patches and fold them into one class called Filtered Posterior Mean Collections. The class is defined by three axes—query precision vectors, response weights, and source distributions—and they show that prior methods are recovered by particular choices on those axes. They then test softer weightings and larger source sets and report consistent sample quality lifts on three natural image datasets.

That unification and the concrete levers it supplies are the useful part. It gives the small group working on analytical models a shared set of knobs and some evidence that relaxing the hard constraints helps in practice.

The soft spot is exactly the one flagged in the stress test. The framework only explains generalization if the neural net outputs are actually close to these posterior means. The abstract and the reported results contain no quantitative comparison of approximation error between a trained denoiser and its FPMC counterpart on held-out queries. Sample improvement alone does not settle whether the model is capturing the network’s behavior or simply offering a flexible parameterization for tweaks.

This is for people already inside the analytical-diffusion corner who want a common language and a few new experiments to build on. It is not aimed at practitioners who just want better generators.

I would send it to peer review. The consolidation is new, the experiments are concrete, and the modeling assumption is stated clearly enough that referees can press on the missing validation.

Referee Report

2 major / 2 minor

Summary. The paper introduces Filtered Posterior Mean Collections (FPMCs) as a unified analytical framework for modeling the outputs of neural-network denoisers in diffusion models. FPMCs are parameterized by query precision vectors, response weights, and source distributions; the authors show that several prior patch-based methods arise as special cases under particular choices of these axes. They then investigate relaxations (soft weighting) and source-distribution augmentations, and report consistent improvements in sample quality when these FPMCs are substituted into the denoising step on three natural-image datasets.

Significance. If the core modeling assumption holds—that trained denoiser outputs are well-approximated by posterior-weighted averages of training patches—the unification supplies a compact design space for analytical diffusion models and could guide architecture or training choices without retraining full networks. The reported sample-quality gains on multiple datasets would then constitute evidence that the framework is not merely descriptive but practically useful. The absence of a direct quantitative check on modeling fidelity, however, limits the strength of this significance claim.

major comments (2)

[Abstract & §3 (model definition)] The central modeling assumption (that NN denoiser outputs equal or are well-approximated by posterior patch averages) is stated in the abstract and used to motivate the entire FPMC construction, yet no quantitative diagnostic—e.g., per-pixel or per-patch L2 error between a trained denoiser and its FPMC counterpart on held-out queries—is reported. Sample-quality improvement alone does not establish that the analytical form captures the network’s generalization behavior rather than simply providing a convenient re-parameterization.
[§4 (empirical evaluation)] The claim of “consistent sample improvement across three natural image datasets” is presented without baseline controls that isolate the contribution of the FPMC parameterization from other implementation details (e.g., number of function evaluations, scheduler, or post-processing). It is therefore unclear whether the reported gains are attributable to the analytical model or to incidental hyper-parameter changes.
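
The fidelity diagnostic requested in the first major comment could be computed along the following lines. This is a sketch only: the array layout `(n, h, w, c)`, the function name `l2_diagnostics`, and the non-overlapping patch grid are assumptions made for illustration, not details from the paper.

```python
import numpy as np

def l2_diagnostics(net_out, fpmc_out, patch=4):
    """Compare two sets of denoiser outputs, each of shape (n, h, w, c).

    Returns the per-pixel squared error averaged over queries and channels,
    shape (h, w), and its mean within non-overlapping patches,
    shape (h // patch, w // patch).
    """
    err = (net_out - fpmc_out) ** 2
    per_pixel = err.mean(axis=(0, 3))  # average over queries and channels
    h, w = per_pixel.shape
    hp, wp = h // patch, w // patch
    # Crop to a multiple of the patch size, then average inside each patch.
    cropped = per_pixel[: hp * patch, : wp * patch]
    per_patch = cropped.reshape(hp, patch, wp, patch).mean(axis=(1, 3))
    return per_pixel, per_patch
```

Passing outputs of the trained network and of the analytical model on the same held-out noisy queries would give an error map per noise level, which is the kind of evidence the comment asks for.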

minor comments (2)

[§2] Notation for the three design axes (query precision vectors, response weights, source distributions) is introduced without an explicit summary table; a single table listing the axes, their mathematical symbols, and the special-case choices that recover prior methods would improve readability.
[§4] The paper does not discuss computational cost of evaluating the FPMC versus the original neural denoiser; if the analytical form is intended as a drop-in replacement, this comparison is needed to assess practicality.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will revise the paper accordingly to strengthen the presentation of our contributions.

Referee: [Abstract & §3 (model definition)] The central modeling assumption (that NN denoiser outputs equal or are well-approximated by posterior patch averages) is stated in the abstract and used to motivate the entire FPMC construction, yet no quantitative diagnostic—e.g., per-pixel or per-patch L2 error between a trained denoiser and its FPMC counterpart on held-out queries—is reported. Sample-quality improvement alone does not establish that the analytical form captures the network’s generalization behavior rather than simply providing a convenient re-parameterization.

Authors: We acknowledge that a direct quantitative diagnostic comparing trained denoiser outputs to their FPMC approximations would provide stronger evidence that the framework captures generalization behavior. The manuscript's primary focus is the unification of prior methods as special cases of FPMCs and the empirical gains from design choices within the framework. In the revised version we will add per-patch L2 error measurements on held-out queries to quantify approximation fidelity for the configurations used in our experiments. revision: yes
Referee: [§4 (empirical evaluation)] The claim of “consistent sample improvement across three natural image datasets” is presented without baseline controls that isolate the contribution of the FPMC parameterization from other implementation details (e.g., number of function evaluations, scheduler, or post-processing). It is therefore unclear whether the reported gains are attributable to the analytical model or to incidental hyper-parameter changes.

Authors: Our §4 experiments hold the diffusion pipeline fixed (scheduler, number of function evaluations, and post-processing) while substituting different FPMC instantiations, with the original patch-based methods serving as the direct baselines corresponding to specific FPMC parameter settings. The reported gains therefore arise from the soft relaxations and source augmentations. To further isolate these effects we will include additional matched-hyperparameter ablations in the revision. revision: yes
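
A matched ablation of the kind described in this response can be pictured as a fixed sampling loop in which only the denoiser is swapped. The sketch below assumes a deterministic Euler sampler over a decreasing noise schedule `sigmas`, in the style of Karras et al. [13]; the function names, the schedule, and the seed handling are hypothetical and not taken from the paper.

```python
import numpy as np

def euler_sample(denoiser, x_init, sigmas):
    """Deterministic Euler sampler; denoiser(x, sigma) returns a clean-image estimate."""
    x = x_init.copy()
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoiser(x, s_cur)) / s_cur  # probability-flow ODE slope dx/dsigma
        x = x + (s_next - s_cur) * d
    return x

def run_matched_ablation(denoisers, shape, sigmas, seed=0):
    """Sample every denoiser from the same initial noise and the same schedule."""
    rng = np.random.default_rng(seed)
    x_init = sigmas[0] * rng.standard_normal(shape)
    return {name: euler_sample(fn, x_init, sigmas) for name, fn in denoisers.items()}
```

Because the schedule, step count, and initial noise are shared, any difference between the returned samples is attributable to the denoiser alone.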

Circularity Check

0 steps flagged

No circularity: the FPMC class is defined independently and recovers prior methods as instances.

The paper defines the FPMC model class directly via three design axes (query precision vectors, response weights, source distributions) and shows prior methods arise from specific choices of those axes. No equation or claim reduces a prediction to a fitted parameter by construction, nor does any load-bearing step rely on a self-citation chain that itself lacks independent verification. Empirical sample improvements on three datasets are reported after exploring the axes, but these are presented as experimental outcomes rather than derivations that presuppose the target result. The modeling assumption that NN denoisers behave as posterior patch averages is stated as a premise, not derived from the framework's outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities can be identified from the provided information.

pith-pipeline@v0.9.1-grok · 5677 in / 1224 out tokens · 65147 ms · 2026-06-30T16:19:10.730427+00:00 · methodology

Original abstract

The neural-network denoising functions which form the backbone of image diffusion models are remarkably consistent in their generalization behaviour across a wide variety of network architectures and training procedure hyperparameters. A recent line of research has sought to model the outputs of these networks by aggregating posterior weighted averages of training dataset patches. In this work, we consolidate these approaches into a unified model class which we call Filtered Posterior Mean Collections (FPMCs). We define this model class using query precision vectors, response weights, and source distributions, and illustrate that existing methods are recoverable with specific choices of these design axes. Investigating each axis in turn, we find that FPMC performance can be improved with soft relaxations of prior patch-based methods, and through augmentations of source distributions. Applying these findings to an existing FPMC, we demonstrate consistent sample improvement across three natural image datasets.

Figures

Figures reproduced from arXiv: 2605.24192 by Berend Zwartsenberg, Frank Wood, Matthew Niedoba.

**Figure 1.** Visualization of q and r for the Lukoianov et al. [18] FPMC on CIFAR-10 at t = 3.2. All vectors are rescaled to [0, 1] for visualization. Response weight r: The one-hot response controls the output of the estimator, here a single channel of one pixel. Wiener Filter: The corresponding row of the Wiener filter exhibits a smooth, spatially localized structure. Lukoianov q: Thresholding destroys the graded str…

**Figure 2.** Effect of fine-tuning the Q and R on denoiser error vs a CIFAR-10 DDPM++ denoiser for three prior FPMC methodologies. Mean squared error is reported over 1000 z per t value. Top row: Learning soft Q (blue) consistently reduces the MSE of each method when compared against the corresponding binary Q FPMC baseline (black). Bottom row: Learning soft R (orange) improves each FPMC versus the binary R baseline (b…

**Figure 3.** Comparison of the relative change in FPMC denoiser MSE vs a DDPM++ denoiser on the …

**Figure 4.** Comparison of samples generated by various denoisers with shared initial …

**Figure 5.** Comparison of additional samples generated by various denoisers with shared initial …

**Figure 6.** Comparison of additional samples generated by various denoisers with shared initial …

**Figure 7.** Comparison of additional samples generated by various denoisers with shared initial …

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 1 internal anchor

[1] Q. Bertrand, A. Gagneux, M. Massias, and R. Emonet. On the closed-form of flow matching: Generalization does not arise from target stochasticity. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

[2] G. Biroli, T. Bonnaire, V. De Bortoli, and M. Mézard. Dynamical regimes of diffusion models. Nature Communications, 15(1):9957, 2024.

[3] A. Buades, B. Coll, and J.-M. Morel. Non-local means denoising. Image processing on line, 1:208–212, 2011.

[4] Y. Choi, Y. Uh, J. Yoo, and J.-W. Ha. Stargan v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.

[5] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Transactions on image processing, 16(8):2080–2095, 2007.

[6] X. Gu, C. Du, T. Pang, C. Li, M. Lin, and Y. Wang. On memorization in diffusion models. Transactions on Machine Learning Research, 2025.

[7] W. Harvey, S. Naderiparizi, V. Masrani, C. Weilbach, and F. Wood. Flexible diffusion modeling of long videos. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022.

[8] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.

[9] Z. Kadkhodaie, F. Guth, E. P. Simoncelli, and S. Mallat. Generalization in diffusion models arises from geometry-adaptive harmonic representations. In The Twelfth International Conference on Learning Representations, 2023.

[10] M. Kamb and S. Ganguli. An analytic theory of creativity in convolutional diffusion models. In International Conference on Machine Learning, pages 28795–28831. PMLR, 2025.

[11] T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 4401–4410. Computer Vision Foundation / IEEE, 2019.

[12] T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, and T. Aila. Training generative adversarial networks with limited data. Advances in neural information processing systems, 33:12104–12114, 2020.

[13] T. Karras, M. Aittala, T. Aila, and S. Laine. Elucidating the design space of diffusion-based generative models. Advances in neural information processing systems, 35:26565–26577, 2022.

[14] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. 2009.

[15] C.-H. Lai, Y. Song, D. Kim, Y. Mitsufuji, and S. Ermon. The principles of diffusion models. arXiv preprint arXiv:2510.21890, 2025.

[16] X. Li, Y. Dai, and Q. Qu. Understanding generalizability of diffusion models requires rethinking the hidden gaussian structure. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024, 2024.

[17] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In 5th International Conference on Learning Representations, 2017.

[18] A. Lukoianov, C. Yuan, J. Solomon, and V. Sitzmann. Locality in image diffusion models emerges from data statistics. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

[19] M. Niedoba, D. Green, S. Naderiparizi, V. Lioutas, J. W. Lavington, X. Liang, Y. Liu, K. Zhang, S. Dabiri, A. Scibior, et al. Nearest neighbour score estimators for diffusion generative models. In Forty-first International Conference on Machine Learning, 2024.

[20] M. Niedoba, B. Zwartsenberg, K. P. Murphy, and F. Wood. Towards a mechanistic explanation of diffusion model generalization. In International Conference on Machine Learning, pages 46389–46411. PMLR, 2025.

[21] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.

[22] S. Roth and M. J. Black. Fields of experts: A framework for learning image priors. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 860–867. IEEE, 2005.

[23] C. Scarvelis, H. S. d. O. Borde, and J. Solomon. Closed-form diffusion models. Transactions on Machine Learning Research, 2025.

[24] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning, pages 2256–2265. PMLR, 2015.

[25] K. Song, J. Kim, S. Chen, Y. Du, S. Kakade, and V. Sitzmann. Selective underfitting in diffusion models. arXiv preprint arXiv:2510.01378, 2025.

[26] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, 2021.

[27] J. J. Vastola. Generalization through variance: how noise shapes inductive biases in diffusion models. In The Thirteenth International Conference on Learning Representations, 2025.

[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

[29] P. Vincent. A connection between score matching and denoising autoencoders. Neural computation, 23(7):1661–1674, 2011.

[30] B. Wang and J. J. Vastola. The unreasonable effectiveness of gaussian score approximation for diffusion models and its applications. Transactions on Machine Learning Research, 2024.

[31] Y. Xu, S. Tong, and T. Jaakkola. Stable target field for reduced variance score estimation in diffusion models. In The Eleventh International Conference on Learning Representations, 2023.

[32] M. Yi, J. Sun, and Z. Li. On the generalization of diffusion model. arXiv preprint arXiv:2305.14712, 2023.

[33] T. Yoon, J. Y. Choi, S. Kwon, and E. K. Ryu. Diffusion probabilistic models generalize when they fail to memorize. In ICML 2023 workshop on structured probabilistic inference & generative modeling, 2023.

[34] H. Zhang, J. Zhou, Y. Lu, M. Guo, P. Wang, L. Shen, and Q. Qu. The emergence of reproducibility and consistency in diffusion models. In Forty-first International Conference on Machine Learning, 2024.

A Prior FPMCs: In this section we restate the prior methodologies of Niedoba et al. [20], Kamb and Ganguli [10], and Lukoianov et al. [18] under our comb...
