Why DDIM Hallucinates More Than DDPM: A Theoretical Analysis of Reverse Dynamics
Pith reviewed 2026-06-30 23:09 UTC · model grok-4.3
The pith
DDIM can become stuck on the segment between nearest modes in a Gaussian mixture after a critical time τ, while DDPM stochasticity allows escape and avoids hallucination.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For a Gaussian mixture target distribution, after a critical time τ, the reverse ODE of DDIM can become stuck on the segment connecting the two nearest modes, leading to hallucinated samples that lie between modes, whereas the SDE of DDPM uses stochasticity to become unstuck from this region and avoid hallucination.
What carries the argument
The reverse ODE dynamics of DDIM and SDE dynamics of DDPM applied to a Gaussian mixture, with identification of a critical time τ where deterministic paths stick to inter-mode segments.
If this is right
- DDPM exhibits a significantly lower hallucination rate than DDIM when trajectories enter the inter-mode region.
- Incorporating additional stochastic steps into DDIM can prevent it from getting stuck and reduce hallucinations.
- The analysis offers insights for designing samplers that better handle multimodal distributions by balancing determinism and stochasticity.
Where Pith is reading between the lines
- The mechanism may apply to other deterministic samplers in diffusion models beyond DDIM.
- Hybrid sampling strategies could be developed by switching to stochastic steps near the critical time τ.
Load-bearing premise
The sticking behavior and benefit of stochasticity are proven specifically for Gaussian mixture target distributions.
What would settle it
Running the DDIM reverse process on a two-component Gaussian mixture and checking if samples remain on the inter-mode segment after time τ, versus DDPM samples escaping it.
Figures
read the original abstract
We theoretically study the hallucination phenomena in two canonical diffusion samplers: the stochastic Denoising Diffusion Probabilistic Model (DDPM) and the deterministic Denoising Diffusion Implicit Model (DDIM). We analyze the reverse ODE (DDIM) and SDE (DDPM) for a Gaussian mixture target, proving that after a critical time $\tau$, (a) DDIM can become stuck on the segment connecting the two nearest modes and (b) DDPM *stochasticity* helps it become unstuck from this region, thus avoiding hallucination. Our empirical validation verifies that DDPM has a significantly lower hallucination rate than DDIM when this region is entered. Building on our observations, we exhibit how using additional stochastic steps can help DDIM avoid hallucinations and offer new insights on how to design improved samplers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to theoretically analyze the reverse ODE (DDIM) and SDE (DDPM) dynamics on a Gaussian mixture target distribution. It proves that after a critical time τ, DDIM can become stuck on the line segment connecting the two nearest modes (causing hallucinations), while DDPM stochasticity allows escape from this region. Empirical validation shows DDPM has lower hallucination rates when this region is entered, and the work suggests adding stochastic steps to DDIM to mitigate hallucinations.
Significance. If the central claim holds, the work supplies a mechanistic explanation for differences in hallucination behavior between deterministic and stochastic diffusion samplers, which could inform improved sampler design. The exact analysis on a Gaussian mixture target is a strength, enabling a rigorous proof of the sticking phenomenon and the benefit of stochasticity; the empirical verification conditioned on the region is also a positive element.
major comments (2)
- [theoretical analysis (reverse dynamics)] The proof of the sticking behavior after critical time τ (abstract) lacks the derivation steps, the explicit formula for τ, and the specific Gaussian mixture parameters, so the support for the central claim cannot be evaluated.
- [empirical validation] The empirical validation verifies lower hallucination rates for DDPM only when conditioned on entering the inter-mode region; it does not demonstrate that segment-sticking is the dominant mechanism on high-dimensional or multi-modal data beyond the two-component Gaussian mixture.
minor comments (1)
- The notation and definition of the critical time τ should be stated explicitly with an equation, even if the full derivation is in an appendix.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive comments. We address each major comment below.
read point-by-point responses
-
Referee: [theoretical analysis (reverse dynamics)] The proof of the sticking behavior after critical time τ (abstract) lacks the derivation steps, the explicit formula for τ, and the specific Gaussian mixture parameters, so the support for the central claim cannot be evaluated.
Authors: We agree that the presentation of the proof requires greater explicitness for full evaluability. The derivation of the sticking behavior for the DDIM reverse ODE is contained in Section 3, with supporting steps in Appendix A; however, we will revise the manuscript to prominently display the explicit formula for the critical time τ (the time at which the velocity field aligns with the inter-mode segment) and to state the precise Gaussian mixture parameters (two components with means at ±e₁ and isotropic covariance σ²I). revision: yes
-
Referee: [empirical validation] The empirical validation verifies lower hallucination rates for DDPM only when conditioned on entering the inter-mode region; it does not demonstrate that segment-sticking is the dominant mechanism on high-dimensional or multi-modal data beyond the two-component Gaussian mixture.
Authors: The empirical section is deliberately conditioned on entry into the inter-mode region precisely to isolate and verify the mechanism predicted by the theory. The paper's scope is the rigorous analysis of this phenomenon on the two-component Gaussian mixture, which permits an exact proof; we make no claim that segment-sticking is the dominant mechanism in high-dimensional or more complex multi-modal settings. The identified mechanism nevertheless supplies design insight, as illustrated by the stochastic-step augmentation we propose. No revision is required. revision: no
Circularity Check
Direct analysis of reverse ODE/SDE for Gaussian mixture; self-contained derivation
full rationale
The paper states it analyzes the reverse ODE (DDIM) and SDE (DDPM) directly for a Gaussian mixture target, proving sticking behavior after time τ and the role of stochasticity. No steps reduce by construction to fitted inputs, self-citations, or renamed empirical patterns; the proof is presented as following from the stated target distribution and diffusion equations. Empirical checks are described as verification of the derived mechanism rather than its foundation. This is a standard non-circular theoretical derivation for the toy setting.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The data distribution is a Gaussian mixture.
- standard math The reverse process is the standard probability-flow ODE for DDIM and the corresponding SDE for DDPM.
Reference graph
Works this paper leans on
-
[1]
write newline
" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...
-
[2]
K., Maini, P., Lipton, Z
Aithal, S. K., Maini, P., Lipton, Z. C., and Kolter, J. Z. Understanding hallucinations in diffusion models through mode interpolation. In Advances in Neural Information Processing Systems (NeurIPS), 2024
2024
-
[3]
Anderson, B. D. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 1982
1982
-
[4]
Hallucination early detection in diffusion models
Betti, F., Baraldi, L., Baraldi, L., Cucchiara, R., and Sebe, N. Hallucination early detection in diffusion models. International Journal of Computer Vision (IJCV), 2026
2026
-
[5]
Beyler, E. and Bach, F. Convergence of deterministic and stochastic diffusion-model samplers: A simple analysis in Wasserstein distance. arXiv 2508.03210, 2025
-
[6]
Dynamical regimes of diffusion models
Biroli, G., Bonnaire, T., de Bortoli, V., and M \'e zard, M. Dynamical regimes of diffusion models. In Nature Communications, 2024
2024
-
[7]
Why diffusion models don t memorize: The role of implicit dynamical regularization in training
Bonnaire, T., Urfin, R., Biroli, G., and Mezard, M. Why diffusion models don t memorize: The role of implicit dynamical regularization in training. In Advances in Neural Information Processing Systems (NeurIPS), 2025
2025
-
[8]
Buchanan, S., Pai, D., Ma, Y., and Bortoli, V. D. On the edge of memorization in diffusion models. In Advances in Neural Information Processing Systems (NeurIPS), 2025
2025
- [9]
-
[10]
Exploring the optimal choice for generative processes in diffusion models: Ordinary vs stochastic differential equations
Cao, Y., Chen, J., Luo, Y., and Zhou, X. Exploring the optimal choice for generative processes in diffusion models: Ordinary vs stochastic differential equations. In Advances in Neural Information Processing Systems (NeurIPS), 2023
2023
-
[11]
Laplacian score sharpening for mitigating hallucination in diffusion models
Chandran.C, B., Anumasa, S., and Liu, D. Laplacian score sharpening for mitigating hallucination in diffusion models. arXiv 2511.07496, 2025
-
[12]
The probability flow ODE is provably fast
Chen, S., Chewi, S., Lee, H., Li, Y., Lu, J., and Salim, A. The probability flow ODE is provably fast. In Advances in Neural Information Processing Systems (NeurIPS), 2023
2023
-
[13]
Going beyond compositions, DDPMs can produce zero-shot interpolations
Deschenaux, J., Krawczuk, I., Chrysos, G., and Cevher, V. Going beyond compositions, DDPMs can produce zero-shot interpolations. In International Conference on Machine Learning (ICML), 2024
2024
-
[14]
M., Budd, C., and Sch \"o nlieb, C.-B
Deveney, T., Stanczuk, J., Kreusser, L. M., Budd, C., and Sch \"o nlieb, C.-B. Closing the ODE - SDE gap in score-based diffusion models through the Fokker-Planck equation. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2025
2025
-
[15]
Dubins, L. E. and Schwarz, G. On continuous martingales. Proceedings of the National Academy of Sciences of the United States of America, 1965
1965
-
[16]
A., Liu, X., Zeng, Z., Ma, L., Zhang, Q., and Wu, Q
Fu, S., Zhou, J., Chen, Q., Jing, H., Nguyen, H. A., Liu, X., Zeng, Z., Ma, L., Zhang, Q., and Wu, Q. Counting hallucinations in diffusion models. arXiv 2510.13080, 2025
-
[17]
Multilinear latent conditioning for generating unseen attribute combinations
Georgopoulos, M., Chrysos, G., Pantic, M., and Panagakis, Y. Multilinear latent conditioning for generating unseen attribute combinations. In International Conference on Machine Learning (ICML), 2020
2020
-
[18]
Denoising diffusion probabilistic models
Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NeurIPS), 2020
2020
-
[19]
Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015
2015
-
[20]
and Massart, P
Laurent, B. and Massart, P. Adaptive estimation of a quadratic functional by model selection. Annals of Statistics, 2000
2000
-
[21]
Dimension-free convergence of diffusion models for approximate Gaussian mixtures
Li, G., Cai, C., and Wei, Y. Dimension-free convergence of diffusion models for approximate Gaussian mixtures. arXiv 2504.05300, 2025
-
[22]
and Chen, S
Li, M. and Chen, S. Critical windows: Non-asymptotic theory for feature emergence in diffusion models. In International Conference on Machine Learning (ICML), 2024
2024
-
[23]
Unraveling the smoothness properties of diffusion models: A Gaussian mixture perspective
Liang, Y., Shi, Z., Song, Z., and Zhou, Y. Unraveling the smoothness properties of diffusion models: A Gaussian mixture perspective. International Conference on Computer Vision (ICCV), 2025
2025
-
[24]
DPM -solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps
Lu, C., Zhou, Y., Bao, F., Chen, J., Li, C., and Zhu, J. DPM -solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. In Advances in Neural Information Processing Systems (NeurIPS), 2022
2022
-
[25]
Towards understanding text hallucination of diffusion models via local generation bias
Lu, R., Wang, R., Lyu, K., Jiang, X., Huang, G., and Wang, M. Towards understanding text hallucination of diffusion models via local generation bias. In International Conference on Learning Representations (ICLR), 2025
2025
-
[26]
and Peres, Y
Mörters, P. and Peres, Y. Brownian Motion. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2010
2010
-
[27]
High-resolution image synthesis with latent diffusion models
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In Conference on Computer Vision and Pattern Recognition (CVPR), 2022
2022
-
[28]
and Hyv \"a rinen, A
Saremi, S. and Hyv \"a rinen, A. Neural empirical Bayes . Journal of Machine Learning Research, 2019
2019
-
[29]
Learning mixtures of Gaussians using the DDPM objective
Shah, K., Chen, S., and Klivans, A. Learning mixtures of Gaussians using the DDPM objective. In Advances in Neural Information Processing Systems (NeurIPS), 2023
2023
-
[30]
Improved sampling of diffusion models in fluid dynamics with Tweedie 's formula
Shehata, Y., Holzschuh, B., and Thuerey, N. Improved sampling of diffusion models in fluid dynamics with Tweedie 's formula. In International Conference on Learning Representations (ICLR), 2025
2025
-
[31]
A., Maheswaranathan, N., and Ganguli, S
Sohl-Dickstein, J., Weiss, E. A., Maheswaranathan, N., and Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning (ICML), 2015
2015
-
[32]
Understanding and mitigating copying in diffusion models
Somepalli, G., Singla, V., Goldblum, M., Geiping, J., and Goldstein, T. Understanding and mitigating copying in diffusion models. In Advances in Neural Information Processing Systems (NeurIPS), 2023
2023
-
[33]
Denoising diffusion implicit models
Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In International Conference on Learning Representations (ICLR), 2021 a
2021
-
[34]
P., Kumar, A., Ermon, S., and Poole, B
Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR), 2021 b
2021
-
[35]
and Tran, T
Thanh-Tung, H. and Tran, T. Catastrophic forgetting and mode collapse in GANs . In International Joint Conference on Neural Networks (IJCNN), 2020
2020
-
[36]
Mitigating Diffusion Model Hallucinations with Dynamic Guidance
Triaridis, K., Graikos, A., Chatziagapi, A., Chrysos, G. G., and Samaras, D. Mitigating diffusion model hallucinations with dynamic guidance. arXiv 2510.05356, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[37]
Generative models of visually grounded imagination
Vedantam, R., Fischer, I., Huang, J., and Murphy, K. Generative models of visually grounded imagination. In International Conference on Learning Representations (ICLR), 2018
2018
-
[38]
Theoretical insights for diffusion guidance: A case study for Gaussian mixture models
Wu, Y., Chen, M., Li, Z., Wang, M., and Wei, Y. Theoretical insights for diffusion guidance: A case study for Gaussian mixture models. In International Conference on Machine Learning (ICML), 2024
2024
-
[39]
Xu, Y., Deng, M., Cheng, X., Tian, Y., Liu, Z., and Jaakkola, T. S. Restart sampling for improving generative processes. In Advances in Neural Information Processing Systems (NeurIPS), 2023
2023
-
[40]
On the convergence and mode collapse of GAN
Zhang, Z., Li, M., and Yu, J. On the convergence and mode collapse of GAN . SIGGRAPH Asia Technical Briefs, 2018
2018
-
[41]
Generalization of diffusion models arises with a balanced representation space
Zhang, Z., Li, X., Li, X., Shi, L., Wu, M., Tao, M., and Qu, Q. Generalization of diffusion models arises with a balanced representation space. In International Conference on Learning Representations (ICLR), 2026
2026
-
[42]
Bias and generalization in deep generative models: An empirical study
Zhao, S., Ren, H., Yuan, A., Song, J., Goodman, N., and Ermon, S. Bias and generalization in deep generative models: An empirical study. In Advances in Neural Information Processing Systems (NeurIPS), 2018
2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.