Tweedie's Formulae and Diffusion Generative Models Beyond Gaussian
Pith reviewed 2026-05-20 03:15 UTC · model grok-4.3
The pith
Tweedie's formula extends to geometric Brownian motion, squared Bessel, and Cox-Ingersoll-Ross processes, enabling denoising score matching for non-Gaussian diffusion models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We extend Tweedie's formula to the geometric Brownian motion, squared Bessel, and Cox-Ingersoll-Ross processes. The resulting identities express the score function of the perturbed data in terms of the conditional expectation of the clean data under the respective process law, thereby supplying explicit denoising score-matching losses that can be minimized to learn the reverse diffusion.
What carries the argument
Extended Tweedie's formulae for GBM, BESQ and CIR that give the score as the gradient of the log-transition density expressed via conditional expectations under each process.
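For orientation, the classical Gaussian case that these formulae generalize can be checked numerically. The sketch below is not taken from the paper. It assumes a scalar Gaussian prior X_0 ~ N(m, s2) and additive Gaussian noise of variance sigma2, a setting where both the marginal score and the posterior mean are known in closed form. All function and variable names are illustrative.

```python
import numpy as np

def gaussian_marginal_score(x, m, s2, sigma2):
    # X_t = X_0 + noise is N(m, s2 + sigma2), so the score is linear in x.
    return -(x - m) / (s2 + sigma2)

def tweedie_posterior_mean(x, score, sigma2):
    # Classical Tweedie: E[X_0 | X_t = x] = x + sigma2 * d/dx log p(x).
    return x + sigma2 * score

def conjugate_posterior_mean(x, m, s2, sigma2):
    # Textbook Gaussian-Gaussian posterior mean, used as an independent check.
    return (s2 * x + sigma2 * m) / (s2 + sigma2)

# Illustrative parameter values (assumptions, not from the paper).
x = np.linspace(-3.0, 3.0, 7)
m, s2, sigma2 = 0.5, 2.0, 0.3

via_tweedie = tweedie_posterior_mean(
    x, gaussian_marginal_score(x, m, s2, sigma2), sigma2
)
via_bayes = conjugate_posterior_mean(x, m, s2, sigma2)
print(np.max(np.abs(via_tweedie - via_bayes)))
```

The two routes should agree up to floating-point error. The extended formulae play the same role when the noise is multiplicative or state-dependent rather than additive.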
If this is right
- GBM-based diffusion models become trainable for image generation via the corresponding score-matching loss.
- CIR-based models can be trained for financial time-series generation.
- BESQ processes admit empirical Bayes estimation through the derived formula.
- Diffusion models with state-dependent diffusion coefficients become practical alternatives to Gaussian ones.
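To make the GBM forward perturbation concrete, here is a minimal sketch that samples X_t given X_0 from the standard exact solution of dX_t = mu X_t dt + sigma X_t dW_t. The parameter values and the helper name `gbm_perturb` are assumptions for illustration; the paper's own noise schedule and parameterization may differ.

```python
import numpy as np

def gbm_perturb(x0, t, mu, sigma, rng):
    """Sample X_t | X_0 = x0 using X_t = X_0 * exp((mu - sigma^2/2) t + sigma W_t)."""
    x0 = np.asarray(x0, dtype=float)
    w_t = np.sqrt(t) * rng.standard_normal(x0.shape)  # W_t ~ N(0, t)
    return x0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * w_t)

rng = np.random.default_rng(0)
x0 = np.full(100_000, 2.0)          # hypothetical positive clean data
mu, sigma, t = 0.1, 0.5, 1.0        # illustrative values only

xt = gbm_perturb(x0, t, mu, sigma, rng)
log_ratio = np.log(xt / x0)

# Multiplicative noise keeps every sample strictly positive.
print(xt.min() > 0.0)
# In log space the perturbation is Gaussian with these moments.
print(log_ratio.mean(), (mu - 0.5 * sigma**2) * t)
print(log_ratio.std(), sigma * np.sqrt(t))
```

Because the noise scales with the state, the absolute perturbation is larger for larger X_0, which is what "state-dependent diffusion coefficient" means here.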
Where Pith is reading between the lines
- Models built on these processes may automatically respect positivity constraints common in prices or intensities without post-processing.
- The same derivation route could be applied to other diffusions whose transition densities or conditional expectations are known in closed form.
- Empirical comparisons on data with strong mean-reversion or multiplicative noise would test whether the non-Gaussian choice improves sample quality over standard Gaussian diffusion.
Load-bearing premise
The derived formulae for GBM, BESQ and CIR produce denoising score-matching objectives that can be successfully optimized and yield useful generative performance.
What would settle it
Training a GBM- or CIR-based diffusion model on a known target distribution using the derived score-matching objective and finding that the generated samples systematically fail to match the target statistics would falsify the claim that the extension supplies workable objectives.
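One way such a check could be set up is sketched below: a two-sample Kolmogorov-Smirnov statistic between generated and target samples, written in plain NumPy. The function name, the lognormal stand-in target, and the sample sizes are assumptions for illustration; the paper does not prescribe this particular test.

```python
import numpy as np

def ks_statistic(generated, target):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    generated = np.sort(np.asarray(generated, dtype=float))
    target = np.sort(np.asarray(target, dtype=float))
    grid = np.concatenate([generated, target])
    cdf_gen = np.searchsorted(generated, grid, side="right") / generated.size
    cdf_tgt = np.searchsorted(target, grid, side="right") / target.size
    return float(np.max(np.abs(cdf_gen - cdf_tgt)))

rng = np.random.default_rng(0)
target = rng.lognormal(mean=0.0, sigma=0.5, size=5_000)

# Stand-ins for model output: one faithful sampler, one systematically biased.
faithful = rng.lognormal(mean=0.0, sigma=0.5, size=5_000)
biased = rng.lognormal(mean=0.5, sigma=0.5, size=5_000)

ks_faithful = ks_statistic(faithful, target)
ks_biased = ks_statistic(biased, target)
print(ks_faithful, ks_biased)
```

A persistently large statistic across seeds and training runs, rather than a single bad run, would be the kind of systematic mismatch the paragraph above describes.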
Original abstract
Diffusion models have achieved remarkable success in generating samples from unknown data distributions. Most popular stochastic differential equation-based diffusion models perturb the target distribution by adding Gaussian noise, transforming it into a simple prior, and then use denoising score matching, a consequence of Tweedie's formula, to learn the score function and generate clean samples from noise. However, non-Gaussian diffusion models with state-dependent diffusion coefficient have been largely underexplored, as have the corresponding Tweedie's formulae. In this work, we extend Tweedie's formula to important non-Gaussian processes, including geometric Brownian motion (GBM), squared Bessel (BESQ) processes, and Cox-Ingersoll-Ross (CIR) processes, thereby yielding the corresponding denoising score-matching objectives. We then apply the derived formulae to image and financial time series generation using GBM- and CIR-based diffusion models, and to empirical Bayes estimation under the BESQ setting. The reported experimental results demonstrate the potential of non-Gaussian models.
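The link between denoising score matching and the score function can be illustrated in the Gaussian baseline. The sketch below assumes scalar Gaussian data and a linear score model fitted by least squares; it is a toy illustration with made-up parameters, not the paper's training setup. The regression target is the score of the Gaussian transition density, and the fitted line should approximate the true marginal score.

```python
import numpy as np

rng = np.random.default_rng(0)
m, s2, sigma2 = 1.0, 1.5, 0.5   # illustrative prior mean/variance and noise variance
n = 200_000

x0 = m + np.sqrt(s2) * rng.standard_normal(n)        # clean samples
xt = x0 + np.sqrt(sigma2) * rng.standard_normal(n)   # Gaussian-perturbed samples

# Denoising score matching target: d/dx_t log p(x_t | x_0) for Gaussian noise.
target = -(xt - x0) / sigma2

# Linear score model s(x) = a * x + b, fitted by ordinary least squares.
design = np.column_stack([xt, np.ones(n)])
(a_hat, b_hat), *_ = np.linalg.lstsq(design, target, rcond=None)

# True marginal score of N(m, s2 + sigma2) is -(x - m) / (s2 + sigma2).
a_true = -1.0 / (s2 + sigma2)
b_true = m / (s2 + sigma2)
print(a_hat, a_true)
print(b_hat, b_true)
```

The non-Gaussian objectives described in the abstract replace the Gaussian transition density with that of GBM, BESQ, or CIR, while the regression structure stays the same in spirit.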
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends Tweedie's formula to non-Gaussian diffusion processes including geometric Brownian motion (GBM), squared Bessel (BESQ) processes, and Cox-Ingersoll-Ross (CIR) processes. These extensions produce corresponding denoising score-matching objectives. The authors apply the resulting objectives to train GBM- and CIR-based diffusion models for image generation and financial time-series generation, and to empirical Bayes estimation in the BESQ setting. Experimental results are reported to illustrate the potential of such non-Gaussian models.
Significance. If the derivations hold, the work is significant because it supplies explicit Tweedie-type identities and score-matching losses for processes whose diffusion coefficients depend on state, which are natural in finance and other domains. The manuscript provides closed-form expressions that generalize the Gaussian case and directly yield trainable objectives, together with reproducible experiments on both images and time series. This combination of derivation and application strengthens the case for exploring non-Gaussian diffusions.
major comments (2)
- §3.2, Eq. (12) (GBM Tweedie identity): the derivation does not explicitly verify that the identity reduces to the classical Gaussian Tweedie formula when the volatility parameter is taken to zero with the drift held fixed; without this limit check, the generalization to state-dependent diffusion remains unconfirmed.
- §4.1, infinitesimal-generator step for CIR: the boundary behavior of the CIR process at zero is not addressed when relating the conditional expectation to the score term; this is load-bearing because the generator contains a state-dependent term that vanishes at the boundary.
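As background to the first major comment, one standard route connects GBM to the Gaussian case: the logarithm of a GBM is a Brownian motion with drift, so the classical Tweedie formula applies in log space. The derivation below is a sketch under the usual GBM parameterization with drift \mu and volatility \sigma; q and p denote the marginal densities of Y_t = \log X_t and X_t respectively. It is not claimed to be the form of the paper's Eq. (12).

```latex
% GBM: dX_t = \mu X_t\,dt + \sigma X_t\,dW_t, with Y_t = \log X_t
\begin{align*}
Y_t &= Y_0 + \nu t + \sigma W_t, \qquad \nu = \mu - \tfrac{1}{2}\sigma^2, \\
\mathbb{E}[\,Y_0 \mid Y_t = y\,] &= y - \nu t + \sigma^2 t\,\partial_y \log q(t,y), \\
\partial_y \log q(t,y) &= 1 + x\,\partial_x \log p(t,x)\big|_{x = e^y}
  \qquad \text{since } q(t,y) = p(t,e^y)\,e^y, \\
\mathbb{E}[\,\log X_0 \mid X_t = x\,] &= \log x - \nu t
  + \sigma^2 t\,\bigl(1 + x\,\partial_x \log p(t,x)\bigr).
\end{align*}
```

The second line is the classical Gaussian formula with noise variance \sigma^2 t, so any GBM identity can be compared against it after the change of variables.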
minor comments (2)
- Notation for the score function is introduced inconsistently between the GBM and BESQ sections; a single definition table would improve readability.
- Figure 3 caption does not state the number of independent runs or the error bars shown; this affects interpretation of the reported FID and likelihood values.
Simulated Author's Rebuttal
We thank the referee for the positive summary and recommendation for major revision. We address the two major comments point by point below, agreeing to incorporate clarifications and verifications in the revised manuscript.
read point-by-point responses
- Referee: §3.2, Eq. (12) (GBM Tweedie identity): the derivation does not explicitly verify reduction to the classical Gaussian Tweedie formula when the volatility parameter is taken to zero while keeping the drift fixed; without this limit check the generalization to state-dependent diffusion remains unconfirmed.
  Authors: We agree that an explicit verification of the limit would strengthen the presentation. In the revised manuscript, we will add a paragraph in Section 3.2 demonstrating that, as the volatility parameter σ approaches 0 with the drift fixed, the GBM Tweedie identity in Eq. (12) reduces to the classical Gaussian Tweedie formula. This limit check confirms the consistency of our generalization.
  Revision: yes
- Referee: §4.1, the infinitesimal-generator step for CIR: the boundary behavior at zero for the CIR process is not addressed when relating the conditional expectation to the score term; this is load-bearing because the generator contains a state-dependent term that vanishes at the boundary.
  Authors: We appreciate this observation on the boundary behavior. Under the Feller condition (2κθ > σ²), the CIR process almost surely never reaches the zero boundary, so the infinitesimal generator can be applied in the interior. We will revise Section 4.1 to state this assumption explicitly and to clarify that the relation between the conditional expectation and the score term holds away from the boundary. A note on the boundary conditions will be added for completeness.
  Revision: yes
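The Feller condition mentioned in the second response can be checked directly, and the CIR forward perturbation can be sampled exactly from its scaled noncentral chi-square transition law. The sketch below assumes the standard parameterization dX_t = κ(θ - X_t) dt + σ sqrt(X_t) dW_t; the helper names and parameter values are illustrative and not taken from the paper.

```python
import numpy as np

def feller_condition_holds(kappa, theta, sigma):
    # Strict form quoted in the rebuttal: 2 * kappa * theta > sigma^2.
    return 2.0 * kappa * theta > sigma**2

def cir_perturb(x0, t, kappa, theta, sigma, rng):
    """Sample X_t | X_0 = x0 exactly: X_t = c * noncentral_chisquare(df, nonc)."""
    x0 = np.asarray(x0, dtype=float)
    decay = np.exp(-kappa * t)
    c = sigma**2 * (1.0 - decay) / (4.0 * kappa)   # scale factor
    df = 4.0 * kappa * theta / sigma**2            # degrees of freedom
    nonc = x0 * decay / c                          # noncentrality, depends on x0
    return c * rng.noncentral_chisquare(df, nonc, size=x0.shape)

rng = np.random.default_rng(0)
kappa, theta, sigma, t = 2.0, 0.5, 1.0, 1.0        # illustrative values only
x0 = np.full(200_000, 1.0)

print(feller_condition_holds(kappa, theta, sigma))
xt = cir_perturb(x0, t, kappa, theta, sigma, rng)

# Known conditional mean of CIR: theta + (x0 - theta) * exp(-kappa * t).
expected_mean = theta + (x0[0] - theta) * np.exp(-kappa * t)
print(xt.mean(), expected_mean)
print(xt.min() > 0.0)
```

Exact sampling avoids the negative values that a naive Euler discretization can produce near zero, which is the region the referee's comment is concerned with.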
Circularity Check
Derivations of Tweedie's formulae for GBM/BESQ/CIR are independent mathematical extensions with no reduction to inputs by construction.
full rationale
The paper presents explicit derivations of Tweedie's formulae for the listed non-Gaussian processes by applying the Markov property and known transition densities to the infinitesimal generators of GBM, BESQ, and CIR SDEs. These steps produce denoising score-matching objectives as direct consequences of the conditional expectations, without any fitted parameters being relabeled as predictions or any self-referential definitions. No load-bearing claim reduces to a self-citation chain; the central results stand on the process definitions and standard stochastic calculus identities. Experiments then optimize the resulting objectives on image and time-series data, confirming the derivations are self-contained against external benchmarks rather than tautological.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: $\sigma^2(t,x)\,\nabla \log p(t,x) + 2\sigma(t,x)\,\partial_x \sigma(t,x) = b(t,x) + \lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\,\mathbb{E}[X_{t-\varepsilon} - X_t \mid X_t = x]$ (Prop. 2.3); Tweedie formulae for GBM (3.4), BESQ (3.10), CIR (3.13)
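As a sanity check on the identity quoted from Prop. 2.3, the simplest special case can be worked by hand. The derivation below assumes one dimension, zero drift (b = 0), and unit diffusion coefficient (σ = 1), that is, Brownian motion started from X_0 with an arbitrary initial law. It is a sketch for intuition rather than a reproduction of the paper's proof.

```latex
% Special case b = 0, \sigma = 1: Brownian motion started from X_0 \sim p_0
\begin{align*}
\partial_x \log p(t,x)
  &= \lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\,
     \mathbb{E}[\,X_{t-\varepsilon} - X_t \mid X_t = x\,], \\
\mathbb{E}[\,X_{t-\varepsilon} \mid X_0, X_t\,]
  &= X_t - \frac{\varepsilon}{t}\,(X_t - X_0)
  \qquad \text{(Brownian bridge mean)}, \\
\frac{1}{\varepsilon}\,\mathbb{E}[\,X_{t-\varepsilon} - X_t \mid X_t = x\,]
  &= \frac{\mathbb{E}[\,X_0 \mid X_t = x\,] - x}{t}, \\
\mathbb{E}[\,X_0 \mid X_t = x\,] &= x + t\,\partial_x \log p(t,x).
\end{align*}
```

The last line is the classical Tweedie formula for Gaussian noise of variance t, so the general identity is consistent with the Gaussian case.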
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Paper authors: Wenpin Tang, Nizar Touzi, Zikun Zhang, and Xun Yu Zhou. Affiliations as extracted: Department of Industrial Engineering and Operations Research, Columbia University (wt2319@columbia.edu); Department of Finance and Risk Engineering, New York University (nt2635@nyu.edu); Department of I...