Expressivity of Bi-Lipschitz Normalizing Flows: A Score-Based Diffusion Perspective

Carola-Bibiane Schönlieb; Meira Iske

arXiv: 2605.06172 · v1 · submitted 2026-05-07 · stat.ML · cs.LG · cs.NA · math.NA · math.PR

Pith reviewed 2026-05-08 05:16 UTC · model grok-4.3

classification stat.ML · cs.LG · cs.NA · math.NA · math.PR

keywords normalizing flows · bi-Lipschitz maps · score-based diffusion · probability flow ODE · universal approximation · L1 density · Kullback-Leibler convergence · transport maps

The pith

Bi-Lipschitz normalizing flows achieve universal L1 approximation of probability densities via Gaussian pullbacks from variance-preserving maps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that bi-Lipschitz normalizing flows have strong distributional approximation properties when analyzed through score-based diffusion models. By linking the regularity of the score function to the bi-Lipschitz nature of the transport maps induced by the probability flow ODE, it proves that Gaussian pullbacks of such maps are dense in the L1 norm among all probability densities. For targets that are Gaussian convolutions of compactly supported measures or finite mixtures, this holds with convergence in Kullback-Leibler divergence without needing early stopping. A sympathetic reader would care because this provides a rigorous characterization of what these regularized flows can represent and bridges normalizing flows with diffusion models.
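
To make the term "Gaussian pullback" concrete, here is a minimal one-dimensional sketch. It is not taken from the paper: the map `f` is a hypothetical bi-Lipschitz example chosen only because its derivative is available in closed form, and the grid bounds are arbitrary. The pullback density is obtained from the change-of-variables formula p(x) = φ(f(x)) |f'(x)|, where φ is the standard normal density.

```python
import numpy as np

def f(x):
    # Hypothetical bi-Lipschitz map on R (an assumption for illustration):
    # its slope 1 + 0.5 / cosh(x)^2 always lies in (1, 1.5].
    return x + 0.5 * np.tanh(x)

def f_prime(x):
    return 1.0 + 0.5 / np.cosh(x) ** 2

def gaussian_pullback(x):
    # Change of variables: p(x) = phi(f(x)) * |f'(x)|, phi = standard normal pdf.
    phi = np.exp(-0.5 * f(x) ** 2) / np.sqrt(2.0 * np.pi)
    return phi * np.abs(f_prime(x))

x = np.linspace(-12.0, 12.0, 48001)
dx = x[1] - x[0]
p = gaussian_pullback(x)

mass = float(np.sum(p) * dx)            # Riemann sum of the pullback density
slopes = f_prime(x)
lower, upper = float(slopes.min()), float(slopes.max())
print(f"total mass ~ {mass:.6f}, slope range [{lower:.3f}, {upper:.3f}]")
```

Because the slope of `f` stays bounded away from zero and infinity, the resulting density has full support and finite values everywhere, which is the situation the bi-Lipschitz constraint enforces.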

Core claim

For the probability flow ODE of a variance-preserving diffusion, Lipschitz regularity of the score induces a flow of bi-Lipschitz diffeomorphic transport maps. This ODE bridge allows proving that Gaussian pullbacks induced by bi-Lipschitz variance-preserving transport maps are L1-dense among all probability densities. For Gaussian convolution targets, convergence in Kullback-Leibler divergence is obtained without early stopping.
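
The step from a Lipschitz score to bi-Lipschitz flow maps can be sketched with the standard Grönwall argument. The notation below (noise schedule β, score s_t, Lipschitz constant L(t), flow map Φ_t) is assumed for illustration and may differ from the paper's; the form of the probability flow ODE is the usual one for a variance-preserving diffusion.

```latex
% Assumed notation (not taken from the paper): noise schedule \beta(t) > 0,
% score s_t = \nabla \log p_t, flow map \Phi_t of the probability flow ODE.
\begin{align*}
  \frac{\mathrm{d}}{\mathrm{d}t}\Phi_t(x) &= v_t\bigl(\Phi_t(x)\bigr), \qquad
  v_t(y) = -\tfrac{1}{2}\beta(t)\bigl(y + s_t(y)\bigr), \qquad \Phi_0(x) = x. \\
\intertext{If $s_t$ is $L(t)$-Lipschitz, then $v_t$ is $K(t)$-Lipschitz with
  $K(t) = \tfrac{1}{2}\beta(t)\bigl(1 + L(t)\bigr)$, and Gr\"onwall's inequality
  gives, for all $x, y$,}
  e^{-\int_0^t K(r)\,\mathrm{d}r}\,\lvert x - y\rvert
  \;&\le\; \lvert \Phi_t(x) - \Phi_t(y)\rvert
  \;\le\; e^{\int_0^t K(r)\,\mathrm{d}r}\,\lvert x - y\rvert .
\end{align*}
```

Under these assumptions the bi-Lipschitz constant of Φ_t is finite whenever the integral of β(r)(1 + L(r)) over [0, t] is finite, so the behaviour of L(r) near r = 0 is what matters.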

What carries the argument

The probability flow ODE bridge that links Lipschitz continuity of the score function to the bi-Lipschitz property of the induced transport maps.

If this is right

Bi-Lipschitz normalizing flows can approximate compactly supported densities and finite Gaussian mixtures arbitrarily closely in L1 distance.
Convergence in Kullback-Leibler divergence holds without early stopping when targets are Gaussian convolutions of compactly supported measures.
Deterministic convergence guarantees follow for diffusion-based transport maps under the same score regularity conditions.
Score regularity is verified explicitly for compactly supported densities, their Gaussian convolutions, and finite mixtures.
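
The consequences listed above can be probed numerically in one dimension. The following is a minimal sketch, not the paper's experiment: it assumes a constant noise schedule `BETA`, a hypothetical two-component Gaussian mixture target, and a plain RK4 integrator. It integrates the probability flow ODE of the variance-preserving diffusion, estimates the slope range of the resulting transport map by finite differences, and checks that the map matches the forward marginal (in 1D the flow map satisfies CDF_T(Φ_T(x)) = CDF_0(x)).

```python
import math
import numpy as np

BETA = 1.0                      # constant noise schedule (assumption)
WEIGHTS = np.array([0.5, 0.5])  # hypothetical mixture weights
MEANS = np.array([-2.0, 2.0])   # component means at t = 0
STDS = np.array([0.5, 0.5])     # component standard deviations at t = 0

def marginal_params(t):
    # VP forward marginal of each component: N(a*m, a^2*s^2 + 1 - a^2), a = exp(-BETA*t/2).
    a = math.exp(-0.5 * BETA * t)
    return a * MEANS, a**2 * STDS**2 + 1.0 - a**2

def score(x, t):
    # d/dx log p_t(x) for the Gaussian mixture marginal at time t.
    mu, var = marginal_params(t)
    z = x[:, None] - mu[None, :]
    comp = WEIGHTS * np.exp(-0.5 * z**2 / var) / np.sqrt(2.0 * np.pi * var)
    return np.sum(comp * (-z / var), axis=1) / np.sum(comp, axis=1)

def velocity(x, t):
    # Probability flow ODE of the VP diffusion: dx/dt = -BETA/2 * (x + score).
    return -0.5 * BETA * (x + score(x, t))

def flow_map(x0, T=3.0, steps=600):
    # Classical RK4 integration from t = 0 to t = T.
    x, h = x0.copy(), T / steps
    for k in range(steps):
        t = k * h
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = velocity(x + h * k3, t + h)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

def mixture_cdf(x, t):
    # Closed-form CDF of the mixture marginal at time t.
    mu, var = marginal_params(t)
    z = (x[:, None] - mu[None, :]) / np.sqrt(2.0 * var)
    return np.sum(WEIGHTS * 0.5 * (1.0 + np.vectorize(math.erf)(z)), axis=1)

x0 = np.linspace(-3.5, 3.5, 701)
xT = flow_map(x0)
slopes = np.diff(xT) / np.diff(x0)   # finite-difference estimate of the map's derivative
cdf_gap = float(np.max(np.abs(mixture_cdf(xT, 3.0) - mixture_cdf(x0, 0.0))))
print(f"slope range: [{slopes.min():.2e}, {slopes.max():.2e}], CDF mismatch: {cdf_gap:.2e}")
```

In this toy setting the slopes stay strictly positive but become very small in the low-density gap between the two modes, which illustrates how the bi-Lipschitz constants of the map depend on the target.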

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The result suggests that objectives derived from the probability flow ODE could be used to train bi-Lipschitz flows directly.
Densities with non-Lipschitz scores may require different flow architectures or relaxation of the bi-Lipschitz constraint.
The same ODE linking technique could be applied to other diffusion schedules beyond variance-preserving ones.
The density result may connect to questions in optimal transport about which maps preserve bi-Lipschitz regularity.

Load-bearing premise

The score function remains Lipschitz continuous for the broad classes of target densities considered, including compactly supported measures, their Gaussian convolutions, and finite mixtures.

What would settle it

A concrete density whose score is not Lipschitz continuous at some point in the diffusion, for which no sequence of bi-Lipschitz variance-preserving maps pulls back a Gaussian to approximate it arbitrarily closely in L1.

Figures

Figures reproduced from arXiv: 2605.06172 by Carola-Bibiane Schönlieb, Meira Iske.

**Figure 1.** Example of a compactly supported pdf pH and monotone transport f to the standard Gaussian pZ, satisfying pZ = f#pH. The exact transport is not a global diffeomorphism on R, and neither f nor f⁻¹ is globally Lipschitz. Its discontinuities can therefore only be approximated by bi-Lipschitz maps. A bi-Lipschitz map cannot exactly transform the connected full support of the Gaussian into the disconnected co…

**Figure 2.** Learned transport maps and densities of iResNets.

**Figure 3.** Ground truth target pH and corresponding learned densities pθ from iResNets φθ,L at different Lipschitz constraints L and the SDM (columns) for targets pH ∈ {prings, psquares, pmoons, pconc} (rows).

**Figure 4.** Approximation errors ∥pθ − pH∥L1 (dashed lines) and DKL(pH ∥ pθ) (solid lines) over L compared to the respective constant SDM errors for each target density pH ∈ {prings, psquares, pmoons, pconc}.

**Figure 5.** Lipschitz constants of sθ,t(x) over time on different y-scales (panel title: Score Lipschitz Constants; axes: t, L(t); curves: psquares, prings, pmoons, pconc).

Approximation errors per target and Lipschitz constraint (table fused into the caption in the source):

Target    Metric  L = 0.1  L = 0.25  L = 0.5  L = 0.75  L = 0.95  SDM
prings    L1      0.881    0.151     0.051    0.053     0.044     0.126
prings    DKL     0.570    0.054     0.007    0.008     0.004     0.014
psquares  L1      0.595    0.354     0.229    0.214     0.172     0.159
psquares  DKL     0.403    0.196     0.111    0.098     0.072     0.075
pmoons    L1      0.376    0.126     0.053    0.084     0.145     0.122
pmoons    DKL     0.155    0.017     0.003    0.005     0.013     0.012
pconc     L1      0.722    0.140     0.069    0.047     0.044…

read the original abstract

Many normalizing flow architectures impose regularity constraints, yet their distributional approximation properties are not fully characterized. We study the expressivity of bi-Lipschitz normalizing flows through the lens of score-based diffusion models. For the probability flow ODE of a variance-preserving diffusion, Lipschitz regularity of the score induces a flow of bi-Lipschitz diffeomorphic transport maps. This ODE bridge allows us to analyze the distributional approximation power of bi-Lipschitz normalizing flows and, conversely, derive deterministic convergence guarantees for diffusion-based transport. Our key idea is to use the probability flow ODE to link regularity of the score to regularity of the induced transport maps. We verify score regularity for broad target densities, including compactly supported densities, Gaussian convolutions of compactly supported measures and finite Gaussian mixtures. We obtain a universal distributional approximation result: Gaussian pullbacks induced by bi-Lipschitz variance-preserving transport maps are $L^1$-dense among all probability densities. For Gaussian convolution targets, we further obtain convergence in Kullback-Leibler divergence without early stopping.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper links bi-Lipschitz normalizing flows to score-based diffusion via the probability flow ODE to get L1-density of Gaussian pullbacks and KL convergence without early stopping.

The paper links bi-Lipschitz normalizing flows to score-based diffusion by showing that Lipschitz scores produce bi-Lipschitz transport maps via the probability flow ODE. It then uses this to get an L1-universal approximation theorem for Gaussian pullbacks and KL convergence without early stopping for Gaussian convolution targets. What stands out is the verification of score regularity for compactly supported densities, their convolutions with Gaussians, and finite mixtures. This step grounds the bi-Lipschitz property and lets the approximation results follow from standard ODE theory. The authors also derive deterministic convergence guarantees for the diffusion transport in the other direction. The new contributions are the L1-density result and the KL statement that avoids early stopping. These do not appear in the existing literature on flows or diffusion models, so the ODE bridge is doing real work here. The main soft spot is the reliance on Lipschitz continuity of the score for those density classes. The paper claims to verify it, but the behavior of the Lipschitz constant as noise level goes to zero for non-smooth compact supports could use tighter tracking. If the constant grows too fast the uniform bi-Lipschitz control weakens, though the density argument may still hold with care. The derivations look non-circular and rest on standard existence results. This is for theorists in generative modeling who want precise statements on approximation power and convergence. Readers interested in the math behind flows and diffusion will find the connection useful. It deserves a serious referee because the results are concrete and the approach is honest about its assumptions. I recommend putting it through peer review. The core claims look worth the time of experts who can verify the regularity proofs and bounds.

Referee Report

3 major / 2 minor

Summary. The paper links bi-Lipschitz normalizing flows to score-based diffusion models by showing that Lipschitz regularity of the score for a variance-preserving diffusion induces a flow of bi-Lipschitz diffeomorphic transport maps via the probability flow ODE. This connection yields a universal approximation theorem: Gaussian pullbacks induced by such bi-Lipschitz variance-preserving maps are L¹-dense among all probability densities. For targets that are Gaussian convolutions of compactly supported measures, the same framework gives KL convergence without early stopping. The key technical step is verification that the score remains Lipschitz for compactly supported densities, their Gaussian convolutions, and finite Gaussian mixtures.

Significance. If the regularity verifications and the ODE-induced bi-Lipschitz property hold with explicit constants, the work supplies a rigorous expressivity characterization for a practically relevant subclass of normalizing flows, together with deterministic convergence guarantees for diffusion-based transport. The L¹-density result and the no-early-stopping KL claim are non-trivial contributions that could inform architecture design in both the normalizing-flow and the diffusion literature.

major comments (3)

[verification of score regularity (abstract and main technical sections)] The load-bearing step is the claim that the score remains Lipschitz (with a constant independent of or controlled as t→0) for compactly supported target densities. Without an explicit bound on the Lipschitz constant of the score or a demonstration that it does not diverge as the noise level vanishes, the induced transport maps lose the global bi-Lipschitz property required for the L¹-density argument.
[universal approximation result] The universal L¹-density statement for Gaussian pullbacks is asserted after the ODE bridge; however, the manuscript must supply the precise density class on which the pullback is taken and the topology in which density is measured, because the bi-Lipschitz maps are constructed only for the verified score-regular classes.
[KL convergence section] For the KL-convergence claim on Gaussian-convolution targets, the paper states convergence without early stopping, yet no explicit error bound or rate is referenced in the abstract. If the proof relies on the same Lipschitz score assumption, the same potential divergence issue as t→0 must be ruled out explicitly.
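
The concern about the score's Lipschitz constant as the noise level vanishes can be illustrated on a toy case that is not taken from the paper: a symmetric two-point measure at ±A convolved with a Gaussian of standard deviation sigma. Here the score is (-x + A tanh(A x / sigma²)) / sigma², and its Lipschitz constant is max(1/sigma², |A²/sigma⁴ - 1/sigma²|). The value of `A`, the grid, and the function names below are hypothetical choices for this sketch, which compares a finite-difference estimate with the closed form.

```python
import numpy as np

A = 2.0  # half-distance between the two atoms (hypothetical toy value)

def score(x, sigma):
    # Score of 0.5*N(-A, sigma^2) + 0.5*N(A, sigma^2):
    # d/dx log p(x) = (-x + A*tanh(A*x/sigma^2)) / sigma^2.
    return (-x + A * np.tanh(A * x / sigma**2)) / sigma**2

def lipschitz_numeric(sigma, half_width=3.0, n=60001):
    # Largest absolute finite-difference slope of the score on a fine grid.
    x = np.linspace(-half_width, half_width, n)
    return float(np.max(np.abs(np.gradient(score(x, sigma), x))))

def lipschitz_closed_form(sigma):
    # s'(x) ranges between -1/sigma^2 (tails) and A^2/sigma^4 - 1/sigma^2 (at x = 0).
    return max(1.0 / sigma**2, abs(A**2 / sigma**4 - 1.0 / sigma**2))

sigmas = [1.0, 0.5, 0.25]
numeric = [lipschitz_numeric(s) for s in sigmas]
closed = [lipschitz_closed_form(s) for s in sigmas]
for s, n_val, c_val in zip(sigmas, numeric, closed):
    print(f"sigma={s:5.2f}  finite-difference={n_val:10.3f}  closed-form={c_val:10.3f}")
```

In this example the constant grows like A²/sigma⁴ as sigma shrinks, so whether the induced flow maps stay uniformly bi-Lipschitz depends on how that growth is weighted along the ODE. A target that is itself a Gaussian convolution keeps a positive noise floor, which is why that case behaves differently.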

minor comments (2)

Notation for the variance-preserving diffusion and the probability-flow ODE should be introduced with a short self-contained paragraph early in the paper to aid readers unfamiliar with the diffusion literature.
[Abstract] The abstract uses the phrase 'we verify score regularity'; the corresponding theorem or proposition number should be cited in the abstract for immediate traceability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below, along with indications of the revisions we intend to make to address the concerns raised.

Referee: [verification of score regularity (abstract and main technical sections)] The load-bearing step is the claim that the score remains Lipschitz (with a constant independent of or controlled as t→0) for compactly supported target densities. Without an explicit bound on the Lipschitz constant of the score or a demonstration that it does not diverge as the noise level vanishes, the induced transport maps lose the global bi-Lipschitz property required for the L¹-density argument.

Authors: We appreciate the referee's identification of this critical technical point. In Proposition 3.3 and the surrounding discussion in Section 3, we verify that the score is Lipschitz for compactly supported densities under the variance-preserving diffusion, deriving the score explicitly via the convolution with the Gaussian kernel. The resulting bound depends on the radius of the support and the diffusion schedule but remains finite for each fixed t > 0. To ensure the global bi-Lipschitz property of the induced transport maps, we integrate the time-dependent Lipschitz constant along the probability flow ODE; the integrated quantity stays controlled because the vector field remains integrable. Nevertheless, we acknowledge that an explicit, t-independent statement of the bound is not highlighted in the main text. In the revision we will add a dedicated lemma with the explicit constant and a remark confirming that the bi-Lipschitz constants of the flow maps remain uniformly bounded as t → 0. revision: yes
Referee: [universal approximation result] The universal L¹-density statement for Gaussian pullbacks is asserted after the ODE bridge; however, the manuscript must supply the precise density class on which the pullback is taken and the topology in which density is measured, because the bi-Lipschitz maps are constructed only for the verified score-regular classes.

Authors: We agree that greater precision is required. The L¹-density claim is made with respect to the space of all probability densities on R^d that are absolutely continuous with respect to Lebesgue measure and possess finite first moments; the topology is the L¹ norm. The bi-Lipschitz maps are indeed constructed only for the three verified classes (compactly supported densities, their Gaussian convolutions, and finite Gaussian mixtures). Because these classes are dense in the L¹ topology among all integrable densities, the corresponding Gaussian pullbacks remain dense. In the revision we will insert a clarifying paragraph immediately after the statement of the universal approximation theorem that explicitly identifies the ambient function space, the L¹ topology, and the density argument that extends the result from the verified classes to all densities. revision: yes
Referee: [KL convergence section] For the KL-convergence claim on Gaussian-convolution targets, the paper states convergence without early stopping, yet no explicit error bound or rate is referenced in the abstract. If the proof relies on the same Lipschitz score assumption, the same potential divergence issue as t→0 must be ruled out explicitly.

Authors: The abstract asserts qualitative convergence in KL divergence without early stopping; no quantitative rate is claimed or derived. In Section 4 the proof proceeds by showing that the KL divergence is monotonically decreasing along the probability-flow ODE and tends to zero as t → 0, using the Lipschitz regularity of the score to guarantee global existence of the ODE. For Gaussian-convolution targets the score is in fact C^∞ and its Lipschitz constant remains bounded as t → 0 because the Gaussian kernel smooths the compactly supported measure uniformly. We will add an explicit paragraph in the revision that records this boundedness for the Gaussian-convolution case and thereby rules out divergence of the Lipschitz constant. Because no rate is obtained in the current analysis, we do not plan to insert a rate statement into the abstract, but we will ensure the convergence claim is accompanied by the required regularity justification. revision: partial

Circularity Check

0 steps flagged

No circularity; derivations use standard ODE theory and independently verified regularity assumptions

The paper connects score Lipschitz continuity to bi-Lipschitz transport maps through the probability flow ODE, invoking standard existence/uniqueness results for ODEs under Lipschitz conditions. It separately verifies the score regularity assumption for the target classes (compactly supported densities, their Gaussian convolutions, and finite mixtures) and then derives the L1-density of Gaussian pullbacks and KL convergence without early stopping. No step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the approximation results follow directly from the verified assumptions and ODE bridge without circular dependence on the target claims.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard mathematical assumptions from ODE theory and regularity conditions in diffusion models. No free parameters are fitted to data, and no new entities are postulated.

axioms (2)

domain assumption: Lipschitz continuity of the score function induces bi-Lipschitz regularity on the probability flow ODE transport maps.
This is the key linking assumption invoked to transfer regularity from the score to the flow maps.
standard math: Existence and uniqueness of solutions to the probability flow ODE under Lipschitz conditions.
Standard Picard-Lindelöf type result from differential equations used to guarantee the transport maps are well-defined diffeomorphisms.

[1] [1]

Glow: Generative flow with invertible 1×1 convolutions

D. P. Kingma and P. Dhariwal. “Glow: Generative flow with invertible 1×1 convolutions”. In:Advances in Neural Information Processing Systems (NeurIPS). Vol. 31. 2018

2018

[2] [2]

A hierarchical latent vector model for learning long-term structure in music

A. Roberts et al. “A hierarchical latent vector model for learning long-term structure in music”. In: International Conference of Machine Learning (ICML). 2018

2018

[3] [3]

VideoFlow: A flow-based generative model for video

M. Kumar et al. “VideoFlow: A flow-based generative model for video”. In:Workshop on Invertible Neural Networks and Normalizing Flows. ICML Workshop. 2019

2019

[4] [4]

Hyungjin Chung, Jeongsol Kim, and Jong Chul Ye

H. Chung, J. Kim, and J. C. Ye. “Diffusion models for inverse problems”. In:arXiv preprint arXiv:2508.01975 (2025)

work page arXiv 2025

[5] [5]

Noise flow: Noise modeling with conditional normalizing flows

A. Abdelhamed, M. A. Brubaker, and M. S. Brown. “Noise flow: Noise modeling with conditional normalizing flows”. In:Proceedings of the IEEE International Conference on Computer Vision. 2019, pp. 3165–3173

2019

[6] [6]

Learned discrepancy reconstruction and benchmark dataset for magnetic particle imag- ing

M. Iske et al. “Learned discrepancy reconstruction and benchmark dataset for magnetic particle imag- ing”. In:IEEE Transactions on Computational Imaging11 (2025), pp. 1059–1073

2025

[7] [7]

Auto-encoding variational Bayes

D. P. Kingma and M. Welling. “Auto-encoding variational Bayes”. In:International Conference on Learning Representations (ICLR). 2014. 32

2014

[8] I. Goodfellow et al. “Generative adversarial nets”. In: Advances in Neural Information Processing Systems (NeurIPS). 2014

[9] D. J. Rezende and S. Mohamed. “Variational inference with normalizing flows”. In: International Conference on Machine Learning (ICML). Vol. 37. 2015

[10] Y. Song et al. “Score-based generative modeling through stochastic differential equations”. In: International Conference on Learning Representations (ICLR). 2021

[11] J. Sohl-Dickstein et al. “Deep unsupervised learning using nonequilibrium thermodynamics”. In: International Conference on Machine Learning (ICML). 2015

[12] G. Papamakarios et al. “Normalizing flows for probabilistic modeling and inference”. In: Journal of Machine Learning Research 22 (2021), pp. 1–64

[13] L. Dinh, D. Krueger, and Y. Bengio. “NICE: Non-linear independent components estimation”. In: International Conference on Learning Representations (ICLR) Workshop. 2015

[14] J. Behrmann et al. “Invertible residual networks”. In: International Conference on Machine Learning (ICML). 2019

[15] C. Arndt et al. “Invertible residual networks in the context of regularization theory for linear inverse problems”. In: Inverse Problems 39.12 (2023)

[16] M. Rosenblatt. “Remarks on a multivariate transformation”. In: The Annals of Mathematical Statistics 23.3 (1952), pp. 470–472

[17] H. Knothe. “Contributions to the theory of convex bodies”. In: Michigan Mathematical Journal 4 (1957), pp. 39–52

[18] Y. Brenier. “Polar factorization and monotone rearrangement of vector-valued functions”. In: Communications on Pure and Applied Mathematics 44.4 (1991), pp. 375–417

[19] L. A. Caffarelli. “Monotonicity properties of optimal transportation and the FKG and related inequalities”. In: Communications in Mathematical Physics 214 (2000), pp. 547–563

[20] B. Dacorogna and J. Moser. “On a partial differential equation involving the Jacobian determinant”. In: Annales de l’I. H. P. 1.7 (1990), pp. 1–26

[21] R. Cornish et al. “Relaxing bijectivity constraints with continuously indexed normalising flows”. In: International Conference on Machine Learning (ICML). Vol. 119. 2020, pp. 2133–2143

[22] Z. Kong and K. Chaudhuri. “The expressive power of a class of normalizing flow models”. In: International Conference on Artificial Intelligence and Statistics. Vol. 108. 2020, pp. 3599–3609

[23] A. Verine et al. “On the expressivity of bi-Lipschitz normalizing flows”. In: Proceedings of Machine Learning Research. Vol. 189. 2022

[24] J. Ho, A. Jain, and P. Abbeel. “Denoising diffusion probabilistic models”. In: Advances in Neural Information Processing Systems (NeurIPS). 2020

[25] R. T. Q. Chen et al. “Neural ordinary differential equations”. In: Advances in Neural Information Processing Systems (NeurIPS). Vol. 31. 2018

[26] H. Chen, H. Lee, and J. Lu. “Improved analysis of score-based generative modeling: User-friendly bounds under minimal smoothness assumptions”. In: International Conference on Machine Learning (ICML). Vol. 202. Proceedings of Machine Learning Research. PMLR, 2023, pp. 4735–4763

[27] S. Chen et al. “Sampling is as easy as learning the score: Theory for diffusion models with minimal data assumptions”. In: International Conference on Learning Representations (ICLR). 2023

[28] H. Lee, J. Lu, and Y. Tan. “Convergence of score-based generative modeling for general data distributions”. In: International Conference on Algorithmic Learning Theory. PMLR, 2023, pp. 946–985

[29] G. Conforti, A. Durmus, and M. Gentiloni Silveri. “KL convergence guarantees for score diffusion models under minimal data assumptions”. In: SIAM Journal on Mathematics of Data Science 7.1 (2025), pp. 86–109

[30] D. Bakry, I. Gentil, and M. Ledoux. Analysis and Geometry of Markov Diffusion Operators. Vol. 348. Grundlehren der mathematischen Wissenschaften. Cham: Springer, 2014

[31] T. Teshima et al. “Coupling-based invertible neural networks are universal diffeomorphism approximators”. In: Advances in Neural Information Processing Systems (NeurIPS). Vol. 33. 2020

[32] I. Ishikawa et al. “Universal approximation property of invertible neural networks”. In: Journal of Machine Learning Research 24 (2023), pp. 1–68

[33] H. Zhang et al. “Approximation capabilities of neural ODEs and invertible residual networks”. In: International Conference on Machine Learning (ICML). Vol. 119. 2020, pp. 11086–11095

[34] B. Jin, Z. Zhou, and J. Zou. “On the approximation of bi-Lipschitz maps by invertible neural networks”. In: Neural Networks 174 (2024)

[35] Z. Kong and K. Chaudhuri. “Universal approximation of residual flows in maximum mean discrepancy”. In: ICML 2021 Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models. 2021

[36] F. Draxler et al. “On the universality of coupling-based normalizing flows”. In: arXiv preprint arXiv:2402.06578 (2024)

[37] J. Benton et al. “Nearly d-linear convergence bounds for diffusion models via stochastic localization”. In: International Conference on Learning Representations (ICLR). 2024

[38] Y. Yu and L. Yu. “Advancing Wasserstein convergence analysis of score-based models: Insights from discretization and second-order acceleration”. In: arXiv preprint arXiv:2502.04849 (2025)

[39] K. Y. Yang and A. Wibisono. “Convergence in KL divergence of the inexact Langevin algorithm with application to score-based generative models”. In: arXiv preprint arXiv:2211.01512 (2022)

[40] X. Gao and L. Zhu. “Convergence analysis for general probability flow ODEs of diffusion models in Wasserstein distances”. In: International Conference on Artificial Intelligence and Statistics. PMLR, 2025, pp. 1009–1017

[41] S. Chen et al. “The probability flow ODE is provably fast”. In: Advances in Neural Information Processing Systems (NeurIPS). 2023

[42] F. Pedrotti, J. Maas, and M. Mondelli. “Improved convergence of score-based diffusion models via prediction-correction”. In: Transactions on Machine Learning Research (2024)

[43] G. Brigati and F. Pedrotti. “Heat flow, log-concavity, and Lipschitz transport maps”. In: Electronic Communications in Probability 30 (2025)

[44] C. Mooney et al. “Global well-posedness and convergence analysis of score-based generative models via sharp Lipschitz estimates”. In: arXiv preprint arXiv:2405.16104 (2024)

[45] A. Stéphanovitch. “Regularity of the score function in generative models”. In: arXiv preprint arXiv:2506.19559 (2025)

[46] T. Miyato et al. “Spectral normalization for generative adversarial networks”. In: International Conference on Learning Representations (ICLR). 2018

[47] M. Tancik et al. “Fourier features let networks learn high frequency functions in low dimensional domains”. In: Advances in Neural Information Processing Systems (NeurIPS). 2020

[48] H. Amann. Ordinary Differential Equations: An Introduction to Nonlinear Analysis. Berlin, New York: Walter de Gruyter, 1990

[49] B. Øksendal. Stochastic Differential Equations. 6th ed. Universitext. Berlin: Springer, 2003