pith. machine review for the scientific record.

arxiv: 2605.00941 · v2 · submitted 2026-05-01 · 💻 cs.LG · cs.CV

Recognition: unknown

Divergence is Uncertainty: A Closed-Form Posterior Covariance for Flow Matching

Jian Wang, Jiarui Xing, Song Wang


Pith reviewed 2026-05-09 20:30 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords flow matching · generative modeling · uncertainty estimation · posterior covariance · divergence · velocity field · closed-form identity

The pith

For any pre-trained flow matching velocity field, the trace of the posterior covariance equals the divergence of that field, up to a time-dependent factor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that uncertainty quantification in flow matching requires no extra training, ensembles, or multi-step propagation. Instead, the trace of the posterior covariance over clean data at any intermediate state is exactly the divergence of the velocity field, scaled by a known time prefactor and shifted by a constant. The full matrix form depends only on the velocity Jacobian. This identity applies post-hoc to any existing model and yields exact end-to-end uncertainty for one-step generators in a single forward pass.

Core claim

We prove that for any pre-trained flow matching velocity field, the trace of the posterior covariance over the clean data given the current state equals the divergence of the velocity field, up to a known time-dependent prefactor and an additive constant. We call this the divergence-uncertainty identity. Its matrix-level version is likewise closed-form and depends solely on the velocity Jacobian. The identity is exact and can be evaluated on any pre-trained model with no retraining or architectural change.
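In symbols (an editorial restatement reconstructed from the formulas printed in the Figure 1 and Figure 2 captions, with J_{v_θ} the velocity Jacobian and d the data dimension, not a quotation of the paper's equations):

```latex
\mathrm{Cov}(x_1 \mid x_t) = \frac{(1-t)^2}{t}\Big[\, I + (1-t)\, J_{v_\theta}(x_t, t) \,\Big],
\qquad
\mathrm{tr}\,\mathrm{Cov}(x_1 \mid x_t)
  = \frac{(1-t)^3}{t}\,\nabla\!\cdot v_\theta(x_t, t) \;+\; \frac{(1-t)^2}{t}\, d .
```

The second expression is the "known time-dependent prefactor" and "additive constant" of the claim; the constant term matches the div v_θ = 0 baseline shown in Figure 2.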

What carries the argument

The divergence-uncertainty identity, which equates the trace of the posterior covariance to the divergence of the velocity field (with known time-dependent scaling and constant offset).

If this is right

  • Any existing flow matching model can produce per-pixel uncertainty maps by a single divergence computation on its velocity field.
  • One-step generators such as MeanFlow obtain exact generation uncertainty without propagating covariance through multiple integration steps.
  • The resulting uncertainty maps concentrate on regions of high inter-sample variation such as digit boundaries.
  • Scalar uncertainty scores derived from the identity track actual prediction error at far lower cost than ensembles or Monte Carlo dropout.
  • The matrix form supplies full posterior covariance estimates from the velocity Jacobian alone.
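As a concrete sketch of the first bullet: the divergence in the identity can be estimated with a Hutchinson trace estimator over Jacobian-vector products, needing only forward evaluations of the velocity field. The snippet below is illustrative, not the paper's code; `velocity` is a hypothetical stand-in (a linear field chosen so the exact divergence is known), and the scalar score follows the form shown in the Figure 2 caption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained velocity field v_theta(x, t):
# a linear map, so the exact divergence, trace(A), is known and the
# estimator can be checked against it.
d = 16
A = rng.standard_normal((d, d)) / np.sqrt(d)

def velocity(x, t):
    return A @ x  # a real model would be a network forward pass

def hutchinson_divergence(v_fn, x, t, n_probes=2000, eps=1e-4):
    """Estimate div v(x, t) = tr(J_v) with Rademacher probes; the
    Jacobian-vector product uses central finite differences, so only
    forward evaluations of the model are needed."""
    acc = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=x.shape)          # Rademacher probe
        jvp = (v_fn(x + eps * z, t) - v_fn(x - eps * z, t)) / (2 * eps)
        acc += z @ jvp                                      # z^T J_v z
    return acc / n_probes

x, t = rng.standard_normal(d), 0.5
div_est = hutchinson_divergence(velocity, x, t)
div_true = np.trace(A)

# Scalar uncertainty in the form shown in the Figure 2 caption:
# U(x_t, t) = ((1-t)^2 / t) * (d + (1-t) * div v)
U = ((1 - t) ** 2 / t) * (d + (1 - t) * div_est)
```

For a network, the finite-difference JVP would be replaced by autodiff (e.g. a `jvp` call), but the probe structure is the same.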

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The identity may allow divergence-based regularization during training to directly modulate output uncertainty.
  • High-divergence regions could serve as an intrinsic signal for detecting out-of-distribution inputs without auxiliary detectors.
  • The same relation might extend to other continuous-time generative frameworks that admit a velocity-field formulation.
  • Adaptive sampling schedules could use local divergence as a cheap proxy for instantaneous uncertainty to decide step sizes.

Load-bearing premise

The pre-trained velocity field must be exactly the conditional expectation of the clean data given the current state.
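Granting that premise, the identity follows in a few lines. The sketch below is an editorial reconstruction under the linear-path, Gaussian-base assumptions (x_t = (1−t)x_0 + t x_1 with x_0 ~ N(0, I), so x_t | x_1 ~ N(t x_1, (1−t)² I)), not the paper's own proof:

```latex
% Second-order Tweedie for x_t \mid x_1 \sim \mathcal{N}\!\big(t\,x_1,\,(1-t)^2 I\big):
\mathrm{Cov}(x_1 \mid x_t) = \frac{(1-t)^2}{t}\,
  \frac{\partial\,\mathbb{E}[x_1 \mid x_t]}{\partial x_t}.
% At the flow-matching optimum,
% v_\theta(x_t,t) = \mathbb{E}\!\left[\tfrac{x_1 - x_t}{1-t} \,\middle|\, x_t\right], hence
\mathbb{E}[x_1 \mid x_t] = x_t + (1-t)\, v_\theta(x_t, t)
\quad\Longrightarrow\quad
\mathrm{Cov}(x_1 \mid x_t) = \frac{(1-t)^2}{t}\Big[\, I + (1-t)\, J_{v_\theta} \,\Big].
% Taking traces yields the divergence--uncertainty identity:
\mathrm{tr}\,\mathrm{Cov}(x_1 \mid x_t)
  = \frac{(1-t)^3}{t}\,\nabla\!\cdot v_\theta + \frac{(1-t)^2}{t}\, d .
```

Every step after the Tweedie line is exact only when v_θ equals the conditional expectation, which is exactly the load-bearing premise above.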

What would settle it

Generate many independent samples from the model at a fixed time step, compute their empirical covariance trace, and check whether it equals the divergence of the velocity field evaluated at the same states; systematic mismatch falsifies the identity.
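On a toy Gaussian model this falsification test can be run end to end, because the exact posterior is available in closed form. The construction below is ours, not the paper's experiment: x_0 ~ N(0, I), x_1 ~ N(0, σ²I), x_t = (1−t)x_0 + t x_1, for which the optimal velocity field is linear with known divergence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model where the check is exact: x0 ~ N(0, I), x1 ~ N(0, sigma2 I),
# x_t = (1 - t) x0 + t x1.
d, sigma2, t = 8, 4.0, 0.6
Dt = (1 - t) ** 2 + t ** 2 * sigma2              # Var(x_t) per dimension

# Identity side: for this model the optimal velocity field is linear,
# with divergence d * (t*sigma2 - (1 - t)) / Dt.
div_v = d * (t * sigma2 - (1 - t)) / Dt
trace_identity = ((1 - t) ** 2 / t) * (d + (1 - t) * div_v)

# Empirical side: p(x1 | x_t) is Gaussian with per-dimension variance
# sigma2 * (1 - t)^2 / Dt; sample it and compare covariance traces.
post_var = sigma2 * (1 - t) ** 2 / Dt
samples = rng.normal(0.0, np.sqrt(post_var), size=(200_000, d))
trace_empirical = np.trace(np.cov(samples, rowvar=False))

mismatch = abs(trace_empirical - trace_identity)
```

A systematic, sampling-noise-exceeding `mismatch` would falsify the identity; for a trained model the same protocol would draw the samples from the model's posterior instead of the analytic one.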

Figures

Figures reproduced from arXiv: 2605.00941 by Jian Wang, Jiarui Xing, Song Wang.

Figure 1
Figure 1. Our closed-form uncertainty for flow matching. For any pre-trained flow matching model, our formula Cov(x1 | xt) = ((1−t)²/t)[I + (1−t) J_vθ] produces per-pixel uncertainty maps directly from the velocity Jacobian, with no retraining, no ensembling, and no extra forward passes. At small t (near noise) the maps are diffuse; as t grows toward the data, uncertainty progressively concentrates on digit boundaries.
Figure 2
Figure 2. Empirical scalar uncertainty U(xt, t) vs. flow time. Blue: U computed from the trained flow matching model via Eq. (22) (mean ± std over 16 test samples, 50 Hutchinson probes). Red dashed: prior baseline ((1−t)²/t) d corresponding to div vθ = 0. The 1 to 2 orders-of-magnitude gap is the quantitative footprint of the learned flow's contractive (negative-divergence) behaviour.
Figure 4
Figure 4. Euler trajectory (odd rows) and corresponding Tweedie UQ maps (even rows) for four MNIST samples. Uncertainty evolves from diffuse (early t) to boundary-localised (late t), aligning with the model's progressive resolution of digit identity, topology, and stroke boundary.
Figure 5
Figure 5. Total UQ cost (training + inference, log scale) for 16 samples. Tweedie+FM and Tweedie+MF require no retraining and produce uncertainty in a single inference pass; MC Dropout requires retraining a dropout-enabled model plus 50 stochastic passes; deep ensembles require 5 independent training runs. Our method is roughly 10⁴× cheaper end-to-end.
Original abstract

Flow matching has become a leading framework for generative modeling, but quantifying the uncertainty of its samples remains an open problem. Existing approaches retrain the model with auxiliary variance heads, maintain costly ensembles, or propagate approximate covariance through many integration steps, trading off training cost, inference cost, or accuracy. We show that none of these trade-offs is necessary. We prove that, for any pre-trained flow matching velocity field, the trace of the posterior covariance over the clean data given the current state equals, in closed form, the divergence of the velocity field, up to a known time-dependent prefactor and an additive constant. We call this the divergence-uncertainty identity for flow matching. The matrix-level form of the identity is similarly closed-form, depending solely on the velocity Jacobian. Because the identity is exact and post-hoc, it is computable on any pre-trained flow matching model, with no retraining and no architectural modification. For one-step generators such as MeanFlow, the same identity yields the exact end-to-end generation uncertainty in a single forward pass, eliminating the multi-step variance propagation required by all prior methods. Experiments on MNIST confirm that the resulting per-pixel uncertainty maps are semantically meaningful, concentrating on digit boundaries where inter-sample variation is highest, and that the scalar uncertainty score tracks actual prediction error, all at roughly 10,000× less total compute than ensembling or Monte Carlo dropout.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims to derive a closed-form 'divergence-uncertainty identity' for flow matching: for any pre-trained velocity field v_t, the trace of the posterior covariance of the clean data x_1 given the current state x_t equals a known time-dependent prefactor times the divergence of v_t plus an additive constant. A matrix version depends only on the velocity Jacobian. The identity is presented as exact and post-hoc, enabling uncertainty computation without retraining, ensembles, or multi-step propagation; it is also applied to one-step generators like MeanFlow. MNIST experiments are used to show that the resulting per-pixel uncertainty maps are semantically meaningful and that a scalar uncertainty score correlates with prediction error.

Significance. If the identity holds exactly under the stated conditions, the result would be significant for uncertainty quantification in flow-matching generative models, removing the need for auxiliary heads, ensembles, or expensive propagation. The closed-form nature, applicability to any pre-trained model, and single-pass computation for one-step generators are notable strengths. The approach also provides a direct link between the velocity field's divergence and posterior uncertainty, which could be useful for analysis and downstream tasks.

major comments (3)
  1. [Abstract] Abstract: The statement that the identity holds 'for any pre-trained flow matching velocity field' is not supported by the derivation. The equality trace(cov(x_1 | x_t)) = c(t) · div(v) + const requires that v_t(x_t) exactly equals the conditional expectation E[u_t(x_t | x_1) | x_t], which holds only at the population optimum of the flow-matching objective. For any finite-training pre-trained model, optimization error means this assumption is violated and the exact identity no longer holds.
  2. [Abstract] Abstract and §3 (derivation): The manuscript states a proof exists but provides neither the explicit derivation steps nor the precise assumptions on the probability path and conditional velocity. Without these, it is impossible to verify whether the identity is exact or only approximate, and whether it requires the velocity field to be the exact conditional expectation.
  3. [Experiments] Experiments section: The MNIST results show that uncertainty maps concentrate on digit boundaries and that the scalar score tracks prediction error. However, these are only consistency checks; they do not directly compare the divergence-based covariance against ground-truth posterior covariance computed from the true conditional distribution, so they do not test the exact identity.
minor comments (2)
  1. [Abstract] Notation for the time-dependent prefactor c(t) and the additive constant should be defined explicitly in the main text rather than left implicit in the abstract.
  2. The paper should clarify whether the identity extends exactly to the matrix form of the covariance or only to its trace, and state any additional assumptions required for the Jacobian version.
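On the second minor comment, here is what the matrix-level version would look like in practice, as a sketch under the form printed in the Figure 1 caption, Cov(x1 | xt) = ((1−t)²/t)[I + (1−t) J_v]. The velocity field is a hypothetical linear stand-in so the Jacobian is known exactly; note that for a generic, non-optimal model this matrix need not be symmetric, which is one of the assumptions the referee is asking the paper to state.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in velocity field at a fixed time t: a linear map,
# so the exact Jacobian (A) and divergence (trace A) are known.
d, t = 6, 0.7
A = rng.standard_normal((d, d)) / d

def velocity(x):
    return A @ x  # a real model would be a network forward pass

def jacobian(v_fn, x, eps=1e-5):
    """Column-by-column central-difference Jacobian of v_fn at x."""
    J = np.empty((len(x), len(x)))
    for j in range(len(x)):
        e = np.zeros(len(x))
        e[j] = eps
        J[:, j] = (v_fn(x + e) - v_fn(x - e)) / (2 * eps)
    return J

x = rng.standard_normal(d)
J = jacobian(velocity, x)

# Matrix-level identity as printed in the Figure 1 caption:
cov = ((1 - t) ** 2 / t) * (np.eye(d) + (1 - t) * J)

# Consistency: the trace of the matrix form equals the scalar
# (divergence-based) form of the identity.
trace_from_matrix = np.trace(cov)
trace_from_div = ((1 - t) ** 2 / t) * (d + (1 - t) * np.trace(A))
```

For image-sized data the full Jacobian is intractable; the trace form with Hutchinson probes is the practical route, and the matrix form would be restricted to low-dimensional or structured settings.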

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each of the major comments below. We agree with several points and will make revisions to clarify the assumptions and enhance the validation of the identity.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The statement that the identity holds 'for any pre-trained flow matching velocity field' is not supported by the derivation. The equality trace(cov(x_1 | x_t)) = c(t) · div(v) + const requires that v_t(x_t) exactly equals the conditional expectation E[u_t(x_t | x_1) | x_t], which holds only at the population optimum of the flow-matching objective. For any finite-training pre-trained model, optimization error means this assumption is violated and the exact identity no longer holds.

    Authors: We agree that the exact identity holds when the velocity field v_t is precisely the conditional expectation E[u_t | x_t], which is true at the population optimum of the flow-matching loss. For models trained with finite data and optimization, the identity holds only approximately. We will revise the abstract, introduction, and relevant sections to explicitly state this assumption and note that the result is exact for the optimal velocity field and approximate otherwise. This clarification is necessary. revision: yes

  2. Referee: [Abstract] Abstract and §3 (derivation): The manuscript states a proof exists but provides neither the explicit derivation steps nor the precise assumptions on the probability path and conditional velocity. Without these, it is impossible to verify whether the identity is exact or only approximate, and whether it requires the velocity field to be the exact conditional expectation.

    Authors: We will include the full step-by-step derivation in the revised §3, starting from the definition of the probability path (linear interpolation between x_0 and x_1) and the conditional velocity u_t(x_t | x_1) = (x_1 - x_t)/(1-t), showing that the posterior covariance trace equals the divergence term when v_t = E[u_t | x_t]. The assumptions will be stated clearly: the identity is exact under the flow-matching optimality condition. revision: yes

  3. Referee: [Experiments] Experiments section: The MNIST results show that uncertainty maps concentrate on digit boundaries and that the scalar score tracks prediction error. However, these are only consistency checks; they do not directly compare the divergence-based covariance against ground-truth posterior covariance computed from the true conditional distribution, so they do not test the exact identity.

    Authors: The MNIST experiments are intended to illustrate the semantic relevance of the uncertainty estimates in a real-world setting. We acknowledge that they do not constitute a direct test against ground-truth covariance, as computing the exact posterior over clean data given x_t is intractable for high-dimensional data like MNIST. To directly validate the identity, we will add a new experiment on a low-dimensional synthetic dataset (e.g., a mixture of Gaussians) where the true conditional distribution and thus the ground-truth posterior covariance can be computed analytically or via dense sampling. This will provide a numerical verification of the identity. revision: yes
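The low-dimensional validation the authors promise can be carried out exactly in the Gaussian case, where both sides of the identity are available in closed form. The snippet below uses our own parameter choices, not the paper's planned experiment: x0 ~ N(0, I), x1 ~ N(0, σ²I), x_t = (1−t)x0 + t x1, swept over a grid of t.

```python
import numpy as np

# Analytic check on a Gaussian instance of the proposed synthetic
# validation (our parameter choices): x0 ~ N(0, I), x1 ~ N(0, sigma2 I),
# x_t = (1 - t) x0 + t x1. Both sides of the identity are closed-form.
d, sigma2 = 5, 2.5

gaps = []
for t in np.linspace(0.05, 0.95, 19):
    Dt = (1 - t) ** 2 + t ** 2 * sigma2           # Var(x_t) per dimension
    # Left side: exact posterior covariance trace of the conditional Gaussian
    lhs = d * sigma2 * (1 - t) ** 2 / Dt
    # Right side: the optimal velocity field is linear here, with
    # divergence d * (t*sigma2 - (1 - t)) / Dt, plugged into the identity
    div_v = d * (t * sigma2 - (1 - t)) / Dt
    rhs = ((1 - t) ** 2 / t) * (d + (1 - t) * div_v)
    gaps.append(abs(lhs - rhs))

max_gap = max(gaps)
```

The two sides agree identically for every t, which is the behaviour a correct derivation predicts when the velocity field is exactly optimal; a trained network would show a gap proportional to its optimization error.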

Circularity Check

0 steps flagged

No significant circularity; derivation is a direct probabilistic identity under standard flow-matching optimality

full rationale

The claimed divergence-uncertainty identity is derived from the probabilistic definition of the flow-matching velocity field as the conditional expectation of the target velocity under the posterior p(x_1 | x_t). This is the standard population-level optimality condition of the flow-matching objective, not a self-definition, a fitted parameter, or a result smuggled in via self-citation. The trace(cov(x_1 | x_t)) = c(t) * div(v_t) + const relation follows mathematically once that expectation property is substituted into the covariance expression; it is not equivalent to the inputs by construction in any of the enumerated circular patterns. The paper's scope statement ('for any pre-trained') is an overstatement for finite models but does not create circularity in the derivation chain itself. No load-bearing self-citations or ansatzes are invoked for the central result.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the flow-matching probability path being correctly defined and the velocity field being the true conditional expectation. No free parameters are introduced in the identity itself.

axioms (2)
  • domain assumption The velocity field is the exact conditional expectation of the target velocity under the posterior at each time.
    Invoked when equating the posterior covariance to the divergence; any optimization error would violate this.
  • standard math The probability path is a valid flow-matching path with known marginals.
    Standard assumption in flow matching; required for the posterior to be well-defined.

pith-pipeline@v0.9.0 · 5561 in / 1419 out tokens · 18576 ms · 2026-05-09T20:30:57.124801+00:00 · methodology


Reference graph

Works this paper leans on

28 extracted references · 14 canonical work pages · 7 internal anchors

  1. [1]

    Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

    Michael S Albergo and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. arXiv preprint arXiv:2303.08797, 2023

  2. [2]

Tweedie Moment Projected Diffusions for Inverse Problems

    Benjamin Boys, Mark Girolami, Chris Sherlock, et al. Tweedie moment projected diffusions for inverse problems. arXiv preprint arXiv:2310.06721, 2024

  3. [3]

    Uncertainty Quantification for Distribution-to-Distribution Flow Matching in Scientific Imaging

BSFM Authors. Uncertainty quantification for distribution-to-distribution flow matching in scientific imaging. arXiv preprint arXiv:2603.21717, 2026

  4. [4]

∆-LFM: Patient-Specific Latent Flow Matching for Interpretable Disease Modeling

Anonymous Chen et al. ∆-LFM: Patient-specific latent flow matching for interpretable disease modeling. arXiv preprint, 2025

  5. [5]

Tweedie's Formula and Selection Bias

Bradley Efron. Tweedie's formula and selection bias. Journal of the American Statistical Association, 106(496):1602–1614, 2011

  6. [6]

EigenScore: OOD Detection Using Posterior Covariance in Diffusion Models

EigenScore Authors. EigenScore: OOD detection using posterior covariance in diffusion models. arXiv preprint arXiv:2510.07206, 2025

  7. [7]

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorber, Dustin Podell, Robin Rombach, et al. Scaling rectified flow transformers for high-resolution image synthesis. International Conference on Machine Learning, 2024

  8. [8]

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059, 2016

  9. [9]

    Mean Flows for One-step Generative Modeling

Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling. arXiv preprint arXiv:2505.13447, 2025

  10. [10]

Flow Matching with Uncertainty Quantification and Guidance

Juyeop Han, Lukas Lao Beyer, and Sertac Karaman. Flow matching with uncertainty quantification and guidance. arXiv preprint arXiv:2602.10326, 2026

  11. [11]

Denoising Diffusion Probabilistic Models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  12. [12]

A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines

Michael F Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics—Simulation and Computation, 18(3):1059–1076, 1989

  13. [13]

Quantifying Epistemic Uncertainty in Diffusion Models

    Metod Jazbec et al. Epistemic uncertainty in diffusion models. arXiv preprint arXiv:2602.09170, 2025

  14. [14]

FlowDPS: Flow-Driven Posterior Sampling for Inverse Problems

Jeongsol Kim, Bryan Sangwoo Park, Hyungjin Chung, and Jong Chul Ye. FlowDPS: Flow-driven posterior sampling for inverse problems. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025. arXiv:2503.08136

  15. [15]

BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference

Siqi Kou, Lei Gan, and Dequan Wang. BayesDiff: Estimating pixel-wise uncertainty in diffusion via Bayesian inference. arXiv preprint arXiv:2310.11142, 2024

  16. [16]

Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles

Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems, 30, 2017

  17. [17]

Gradient-Based Learning Applied to Document Recognition

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

  18. [18]

    Flow Matching for Generative Modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2023

  19. [19]

ImageFlowNet: Forecasting Multiscale Image-Level Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images

Chen Liu, Ke Xu, Liangbo L Shen, Guillaume Huguet, Zilong Wang, Alexander Tong, Danielle Moyer, et al. ImageFlowNet: Forecasting multiscale image-level trajectories of disease progression with irregularly-sampled longitudinal medical images. arXiv preprint arXiv:2406.14794, 2024

  20. [20]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2023

  21. [21]

On the Posterior Distribution in Denoising: Application to Uncertainty Quantification

Hila Manor and Tomer Michaeli. On the posterior distribution in denoising: Application to uncertainty quantification. International Conference on Learning Representations, 2024

  22. [22]

    Movie Gen: A Cast of Media Foundation Models

Adam Polyak et al. Movie Gen: A cast of media foundation models. arXiv preprint arXiv:2410.13720, 2024

  23. [23]

Free Hunch: Denoiser Covariance Estimation for Diffusion Models Without Extra Costs

Severi Rissanen, Markus Heinonen, and Arno Solin. Free hunch: Denoiser covariance estimation for diffusion models without extra costs. arXiv preprint arXiv:2410.11149, 2024

  24. [24]

    An empirical Bayes approach to statistics

Herbert E Robbins. An empirical Bayes approach to statistics. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1956

  25. [25]

Progressive Distillation for Fast Sampling of Diffusion Models

Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. International Conference on Learning Representations, 2022

  26. [26]

    Score-based generative modeling through stochastic differential equations

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations, 2021

  27. [27]

Consistency Models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. International Conference on Machine Learning, 2023

  28. [28]

A Connection Between Score Matching and Denoising Autoencoders

Pascal Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011