pith. machine review for the scientific record.

arxiv: 2604.24959 · v1 · submitted 2026-04-27 · 💻 cs.LG · stat.ML

Recognition: unknown

CoreFlow: Low-Rank Matrix Generative Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 04:04 UTC · model grok-4.3

classification 💻 cs.LG · stat.ML
keywords low-rank matrices · generative modeling · continuous normalizing flows · incomplete data · matrix distributions · subspace separation · few-sample learning

The pith

CoreFlow improves matrix generation quality in few-sample regimes by flowing only on low-rank cores induced by shared subspaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

High-dimensional matrix distributions are difficult to model when the number of samples is small because full-dimensional approaches become both computationally costly and statistically unreliable. CoreFlow first identifies the shared row and column subspaces that encode the common geometry across the distribution, then trains a continuous normalizing flow exclusively on the much smaller core space induced by those subspaces. This separation keeps the matrix structure intact, reduces the effective dimension dramatically, and produces better spectral and moment matching in limited-data settings while also supporting training on incomplete matrices through masked Riemannian updates. The same model stays competitive when abundant data is available and continues to work under compression to only 9 percent of the original dimension.
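
To make the two-stage structure concrete, here is a minimal sketch of one plausible instantiation. The text above does not specify how the shared subspaces are estimated, so the eigendecomposition of averaged Gram matrices below, and the function names, are illustrative assumptions rather than the authors' procedure.

```python
# Illustrative sketch of Stage I (shared-subspace estimation) and the core
# projection; not the authors' implementation.
import numpy as np

def estimate_shared_subspaces(X, r_row, r_col):
    """X: (n, m1, m2) stack of training matrices. Returns orthonormal bases
    U (m1 x r_row) and V (m2 x r_col) for the dominant shared row/column
    subspaces, here taken as top eigenvectors of the averaged Gram matrices
    (one simple estimator; the paper may use a different one)."""
    n = X.shape[0]
    row_gram = sum(Xi @ Xi.T for Xi in X) / n
    col_gram = sum(Xi.T @ Xi for Xi in X) / n
    _, Ur = np.linalg.eigh(row_gram)   # eigenvalues ascending
    _, Vc = np.linalg.eigh(col_gram)
    return Ur[:, -r_row:], Vc[:, -r_col:]

def to_core(X, U, V):
    """Core of each matrix: S_i = U^T X_i V (shape n x r_row x r_col)."""
    return np.einsum('pa,npq,qb->nab', U, X, V)

def from_core(S, U, V):
    """Decode a core back to the ambient space: X_hat_i = U S_i V^T."""
    return np.einsum('pa,nab,qb->npq', U, S, V)
```

On this reading, the reported compression to 9 percent of the ambient dimension would correspond to a core ratio of roughly (r_row · r_col) / (m1 · m2) ≈ 0.09.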

Core claim

CoreFlow learns shared row and column subspaces across the matrix distribution and restricts a continuous normalizing flow to the low-dimensional core they induce, using masked Riemannian updates to accommodate incomplete training matrices; this yields substantially better spectral and moment-level generation quality in few-sample regimes and remains competitive when data is plentiful.

What carries the argument

The low-rank core obtained by projecting matrices onto learned shared row and column subspaces, on which a continuous normalizing flow is trained while preserving matrix geometry.
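
The flow itself then lives entirely on these cores. The provided text names a continuous normalizing flow mapping the cores to a Gaussian base but gives no training objective, so the following is a hedged, flow-matching-style stand-in on flattened cores; the authors' actual CNF loss and architecture may differ.

```python
# Hypothetical Stage-II training step: a flow-matching-style objective on
# flattened cores. Purely illustrative; the paper's exact CNF objective is
# not reproduced in the text above.
import torch
import torch.nn as nn

class CoreVectorField(nn.Module):
    """Velocity field v_theta(s, t) on the flattened core space."""
    def __init__(self, core_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(core_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, core_dim),
        )

    def forward(self, s, t):
        return self.net(torch.cat([s, t], dim=-1))

def training_step(model, opt, cores):
    """cores: (batch, r_row * r_col) flattened Stage-I cores."""
    t = torch.rand(cores.shape[0], 1)
    noise = torch.randn_like(cores)        # Gaussian base sample
    s_t = (1 - t) * noise + t * cores      # straight-line interpolation
    target_v = cores - noise               # its constant velocity
    loss = ((model(s_t, t) - target_v) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Generation would then integrate the learned field from noise to a core and decode with the shared subspaces, as sketched in Figure 1.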

If this is right

  • Substantially improves spectral and moment-level generation quality in few-sample regimes
  • Remains competitive in data-rich settings
  • Supports compression to 9% of ambient dimension
  • Handles up to 40% missing training entries via masked updates
  • Preserves matrix structure throughout the generative process

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may generalize to other high-dimensional structured data with latent low-rank factors, such as tensors or covariance matrices.
  • It implies that generative modeling can often exploit shared geometry to avoid the curse of dimensionality in matrix spaces.
  • One could test extensions by applying the core-flow idea to diffusion models instead of normalizing flows.
  • The method suggests practical scaling benefits for applications like image or video generation where data can be viewed as matrices.

Load-bearing premise

The matrices in the distribution share common low-rank row and column subspaces that capture the essential geometry separate from sample-specific details.

What would settle it

A benchmark dataset of high-dimensional matrices drawn from a distribution without shared low-rank subspaces, on which CoreFlow shows no improvement or degradation relative to standard ambient-space generative models in the few-sample regime.
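
A hedged recipe for such a falsification benchmark, alongside the regime where the premise holds, could look like the following; the dimensions and noise level are arbitrary choices, not datasets from the paper.

```python
# Illustrative synthetic generators for the two regimes (hypothetical sizes
# and noise; not benchmarks used in the paper).
import numpy as np

rng = np.random.default_rng(0)

def shared_subspace_matrices(n, m1=80, m2=80, r=8, noise=0.05):
    """Premise holds: X_i = U0 S_i V0^T + noise with one shared (U0, V0);
    only the core S_i varies across samples."""
    U0, _ = np.linalg.qr(rng.standard_normal((m1, r)))
    V0, _ = np.linalg.qr(rng.standard_normal((m2, r)))
    S = rng.standard_normal((n, r, r))
    X = np.einsum('pa,nab,qb->npq', U0, S, V0)
    return X + noise * rng.standard_normal(X.shape)

def no_shared_subspace_matrices(n, m1=80, m2=80, r=8, noise=0.05):
    """Falsification regime: each sample draws its own subspaces, so there
    is no common low-rank geometry for Stage I to exploit."""
    X = np.empty((n, m1, m2))
    for i in range(n):
        Ui, _ = np.linalg.qr(rng.standard_normal((m1, r)))
        Vi, _ = np.linalg.qr(rng.standard_normal((m2, r)))
        X[i] = Ui @ rng.standard_normal((r, r)) @ Vi.T
    return X + noise * rng.standard_normal(X.shape)
```

Parity between CoreFlow and ambient-space baselines on the second generator, together with a clear edge on the first, would be the cleanest evidence that the shared-geometry assumption is doing the work.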

Figures

Figures reproduced from arXiv: 2604.24959 by Dongze Wu, Linglingzhi Zhu, Yao Xie.

Figure 1: Overview of CoreFlow. Left: learning shared row/column subspaces (U∗, V∗) and constructing the induced low-dimensional cores {Sᵢ}. Middle: learning a continuous normalizing flow in the induced core space, matching the Stage-I cores {Sᵢ} ∼ p∗_S to a Gaussian base. Right: generation starts from Gaussian noise, reverses the learned flow to obtain a generated core Ŝᵢ, and decodes it with (U∗, V∗) to pr…
Figure 2: True samples (left) and CoreFlow-generated samples (right) on real and synthetic benchmarks. The close visual agreement shows that CoreFlow captures the matrix distributions well even under aggressive dimensionality reduction and substantial training missingness. LSPF (80×80, data-rich). On LSPF, where training data are abundant, the gap naturally narrows. Nevertheless, …
Figure 3: Performance–efficiency tradeoff on Solar among generative models. The horizontal axis is the learned-dimension ratio as a proxy for generative-model cost, and the vertical axis is test singular-value discrepancy (lower is better). Full training times are reported in Appendix C.4.
Figure 4: Patchified vs. original matrices (Solar). Patchification (Eq. (24)) reshapes local patches into rows of a patch-matrix. In the patchified view (left), many rows/columns become visually near-repetitions due to recurring local textures, while the original matrices (right) remain globally diverse.
Figure 5: Consistent rank reduction via patchification. Normalized spectra (λᵢ/λ₁) show that the patchified representation decays faster: for i ≥ 2, the patchified curve lies consistently below the original, indicating that more energy concentrates in the leading components and the representation has a lower effective rank. Each matrix is cropped to Hc × Wc, divided into non-overlapping p × p patches, and rearranged…
Figure 6: Case Blobs
Figure 8: Case Waves
Figure 10: Comparisons of generated samples on Solar Flare with …
Figure 11: Comparison of generated samples across methods.
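
Figures 4 and 5 describe patchification (Eq. (24) in the paper) as reshaping non-overlapping p × p patches into the rows of a patch-matrix whose spectrum then decays faster. A minimal sketch of that reshape, assuming the matrix has already been cropped so its sides are divisible by p:

```python
# Minimal patchification sketch matching the description in Figures 4-5;
# Eq. (24) in the paper is assumed to be this reshape, up to ordering.
import numpy as np

def patchify(X, p):
    """X: (Hc, Wc) matrix with Hc and Wc divisible by p. Returns a
    ((Hc * Wc) / p^2, p^2) patch-matrix whose rows are flattened patches."""
    Hc, Wc = X.shape
    assert Hc % p == 0 and Wc % p == 0, "crop to multiples of p first"
    patches = X.reshape(Hc // p, p, Wc // p, p).transpose(0, 2, 1, 3)
    return patches.reshape(-1, p * p)
```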
read the original abstract

Learning matrix-valued distributions from high-dimensional and possibly incomplete training data is challenging: ambient-space generative modeling is computationally expensive and statistically fragile when the matrix dimension is large but the sample size is limited. We propose CoreFlow, a geometry-preserving low-rank flow model that learns shared row/column subspaces across the matrix distribution, and then trains a continuous normalizing flow only on the induced low-dimensional core. CoreFlow is designed for settings where shared low-rank matrix geometry is present, especially in high-dimensional limited-sample regimes. This separates shared matrix geometry from sample-specific variation, preserves matrix structure, and substantially improves training efficiency. The same framework also handles incomplete training matrices through masked Riemannian updates and iterative completion. Across real and synthetic benchmarks, CoreFlow substantially improves spectral and moment-level generation quality in few-sample regimes while remaining competitive in data-rich settings, even under compression to 9% of the ambient dimension and with up to 40% missing training entries.
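
The abstract names masked Riemannian updates and iterative completion for incomplete matrices but gives no formulas in the text reproduced here. As a hedged illustration, one standard construction would minimize a masked reconstruction loss over the subspaces, taking a Riemannian gradient step on the Stiefel manifold followed by a QR retraction; the sketch below shows such an update for the row basis U and is an assumption, not the authors' exact scheme.

```python
# One plausible masked Riemannian update for U (illustrative only; the paper's
# exact update and its iterative-completion loop are not reproduced above).
import numpy as np

def masked_riemannian_step_U(U, V, S, X, M, lr=1e-2):
    """U: (m1, r) with U^T U = I;  V: (m2, r);  S: (n, r, r) current cores;
    X: (n, m1, m2) observed matrices;  M: (n, m1, m2) binary mask (1 = observed).
    Assumed loss: sum_i || M_i * (X_i - U S_i V^T) ||_F^2."""
    R = M * (X - np.einsum('pa,nab,qb->npq', U, S, V))   # masked residuals
    G = -2.0 * np.einsum('npq,qb,nab->pa', R, V, S)      # Euclidean gradient
    sym = 0.5 * (U.T @ G + G.T @ U)
    grad = G - U @ sym                # project onto the Stiefel tangent space
    Q, Rq = np.linalg.qr(U - lr * grad)                  # QR retraction
    return Q * np.sign(np.diag(Rq))                      # fix column signs
```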

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CoreFlow, a geometry-preserving low-rank generative model for matrix-valued distributions. It learns shared row/column subspaces across the training matrices, projects to an induced low-dimensional core, and trains a continuous normalizing flow only on that core; the framework also incorporates masked Riemannian updates to handle up to 40% missing entries. The central empirical claim is that CoreFlow yields substantial gains in spectral and moment-level generation quality under few-sample regimes and strong compression (down to 9% ambient dimension), while remaining competitive in data-rich settings.

Significance. If the empirical claims are robustly supported, the work would be significant for generative modeling of structured high-dimensional data (e.g., images, covariance matrices, recommender data) where sample size is limited relative to matrix dimension. By explicitly separating shared low-rank geometry from sample-specific variation and restricting the flow to the core, it offers a principled route to both statistical efficiency and computational tractability that standard ambient-space flows lack.

major comments (2)
  1. [Abstract / Experiments] The central claim of 'substantial' spectral/moment improvements in few-sample regimes rests on the design assumption that shared low-rank matrix geometry is present and separable. No ablation, counter-example, or quantitative sensitivity analysis is described that measures how much the reported gains degrade when this geometry is weak or absent; without such evidence it remains unclear whether the core-flow construction itself drives the edge or whether any dimensionality reduction would suffice.
  2. [Method] The description (presumably §3) of how the shared subspaces are estimated and how the continuous normalizing flow is defined on the core is not accompanied by any derivation showing that the overall procedure is parameter-free or that performance metrics are independent of the subspace estimation step. If the reported metrics are computed after fitting the subspaces to the same data used for generation, the circularity concern noted in the reader report applies directly to the load-bearing claims.
minor comments (2)
  1. [Abstract] The abstract states improvements 'across real and synthetic benchmarks' but supplies no table or figure references; the full manuscript should include explicit baseline comparisons, error bars, and ablation tables so that the magnitude of the gains can be assessed.
  2. [Method] Notation for the core projection and the masked Riemannian update should be introduced with a short equation or diagram early in the method section to improve readability for readers unfamiliar with Riemannian optimization on matrix manifolds.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and outline revisions to strengthen the manuscript's clarity and empirical support.

read point-by-point responses
  1. Referee: [Abstract / Experiments] The central claim of 'substantial' spectral/moment improvements in few-sample regimes rests on the design assumption that shared low-rank matrix geometry is present and separable. No ablation, counter-example, or quantitative sensitivity analysis is described that measures how much the reported gains degrade when this geometry is weak or absent; without such evidence it remains unclear whether the core-flow construction itself drives the edge or whether any dimensionality reduction would suffice.

    Authors: The CoreFlow framework is explicitly designed for distributions exhibiting shared low-rank matrix geometry, as stated in the abstract and introduction. We agree that an explicit sensitivity analysis would strengthen the claims. In the revised manuscript we will add a controlled synthetic experiment in which we vary the strength of the shared row/column subspace alignment (by modulating subspace overlap and injecting isotropic noise) and report the resulting degradation in spectral and moment metrics. We will also include a baseline that applies generic dimensionality reduction (e.g., PCA on vectorized matrices) followed by an ambient-space flow, thereby isolating the contribution of the geometry-preserving core construction from generic compression. revision: yes

  2. Referee: [Method] The description (presumably §3) of how the shared subspaces are estimated and how the continuous normalizing flow is defined on the core is not accompanied by any derivation showing that the overall procedure is parameter-free or that performance metrics are independent of the subspace estimation step. If the reported metrics are computed after fitting the subspaces to the same data used for generation, the circularity concern noted in the reader report applies directly to the load-bearing claims.

    Authors: We do not claim the procedure is parameter-free; the core dimension is a tunable hyperparameter selected via explained-variance heuristics or cross-validation on the training set, as described in the experimental protocol. Subspace estimation is performed on the training matrices and the flow is subsequently trained on the resulting cores; this is standard unsupervised practice. Generation quality is assessed on held-out test matrices using distribution-level metrics (spectral norms and moment distances) that do not reuse the fitted subspaces directly. In revision we will expand §3 with an explicit derivation of the composite training objective, clarifying the separation between subspace learning and the core flow, and we will restate the evaluation protocol to address any perceived circularity. revision: partial
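
Response 2 leans on distribution-level spectral and moment metrics computed on held-out matrices, and Figure 3 plots a 'test singular-value discrepancy'. The exact definition is not reproduced in the text above, so the following is a hypothetical stand-in: compare the average sorted singular-value profiles of generated and held-out sets.

```python
# Hedged stand-in for a singular-value discrepancy between generated and
# held-out matrices; the paper's precise metric may be defined differently.
import numpy as np

def singular_value_discrepancy(X_gen, X_test):
    """X_gen, X_test: (n, m1, m2) stacks of matrices. Returns the Euclidean
    distance between the mean singular-value profiles of the two sets."""
    sv_gen = np.linalg.svd(X_gen, compute_uv=False)    # (n, min(m1, m2)), descending
    sv_test = np.linalg.svd(X_test, compute_uv=False)
    return float(np.linalg.norm(sv_gen.mean(axis=0) - sv_test.mean(axis=0)))
```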

Circularity Check

0 steps flagged

No circularity identified; derivation chain not inspectable from provided text

full rationale

The abstract and description present CoreFlow at a conceptual level—learning shared row/column subspaces then applying a continuous normalizing flow on the induced low-dimensional core—without any equations, parameter-fitting steps, self-citations, or derivation chains. No load-bearing claim reduces by construction to a fitted input or self-referential definition, as no mathematical steps are visible to analyze. The method's design target (presence of shared low-rank geometry) is stated explicitly but not derived from prior results within the text, leaving the approach self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit equations or sections, so free parameters, axioms, and invented entities cannot be extracted; the central claim rests on the unverified assumption of shared low-rank geometry.

pith-pipeline@v0.9.0 · 5457 in / 1123 out tokens · 56264 ms · 2026-05-08T04:04:46.262214+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

48 extracted references · 11 canonical work pages · 5 internal anchors
