pith. machine review for the scientific record. sign in

arxiv: 2605.07746 · v1 · submitted 2026-05-08 · 📊 stat.ML · cs.LG· q-bio.QM

Recognition: no theorem link

Flow Matching for Count Data

Ganchao Wei, John Pearson

Pith reviewed 2026-05-11 02:23 UTC · model grok-4.3

classification 📊 stat.ML cs.LGq-bio.QM
keywords flow matchingcount databirth-death processsingle-cell RNA sequencingneural spike trainsgenerative modelsdiscrete datadistribution transport
0
0 comments X

The pith

Count-FM generates higher-quality count data samples than baselines while using substantially fewer parameters by modeling dynamics directly with birth-death processes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes count-FM, a flow-matching method that operates directly on high-dimensional count data using a continuous-time birth-death process with unit jumps. This framework learns conditional transition rates in simulation-free fashion to transport between arbitrary source and target count distributions without converting counts to continuous variables or treating them as categories. A sympathetic reader would care because the approach promises more natural modeling for applications like single-cell RNA sequencing and neural spike trains, where counts represent gene expression or firing rates and large ranges make existing transformations inefficient.

Core claim

Count-FM learns marginal transitions efficiently in count space through simulation-free training of conditional transition rates in a birth-death framework. This enables transport between arbitrary count-distributed populations. In simulation it produces better sample quality than representative baselines while using substantially fewer parameters. Applied to scRNA-seq and neural spike-train data, the same model supports unconditional generation, transport between distributions, and conditional generation, delivering improved sample quality, greater modeling efficiency, and interpretable transport paths.

What carries the argument

continuous-time birth-death process with local unit jumps, which models each count changing by +1 or -1 at stochastic times and carries the flow-matching objective directly in discrete space

If this is right

  • Unconditional generation of realistic count vectors for gene expression or neural activity becomes feasible with reduced parameter counts.
  • Transport maps between successive batches or time points of count data can be learned and visualized as sequences of unit jumps.
  • Conditional generation tasks, such as predicting target counts given partial observations, inherit the same efficiency and path interpretability.
  • The simulation-free training of transition rates scales to high-dimensional count problems where direct simulation of full trajectories would be prohibitive.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The unit-jump birth-death structure might combine with existing continuous flow models to handle mixed continuous-discrete observations in a single framework.
  • Interpretable paths could be used post hoc to identify which features drive transitions in biological count data, offering a form of mechanistic insight without additional supervision.
  • Because the model stays in count space, downstream tasks such as differential expression testing or spike-count statistics can be performed on generated samples without inverse transformations.
  • Efficiency gains from fewer parameters could allow deployment on edge devices for real-time neural decoding from spike trains.

Load-bearing premise

A continuous-time birth-death process with only local unit jumps supplies a sufficient model for the dynamics of high-dimensional count data.

What would settle it

If count-FM applied to held-out scRNA-seq or spike-train datasets produces samples whose empirical distributions match the true data worse than a standard continuous diffusion baseline when both are given equal training budget, the claimed superiority collapses.

Figures

Figures reproduced from arXiv: 2605.07746 by Ganchao Wei, John Pearson.

Figure 1
Figure 1. Figure 1: summarizes the P12-to-P35 transport results. In panel A, generated trajectories move smoothly from the P12 cells toward the P35 cells, with colors indicating cell lineage. In panel B, the inferred terminal states are largely lineage-consistent. Neuroblast and granule immature cells predominantly map to granule immature or granule mature states, while mature side populations such as astrocytes, endothelial,… view at source ↗
Figure 2
Figure 2. Figure 2: shows representative results for location-modulated neurons from these regions. Many recorded neurons in hippocampal and entorhinal regions are silent or only weakly location-specific, including many interneuron-like units [Thompson and Best, 1989, Epsztein et al., 2011, Hangya et al., 2010], so we display a subset of neurons from each region to make the spatial response pattern visible. In the mean-respon… view at source ↗
Figure 3
Figure 3. Figure 3: Representative intermediate samples from the native sampling trajectories of different models in the simulation. Red points show target samples and blue points show generated samples. Columns correspond to increasing native sampling time or sampling step within each method’s own generation procedure. The heatmaps in [PITH_FULL_IMAGE:figures/full_fig_p017_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Marginal bridge distributions for both coordinates. They are shown under a common progress variable s. Intermediate marginals are estimated by Monte Carlo sampling from each model’s bridge or forward noising law after mapping to the common progress scale. count-FM evolves gradually across s, while categorical-state baselines become target-like early. count-FM-OT uses the same bridge as count-FM but with OT… view at source ↗
Figure 5
Figure 5. Figure 5: compares the sampling efficiency of count-FM and count-FM-OT in both the simulation and scRNA experiments. We plot W2 and MMD2 RBF against the number of function evaluations (NFE) and wall-clock runtime. In the simulation, count-FM-OT shows a clear efficiency advantage across most computational budgets. On the scRNA task, the advantage is smaller but still visible at lower budgets, while the two methods be… view at source ↗
Figure 6
Figure 6. Figure 6: Transition snapshots for training. As time increases, generated cells move progressively from the P12 source manifold toward the P35 target manifold in PCA space. Light colored points show the source and target references, gray curves show trajectory tails, and black points show the current generated snapshot. PC1 10 5 0 5 10 PC2 t=0.0 (step 0/500) PC1 PC2 t=0.2 (step 100/500) PC1 PC2 t=0.4 (step 200/500) … view at source ↗
Figure 7
Figure 7. Figure 7: Transition snapshots for testing. The same progressive transport pattern is observed on held-out cells, with trajectories moving smoothly from the P12 manifold toward the P35 manifold. D.3 Conditional generation of piriform cortex spike trains We also evaluate count-FM on conditional generation of multichannel piriform cortex (PCx) spike trains using the public odor-response dataset of Bolding and Franks [… view at source ↗
Figure 8
Figure 8. Figure 8: Mean-response comparison for the hc-3 conditional-generation task. Each panel shows bin-wise mean counts across signed position, with neurons grouped by region. For generative models, summaries are estimated from 100 generated samples per held-out covariate. The MLP mean regressor, Poisson MLP, and count-FM with w = 1 recover the broad spatial response pattern. Count-FM with w = 0 lacks spatial specificity… view at source ↗
Figure 9
Figure 9. Figure 9: Population correlation comparison for the hc-3 conditional-generation task. Panels show correlation matrices for the active-neuron set. These are estimated from 100 generated samples per held-out covariate. Count-FM with w = 1 most closely matches the true dependence structure, while the Poisson MLP underestimates population correlations. Count-FM with w = 0 still captures some marginal dependence structur… view at source ↗
Figure 10
Figure 10. Figure 10: Conditional generation on held-out odor-3 trials. Upper panel shows conditioning covariates over time, with gray traces for individual held-out trials and red traces for trial averages. Lower panel shows neuron-by-time mean responses for the held-out data, the MLP mean regressor, Poisson MLP, and count-FM with w = 1 and w = 2. The MLP mean regressor is overly smooth and Poisson MLP remains diffuse, while … view at source ↗
Figure 11
Figure 11. Figure 11: Generated responses under the same held-out covariate trajectory. We compare the true held-out trial with the mean response from the Poisson MLP regressor over 24 repetitions, and with the count-FM mean responses over 24 repetitions at w = 1 and w = 2. Compared with the Poisson MLP regressor, count-FM produces a more structured condition-aligned response. Increasing guidance from w = 1 to w = 2 sharpens t… view at source ↗
read the original abstract

High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mapping between distributions across successive batches or time points form critical components of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings, but many existing methods either treat each count as a categorical state or transform counts into a continuous space, neither of which is natural or efficient when the count range is large. We propose count-FM, a flow-matching framework for count data based on a continuous-time birth-death process with local unit jumps. Count-FM learns marginal transitions efficiently in count space through simulation-free training of conditional transition rates, allowing transport between arbitrary count-distributed source and target populations. In simulation, count-FM achieves better sample quality than representative baselines while using substantially fewer parameters. We further apply count-FM to scRNA-seq and neural spike-train data for unconditional generation, transport, and conditional generation. Across these tasks, count-FM yields improved sample quality, greater modeling efficiency, and interpretable transport paths.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes count-FM, a flow-matching framework for high-dimensional count data that models dynamics via a continuous-time birth-death process restricted to local unit jumps per coordinate. It introduces simulation-free training of conditional transition rates to enable transport between arbitrary count distributions, and reports superior sample quality and parameter efficiency versus baselines in simulations, plus applications to unconditional/conditional generation and transport on scRNA-seq and neural spike-train data.

Significance. If the central modeling assumptions hold, count-FM offers a natural extension of flow matching to count-valued data without continuous transformations or categorical treatments, which is valuable for single-cell and neuroscience applications. The simulation-free objective and reported gains in modeling efficiency and interpretable paths represent concrete strengths over diffusion-style alternatives that require simulation.

major comments (2)
  1. [§2] §2 (Birth-death process definition): the framework restricts dynamics to independent per-coordinate unit jumps. For applications like scRNA-seq (co-regulated genes) and spike trains (population coding), where strong cross-dimensional correlations are expected, no analysis or theorem is given showing that learned marginal rates compose to accurate joint distributions; this assumption is load-bearing for the sample-quality and transport claims.
  2. [Experiments] Experiments section (simulation and real-data results): the abstract and main claims assert better sample quality with fewer parameters, but no quantitative metrics (e.g., MMD, log-likelihood, or Wasserstein distances), baseline implementation details, error bars, or data-exclusion criteria are referenced in the provided text; without these, the performance advantage cannot be verified as load-bearing evidence.
minor comments (2)
  1. [§3] Notation for the conditional transition rates could be clarified with an explicit equation linking the simulation-free objective to the birth-death generator.
  2. [Figures] Figure captions for transport paths should specify how interpretability is quantified or visualized (e.g., via marginal trajectories or correlation preservation).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and outline the changes we will make in revision.

read point-by-point responses
  1. Referee: [§2] §2 (Birth-death process definition): the framework restricts dynamics to independent per-coordinate unit jumps. For applications like scRNA-seq (co-regulated genes) and spike trains (population coding), where strong cross-dimensional correlations are expected, no analysis or theorem is given showing that learned marginal rates compose to accurate joint distributions; this assumption is load-bearing for the sample-quality and transport claims.

    Authors: We thank the referee for highlighting this modeling choice. The birth-death process indeed restricts transitions to unit jumps in a single coordinate at a time, but the transition rates for each such jump are explicitly functions of the full current state vector. This state dependence allows the rates to encode cross-coordinate correlations (e.g., co-regulation in genes or population coding in spikes). The simulation-free training objective learns conditional transition rates that are consistent with the joint transport between source and target distributions, analogous to conditional flow matching in the continuous case. While we do not supply a formal theorem proving exact marginal-to-joint composition, the framework is constructed so that the learned rates define a valid joint Markov process. We will add a clarifying paragraph in §2 explaining the role of state-dependent rates and their empirical validation on correlated real-world data. revision: partial

  2. Referee: [Experiments] Experiments section (simulation and real-data results): the abstract and main claims assert better sample quality with fewer parameters, but no quantitative metrics (e.g., MMD, log-likelihood, or Wasserstein distances), baseline implementation details, error bars, or data-exclusion criteria are referenced in the provided text; without these, the performance advantage cannot be verified as load-bearing evidence.

    Authors: We agree that the current text would be strengthened by explicit quantitative metrics and experimental details. Although comparative figures and supplementary results support the claims of improved sample quality and parameter efficiency, we acknowledge that MMD, Wasserstein distances, log-likelihood values, error bars from multiple runs, baseline implementation specifics, and data-exclusion criteria are not directly referenced in the main body. In the revised manuscript we will add these quantitative evaluations, report means and standard deviations across repeated experiments, expand the baseline descriptions, and include a clear data-processing section. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained from birth-death process definition

full rationale

The paper introduces count-FM by defining a continuous-time birth-death process with local unit jumps as the underlying dynamics for count data, then derives a simulation-free training objective for conditional transition rates directly from this process. No step reduces a claimed prediction or result to a fitted parameter or self-citation by construction; the marginal transition learning and transport paths follow from the process equations without tautological equivalence to inputs. The framework remains independent of the target data distributions and does not invoke prior author work to force uniqueness or ansatz choices.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that birth-death processes with unit jumps are appropriate for count data; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption A continuous-time birth-death process with local unit jumps is a suitable model for transitions between count distributions.
    The entire count-FM framework is built on learning conditional transition rates under this process.

pith-pipeline@v0.9.0 · 5483 in / 1212 out tokens · 34217 ms · 2026-05-11T02:23:20.804568+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 2 internal anchors

  1. [1]

    Sanjukta Bhattacharya, Christian Gensbigler, Shaamil Karim, and Jon Lees

    URLhttps://openreview.net/forum?id=h7-XixPCAL. Sanjukta Bhattacharya, Christian Gensbigler, Shaamil Karim, and Jon Lees. Discrete diffusion for single-cell gene expression modeling. InICLR 2026 Workshop on Machine Learning for Genomics Explorations,

  2. [2]

    doi: 10.7554/eLife.22630

    ISSN 2050-084X. doi: 10.7554/eLife.22630. URL https://doi.org/10.7554/eLife.22630. Kevin A Bolding, Shivathmihai Nagappan, Bao-Xia Han, Fan Wang, and Kevin M Franks. Recurrent circuitry is required to stabilize piriform cortex odor representations across brain states.eLife, 9: e53125, jul

  3. [3]

    doi: 10.7554/eLife.53125

    ISSN 2050-084X. doi: 10.7554/eLife.53125. URL https://doi.org/10. 7554/eLife.53125. Andrew Campbell, Joe Benton, Valentin De Bortoli, Tom Rainforth, George Deligiannidis, and Arnaud Doucet. A continuous time framework for discrete denoising models. InAdvances in Neural Information Processing Systems,

  4. [4]

    URL https://doi.org/ 10.1186/s13059-023-03148-9

    doi: 10.1186/s13059-023-03148-9. URL https://doi.org/ 10.1186/s13059-023-03148-9. Jérôme Epsztein, Michael Brecht, and Albert K. Lee. Intracellular determinants of hippocampal ca1 place and silent cell activity in a novel environment.Neuron, 70(1):109–120,

  5. [5]

    URLhttps://doi.org/10.1016/j.neuron.2011.03.006

    doi: 10.1016/j.neuron.2011.03.006. URLhttps://doi.org/10.1016/j.neuron.2011.03.006. Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One step diffusion via shortcut models. InThe Thirteenth International Conference on Learning Representations,

  6. [6]

    Arthur Gretton, Karsten M

    URL https://proceedings.neurips.cc/paper_files/paper/2014/file/ f033ed80deb0234979a61f95710dbe25-Paper.pdf. Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test.Journal of Machine Learning Research, 13(25):723–773,

  7. [7]

    2010.194274

    doi: 10.1113/jphysiol. 2010.194274. URLhttps://doi.org/10.1113/jphysiol.2010.194274. Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. InNeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications,

  8. [8]

    Asger Hobolth and Eric A

    URL https://proceedings.neurips.cc/paper_files/paper/2020/ file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf. Asger Hobolth and Eric A. Stone. Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution.The Annals of Applied Statistics, 3(3):1204 – 1231,

  9. [9]

    URL https://doi.org/10

    doi: 10.1214/09-AOAS247. URL https://doi.org/10. 1214/09-AOAS247. Hannah Hochgerner, Amit Zeisel, Peter Lönnerberg, and Sten Linnarsson. Conserved properties of dentate gyrus neurogenesis across postnatal development revealed by single-cell RNA sequencing. Nature Neuroscience, 21(2):290–299, February

  10. [10]

    Peter Holderrieth, Marton Havasi, Jason Yim, Neta Shaul, Itai Gat, Tommi Jaakkola, Brian Karrer, Ricky T

    doi: 10.1038/s41593-017-0056-2. Peter Holderrieth, Marton Havasi, Jason Yim, Neta Shaul, Itai Gat, Tommi Jaakkola, Brian Karrer, Ricky T. Q. Chen, and Yaron Lipman. Generator matching: Generative modeling with arbitrary markov processes. InThe Thirteenth International Conference on Learning Representations,

  11. [11]

    Emiel Hoogeboom, Taco Cohen, and Jakub Mikolaj Tomczak

    URL https://proceedings.neurips.cc/paper_files/paper/ 2019/file/9e9a30b74c49d07d8150c8c83b1ccf07-Paper.pdf. Emiel Hoogeboom, Taco Cohen, and Jakub Mikolaj Tomczak. Learning discrete distributions by dequantization. InThird Symposium on Advances in Approximate Bayesian Inference,

  12. [12]

    Auto-Encoding Variational Bayes

    URL https: //arxiv.org/abs/1312.6114. Durk P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolu- tions. InAdvances in Neural Information Processing Systems, volume

  13. [13]

    Marius Lange, V olker Bergen, Michal Klein, Manu Setty, Bernhard Reuter, Mostafa Bakhti, Heiko Lickert, Meshal Ansari, Janine Schniering, Herbert B

    URL https://proceedings.neurips.cc/paper_files/paper/2018/file/ d139db6a236200b21cc7f752979132d0-Paper.pdf. Marius Lange, V olker Bergen, Michal Klein, Manu Setty, Bernhard Reuter, Mostafa Bakhti, Heiko Lickert, Meshal Ansari, Janine Schniering, Herbert B. Schiller, Dana Pe’er, and Fabian J. Theis. Cellrank for directed single-cell fate mapping.Nature Met...

  14. [14]

    URL https://doi.org/10.1038/s41592-021-01346-6

    doi: 10.1038/s41592-021-01346-6. URL https://doi.org/10.1038/s41592-021-01346-6 . Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Repre- sentations,

  15. [15]

    Aaron Lou, Chenlin Meng, and Stefano Ermon

    doi: 10.1038/s41592-018-0229-2. Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution. InForty-first International Conference on Machine Learning,

  16. [16]

    doi: 10.1093/bioinformatics/btae518

    ISSN 1367-4811. doi: 10.1093/bioinformatics/btae518. URL https://doi.org/10.1093/ bioinformatics/btae518. Youzhi Luo, Keqiang Yan, and Shuiwang Ji. Graphdf: A discrete flow model for molecular graph generation. InProceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 7192–7203. PM...

  17. [18]

    URLhttps://doi.org/10.1007/BF00237147

    doi: 10.1007/BF00237147. URLhttps://doi.org/10.1007/BF00237147. Keiji Miura, Zachary F. Mainen, and Naoshige Uchida. Odor representations in olfactory cortex: Distributed rate coding and decorrelated population activity.Neuron, 74(6):1087–1098, June

  18. [19]

    URL https://doi.org/10.1016/j.neuron.2012.04

    doi: 10.1016/j.neuron.2012.04.021. URL https://doi.org/10.1016/j.neuron.2012.04

  19. [20]

    Giovanni Palla, Sudarshan Babu, Payam Dibaeinia, Donghui Li, Aly A Khan, Theofanis Karaletsos, and Jakub M

    doi: 10.12688/f1000research.3895.2. Giovanni Palla, Sudarshan Babu, Payam Dibaeinia, Donghui Li, Aly A Khan, Theofanis Karaletsos, and Jakub M. Tomczak. A scalable latent diffusion model for single-cell gene expression data. In NeurIPS 2025 Workshop on AI Virtual Cells and Instruments: A New Era in Drug Discovery and Development,

  20. [21]

    URLhttps://arxiv.org/abs/2604.09784. Geoffrey Schiebinger, Jian Shu, Marcin Tabaka, Brian Cleary, Vidya Subramanian, Aryeh Solomon, Joshua Gould, Siyan Liu, Stacie Lin, Peter Berube, Lia Lee, Jenny Chen, Justin Brumbaugh, Philippe Rigollet, Konrad Hochedlinger, Rudolf Jaenisch, Aviv Regev, and Eric S. Lander. Optimal-transport analysis of single-cell gene...

  21. [22]

    doi: https://doi.org/10

    ISSN 0092-8674. doi: https://doi.org/10. 1016/j.cell.2019.01.006. URL https://www.sciencedirect.com/science/article/pii/ S009286741930039X. Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian, Roberto Bondesan, Alan Aspuru-Guzik, Ar- naud Doucet, Rob Brekelmans, Alexander Tong, and Kirill Neklyudov. Feynman-kac correctors in diffusion: Annealing, guidance...

  22. [23]

    09-07-02382.1989

    doi: 10.1523/JNEUROSCI. 09-07-02382.1989. URLhttps://www.jneurosci.org/content/9/7/2382. Alexander Tong, Kilian FATRAS, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector- Brooks, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport.Transactions on Machine Learning Research,

  23. [24]

    We also describe the discretization used for sampling in Section 2.2

    We first derive the conditional bridge rates, then derive the local rate-matching objective and show how the local KL terms integrate to the path-space KL objective. We also describe the discretization used for sampling in Section 2.2. A.1 Derivation of conditional bridge rates This subsection derives the conditional birth and death rates used in Section ...

  24. [25]

    trajectory tail current snapshot Figure 7:Transition snapshots for testing.The same progressive transport pattern is observed on held-out cells, with trajectories moving smoothly from the P12 manifold toward the P35 manifold. D.3 Conditional generation of piriform cortex spike trains We also evaluate count-FM on conditional generation of multichannel piri...