arxiv: 2511.02986 · v2 · pith:TU7AOZ6Snew · submitted 2025-11-04 · stat.ML · cs.LG · q-bio.GN

Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models

classification: stat.ML · cs.LG · q-bio.GN
keywords: expression · gene · latent · data · diffusion · single-cell · generation · model

Computational modeling of single-cell gene expression is crucial for understanding cellular processes, but generating realistic expression profiles remains a major challenge. This difficulty arises from the count nature of gene expression data and complex latent dependencies among genes. Existing generative models often impose artificial gene orderings or rely on shallow neural network architectures. We introduce a scalable latent diffusion model for single-cell gene expression data, which we refer to as scLDM, that respects the fundamental exchangeability property of the data. Our VAE uses fixed-size latent variables leveraging a unified Multi-head Cross-Attention Block (MCAB) architecture, which serves dual roles: permutation-invariant pooling in the encoder and permutation-equivariant unpooling in the decoder. We enhance this framework by replacing the Gaussian prior with a latent diffusion model using Diffusion Transformers and linear interpolants, enabling high-quality generation with multi-conditional classifier-free guidance. We show its superior performance in a variety of experiments for both observational and perturbational single-cell data, as well as downstream tasks like cell-level classification.
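The permutation-invariant pooling mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' MCAB implementation: it assumes a single attention head, random weights, and hypothetical names (`cross_attention_pool`, `queries`, `w_q`, `w_k`, `w_v`). It only shows why cross-attention from a fixed set of query vectors onto a set of gene tokens yields a fixed-size output that does not depend on the order of the genes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_pool(tokens, queries, w_q, w_k, w_v):
    """Pool a set of gene tokens into a fixed number of latent vectors.

    tokens:  (n_genes, d)   one embedding per gene (order is arbitrary)
    queries: (n_latents, d) fixed query vectors (hypothetical, would be learned)
    returns: (n_latents, d) fixed-size summary of the set
    """
    q = queries @ w_q                      # (n_latents, d)
    k = tokens @ w_k                       # (n_genes, d)
    v = tokens @ w_v                       # (n_genes, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)     # normalize over genes
    return weights @ v                     # weighted sum over genes

rng = np.random.default_rng(0)
n_genes, n_latents, d = 50, 4, 8
tokens = rng.normal(size=(n_genes, d))
queries = rng.normal(size=(n_latents, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

pooled = cross_attention_pool(tokens, queries, w_q, w_k, w_v)

# Shuffle the gene order: the pooled output should match up to float error.
perm = rng.permutation(n_genes)
pooled_perm = cross_attention_pool(tokens[perm], queries, w_q, w_k, w_v)
invariant = np.allclose(pooled, pooled_perm)
```

Because the softmax weights and the values are permuted together and then summed over genes, reordering the input leaves the result unchanged. The output shape depends only on the number of queries, which is what makes a fixed-size latent possible for inputs with varying numbers of genes.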

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. PRiMeFlow: Capturing Complex Expression Heterogeneity in Perturbation Response Modelling

    cs.LG 2026-04 unverdicted novelty 6.0

    PRiMeFlow is a flow-matching model that approximates the full empirical distribution of single-cell gene expression after perturbations.

  2. PRiMeFlow: Capturing Complex Expression Heterogeneity in Perturbation Response Modelling

    cs.LG 2026-04 unverdicted novelty 5.0

    PRiMeFlow applies flow matching in gene expression space with a U-Net velocity field and pretraining-finetuning to model perturbation-induced heterogeneity, showing strong benchmark performance on PerturBench and the ...