Benchmarking virtual cell models for in-the-wild perturbation response
Pith reviewed 2026-05-07 04:53 UTC · model grok-4.3
The pith
Virtual cell models show sharply reduced performance when tested on unseen cell contexts and perturbations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that virtual cell models exhibit markedly reduced performance under strict evaluation conditions involving unseen cell contexts, unseen perturbations, and cross-dataset generalization, compared to standard benchmarks where performance is often overestimated. Models can still identify broad transcriptional changes but fail to capture perturbation-specific details, and naive combination of datasets can worsen results. Evaluation metrics emphasize different biological aspects, leading to inconsistent model rankings.
What carries the argument
A modular benchmarking framework that evaluates models across in-the-wild scenarios of unseen cellular contexts, unseen perturbations, and cross-dataset shifts.
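The held-out evaluation the framework relies on can be sketched in a few lines. The column names and toy metadata below are hypothetical illustrations, not the paper's actual pipeline: every observation from a held-out cell context is routed to the test set, so the model never sees that context during training.

```python
import pandas as pd

def context_holdout_split(obs: pd.DataFrame, context_col: str, held_out: set):
    """Route every observation from the held-out contexts to the test set,
    so those contexts are entirely unseen during training."""
    test_mask = obs[context_col].isin(held_out)
    return obs[~test_mask], obs[test_mask]

# Toy metadata: two cell lines, three perturbations each (hypothetical).
obs = pd.DataFrame({
    "cell_line": ["A549"] * 3 + ["K562"] * 3,
    "perturbation": ["control", "drugX", "drugY"] * 2,
})

train, test = context_holdout_split(obs, "cell_line", {"K562"})
assert set(train["cell_line"]) == {"A549"}  # training never sees K562
assert set(test["cell_line"]) == {"K562"}   # test is an unseen context
```

Unseen-perturbation and cross-dataset splits follow the same pattern, keyed on a perturbation or dataset column instead of the cell context.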
If this is right
- Model performance varies strongly with the exact task design and choice of evaluation criteria.
- Naive aggregation of multiple datasets can lower rather than raise predictive accuracy.
- In unseen-perturbation settings, even linear baselines recover global transcriptional trends but miss fine-grained, perturbation-specific effects.
- Different biological metrics produce substantially different rankings among the same set of models.
- Current virtual cell models display limited robustness when cellular context changes.
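The point about metrics reshuffling rankings is easy to reproduce with a toy example (the numbers below are illustrative, not from the paper): a model that predicts the direction of every expression change but exaggerates magnitudes wins on correlation, while a model with blunted but numerically closer predictions wins on mean squared error.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two score vectors."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])

def mse(a, b):
    """Mean squared error between two score vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.mean((a - b) ** 2))

true_delta = [1.0, 2.0, 3.0, 4.0]  # "true" perturbation-induced changes
model_a    = [2.0, 4.0, 6.0, 8.0]  # right trend, doubled magnitude
model_b    = [2.0, 2.0, 3.0, 3.0]  # blunted trend, closer values

# Correlation ranks model A first; MSE ranks model B first.
assert pearson(true_delta, model_a) > pearson(true_delta, model_b)
assert mse(true_delta, model_b) < mse(true_delta, model_a)
```

Since each metric rewards a different biological property (trend recovery versus magnitude accuracy), the divergent rankings the paper reports are expected rather than anomalous.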
Where Pith is reading between the lines
- Future model training may need explicit mechanisms that enforce generalization across cell types rather than relying on single-context data.
- Coarse screening tasks might be handled adequately by simple linear approaches, while detailed mechanistic predictions will require new architectures or data strategies.
- Practitioners should validate predictions within each specific biological context instead of assuming transferability across experiments.
- The framework could be extended to new modalities such as single-cell data or spatial transcriptomics to test whether the same robustness gaps appear.
Load-bearing premise
The selected in-the-wild test scenarios capture the full complexity and variability of real biological systems and drug-discovery needs.
What would settle it
A virtual cell model that maintains high accuracy when predicting responses in completely new cell types, new perturbations, and across independent datasets would contradict the reported drop in performance.
Original abstract
Virtual cell (VC) models aim to predict cellular responses to any perturbation in silico and have emerged as a promising approach for drug discovery and precision medicine. Yet a clear gap remains: while models routinely report impressive results on standard benchmarks, it is unclear whether their predictions are truly meaningful in practice. This is mainly due to limitations in current evaluation setups, which are often overly simplified or inconsistent and do not reflect the complexity and variability of real biological systems. Here, we introduce a standardized and modular benchmarking framework for virtual cell prediction. Our framework evaluates diverse models under challenging, in-the-wild scenarios, including unseen cell contexts, unseen perturbations, and cross-dataset generalization, which better reflect practical applications. Our analysis shows that model performance is highly context-dependent and shaped by task design and evaluation criteria. In commonly used setups, performance is often overestimated, and naive dataset aggregation can even reduce performance. When evaluated under stricter conditions, model performance drops markedly, indicating limited robustness to shifts across cellular contexts. In unseen-perturbation settings, models including simple linear approaches capture global transcriptional trends but fail to recover fine-grained, perturbation-specific effects. In addition, different evaluation metrics focus on different biological properties, leading to substantially different model rankings. Together, our framework provides a more reliable and biologically grounded evaluation, offering clearer guidance for applying virtual cell models in real scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a standardized, modular benchmarking framework for virtual cell (VC) models. It evaluates diverse models on in-the-wild scenarios (unseen cell contexts, unseen perturbations, cross-dataset generalization) that are intended to better reflect practical applications in drug discovery. The central claims are that standard benchmarks overestimate performance, naive dataset aggregation can reduce performance, performance drops markedly under stricter conditions indicating limited robustness to cellular context shifts, models (including linear baselines) capture global transcriptional trends but fail on fine-grained perturbation-specific effects, and different evaluation metrics emphasize distinct biological properties leading to divergent model rankings.
Significance. If the empirical results hold after clarification of the evaluation splits, the work would be significant for computational biology and systems pharmacology. It supplies concrete, held-out comparisons that expose gaps between current VC model benchmarks and real-world requirements, while highlighting context dependence and metric sensitivity. The absence of circularity or fitted parameters in the evaluation (results are grounded in external benchmark data) is a strength that could help guide more reliable model development and deployment.
Major comments (1)
- [Methods (scenario construction and data splits)] The interpretation that marked performance drops demonstrate limited robustness to shifts across cellular contexts is load-bearing for the abstract and main conclusions. However, the chosen in-the-wild splits may conflate context shifts with dataset-specific confounders (technical batch effects, unmatched perturbation dosages, or cell-type frequency imbalances). Without explicit controls, ablations on split criteria, or batch-correction experiments, the observed drops cannot be unambiguously attributed to robustness deficits rather than artifacts of how the train/test partitions were constructed.
Minor comments (3)
- [Abstract] The phrase 'naive dataset aggregation' is used without a concise definition or reference to the exact procedure; adding one sentence would improve immediate clarity for readers.
- [Results (unseen perturbation settings)] The claim that models 'fail to recover fine-grained perturbation-specific effects' would be strengthened by a quantitative definition of 'fine-grained' (e.g., specific gene sets or effect-size thresholds) and by reporting per-model recovery rates with statistical controls.
- Overall: A summary table listing performance metrics (with confidence intervals) across the three in-the-wild regimes and the baseline models would make the 'marked' drops and metric divergence easier to assess at a glance.
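One conventional way to produce the suggested confidence intervals is a percentile bootstrap over per-perturbation scores. A minimal sketch with synthetic scores (not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(scores, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the mean score."""
    scores = np.asarray(scores, float)
    boot_means = np.array([
        rng.choice(scores, size=scores.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(scores.mean()), float(lo), float(hi)

# Synthetic per-perturbation scores for one model in one regime.
scores = rng.normal(loc=0.6, scale=0.1, size=50)
mean, lo, hi = bootstrap_ci(scores)
assert lo <= mean <= hi  # the mean sits inside its own 95% CI
```

Reporting such intervals per regime and per model would let readers judge whether the 'marked' drops exceed sampling noise.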
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. The concern about potential confounders in the in-the-wild splits is well-taken and directly relevant to the robustness claims. We address this point below and commit to targeted additions in the revision to strengthen the attribution of performance drops.
Point-by-point responses
Referee: [Methods (scenario construction and data splits)] The interpretation that marked performance drops demonstrate limited robustness to shifts across cellular contexts is load-bearing for the abstract and main conclusions. However, the chosen in-the-wild splits may conflate context shifts with dataset-specific confounders (technical batch effects, unmatched perturbation dosages, or cell-type frequency imbalances). Without explicit controls, ablations on split criteria, or batch-correction experiments, the observed drops cannot be unambiguously attributed to robustness deficits rather than artifacts of how the train/test partitions were constructed.
Authors: We agree that unambiguous attribution of the performance drops requires explicit controls for possible confounders. Our in-the-wild splits are constructed by systematically holding out entire cellular contexts (specific cell lines, tissues, or experimental conditions) that never appear in the training data, while drawing from multiple public datasets (e.g., LINCS, Sci-Plex) with perturbations matched on compound identity and dosage range where available. Many source datasets already incorporate upstream batch correction, but we acknowledge that this is not uniform. To directly address the referee's concern, we will add the following in the revised manuscript: (1) ablations that compare context-holdout splits against random or perturbation-only splits, to isolate the contribution of cellular context shifts; (2) preprocessing experiments that apply batch-correction methods (ComBat, Harmony) to the input expression matrices before model training, followed by re-evaluation of the performance drops; and (3) frequency-balanced subsampling of cell types to control for imbalance. These controls will be reported with quantitative results and will clarify whether the observed drops are driven primarily by limited robustness to context shifts or by split-construction artifacts. We believe the core finding, that standard benchmarks overestimate performance and that stricter generalization conditions reveal substantial drops, will be reinforced rather than undermined by these additions.
Revision: yes
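The frequency-balanced subsampling the rebuttal commits to can be sketched as follows (hypothetical column names and toy data; a pandas-based illustration, not the authors' code): each cell type is downsampled to the size of the rarest one before training and evaluation.

```python
import pandas as pd

def balance_by_group(obs: pd.DataFrame, group_col: str, seed: int = 0):
    """Downsample every group to the size of the smallest group."""
    n_min = obs[group_col].value_counts().min()
    parts = [grp.sample(n=n_min, random_state=seed)
             for _, grp in obs.groupby(group_col)]
    return pd.concat(parts)

# Toy metadata: one dominant and one rare cell type (hypothetical).
obs = pd.DataFrame({
    "cell_type": ["T_cell"] * 8 + ["B_cell"] * 2,
    "cell_id": range(10),
})

balanced = balance_by_group(obs, "cell_type")
assert sorted(balanced["cell_type"].value_counts().tolist()) == [2, 2]
```

If performance drops shrink after this balancing (and after batch correction), that would point to split-construction artifacts rather than genuine robustness deficits.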
Circularity Check
No circularity: purely empirical benchmarking with external data splits
Full rationale
The paper introduces a modular benchmarking framework and reports direct empirical comparisons of virtual cell models on held-out scenarios (unseen cells, unseen perturbations, cross-dataset). No mathematical derivations, equations, or fitted parameters are used to define the target metrics or predictions; performance drops are measured against external benchmark datasets rather than constructed from the evaluation choices themselves. No load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the derivation chain. The analysis is grounded in external benchmarks throughout, consistent with the reader's assessed score of 1.0.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Changes in gene expression profiles adequately capture cellular responses to perturbations.