Perturbench: Benchmarking machine learning models for cellular perturbation analysis
6 Pith papers cite this work.
6 representative citing papers
- AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents
AssayBench is a new gene-ranking benchmark for phenotypic CRISPR screens that shows zero-shot generalist LLMs outperforming both biology-specific LLMs and trainable baselines on adjusted nDCG.
- CellxPert: Inference-Time MCMC Steering of a Multi-Omics Single-Cell Foundation Model for In-Silico Perturbation
CellxPert uses inference-time MCMC steering on a multi-omics single-cell foundation model to predict genome-wide transcriptomic responses to gene perturbations and outperforms baselines on cell-type annotation, perturbation prediction, and multi-omic integration benchmarks.
- Unlocking LLM Creativity in Science through Analogical Reasoning
Analogical reasoning increases LLM solution diversity by 90-173% and raises the novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.
- Benchmarking virtual cell models for in-the-wild perturbation response
A new benchmarking framework shows virtual cell models overestimate performance on standard tests, drop sharply on unseen contexts and perturbations, and produce inconsistent rankings across metrics.
- AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories
AblateCell reproduces baselines in three single-cell perturbation repositories with 88.9% success and recovers ground-truth critical components with 93.3% accuracy via closed-loop ablation.
- PRiMeFlow: Capturing Complex Expression Heterogeneity in Perturbation Response Modelling
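Several entries above report ranking quality via nDCG (AssayBench uses an "adjusted" variant specific to that paper; its exact adjustment is not described here). For reference, a minimal sketch of standard nDCG over a binary-relevance gene ranking, where the example gene list and relevance labels are purely illustrative:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: each item's relevance is discounted
    # by log2 of its 1-based rank position (rank 1 -> log2(2)).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    # Normalize by the DCG of the ideal ordering (relevances sorted
    # in descending order), so a perfect ranking scores 1.0.
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical example: a model ranks 5 genes; relevance is 1 if the
# gene is a true hit in the CRISPR screen, 0 otherwise.
print(ndcg([1, 0, 1, 0, 0]))  # ~0.92: one hit ranked first, one at rank 3
```

Benchmarks like those listed typically compute this per screen and average across screens; an "adjusted" variant would modify the gain or discount terms, which this sketch does not attempt.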