Perturbench: Benchmarking machine learning models for cellular perturbation analysis
6 Pith papers cite this work.
6 representative citing papers
- AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents
AssayBench is a new gene-ranking benchmark for phenotypic CRISPR screens that shows zero-shot generalist LLMs outperforming both biology-specific LLMs and trainable baselines on adjusted nDCG.
- CellxPert: Inference-Time MCMC Steering of a Multi-Omics Single-Cell Foundation Model for In-Silico Perturbation
CellxPert uses inference-time MCMC steering on a multi-omics single-cell foundation model to predict genome-wide transcriptomic responses to gene perturbations and outperforms baselines on cell-type annotation, perturbation prediction, and multi-omic integration benchmarks.
- Unlocking LLM Creativity in Science through Analogical Reasoning
Analogical reasoning increases LLM solution diversity by 90-173% and raises the novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.
- Benchmarking virtual cell models for in-the-wild perturbation response
A new benchmarking framework shows virtual cell models overestimate performance on standard tests, drop sharply on unseen contexts and perturbations, and produce inconsistent rankings across metrics.
- AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories
AblateCell reproduces baselines in three single-cell perturbation repositories with 88.9% success and recovers ground-truth critical components with 93.3% accuracy via closed-loop ablation.
- PRiMeFlow: Capturing Complex Expression Heterogeneity in Perturbation Response Modelling
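Several entries above report ranking quality via nDCG (AssayBench uses an "adjusted" variant specific to that paper; its exact adjustment is not described here). For reference, a minimal sketch of standard nDCG over a binary-relevance gene ranking, where the example gene list and relevance labels are purely illustrative:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: each item's relevance is discounted
    # by log2 of its 1-based rank position (rank 1 -> log2(2)).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    # Normalize by the DCG of the ideal ordering (relevances sorted
    # in descending order), so a perfect ranking scores 1.0.
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical example: a model ranks 5 genes; relevance is 1 if the
# gene is a true hit in the CRISPR screen, 0 otherwise.
print(ndcg([1, 0, 1, 0, 0]))  # ~0.92: one hit ranked first, one at rank 3
```

Benchmarks like those listed typically compute this per screen and average across screens; an "adjusted" variant would modify the gain or discount terms, which this sketch does not attempt.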