arXiv preprint arXiv:2503.00096
3 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (3)
citing papers explorer
- AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents
  AssayBench is a new gene-ranking benchmark for phenotypic CRISPR screens that shows zero-shot generalist LLMs outperform both biology-specific LLMs and trainable baselines on adjusted nDCG.
- AI scientists produce results without reasoning scientifically
  LLM agents execute scientific tasks but fail to follow core scientific reasoning norms such as evidence consideration and belief revision based on refutations.
- BioResearcher: Scenario-Guided Multi-Agent for Translational Medicine
  BioResearcher is a new multi-agent system that leads baselines on single-step biomedical tests, BixBench, BaisBench, and a 30-query clinical discovery benchmark with a 74.7% positive hit rate.