Title resolution pending

Workman, K · 2025 · arXiv 2512.21907

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

scBench-Long: Verifiable Benchmarking of Long-Horizon Single-Cell Biology

q-bio.GN · 2026-06-25 · unverdicted · novelty 6.0

scBench-Long is a benchmark with 21 evaluations where the strongest AI model-harness pair succeeds on 25.4% of long-horizon single-cell biology tasks.

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

cs.AI · 2026-06-17 · unverdicted · novelty 6.0

The TxBench-PP benchmark shows that leading AI agents achieve at most 59% success on tasks requiring recovery of preclinical pharmacology conclusions from assay data.

Verifiable Benchmarking of Long-Horizon Spatial Biology

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

Introduces the SpatialBench-Long benchmark, with 24 evaluations on spatial biology datasets from PDAC, glioblastoma, lung adenocarcinoma, and optic nerve systems, and reports that the top model succeeds on 8 of 72 runs (11.1%).

EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

cs.AI · 2026-06-11 · unverdicted · novelty 5.0

No tested AI agent system passed a majority of EpiBench tasks, with the best (GPT-5.5 / Pi) succeeding on 45% of 5,088 trajectories.

citing papers explorer

Showing 4 of 4 citing papers after filters.

scBench-Long: Verifiable Benchmarking of Long-Horizon Single-Cell Biology
q-bio.GN · 2026-06-25 · unverdicted · none · ref 26
TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology
cs.AI · 2026-06-17 · unverdicted · none · ref 9
Verifiable Benchmarking of Long-Horizon Spatial Biology
cs.AI · 2026-05-27 · unverdicted · none · ref 17
EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis
cs.AI · 2026-06-11 · unverdicted · none · ref 12
