The DALPHIN benchmark finds that the pathology-specific AI copilot PathChat+ shows no statistically significant difference from expert pathologists on 4 of 6 tasks, while general-purpose models match the experts on 1-2 tasks. The benchmark is built on a diverse open dataset released for ongoing evaluation.
Weishaupt, Drew F
6 Pith papers cite this work.
representative citing papers
A new open pipeline and dataset enable training of a vision-language model for whole-slide pathology VQA that outperforms MedGemma on tissue identification, neoplasm detection, and differential diagnosis.
A systematic factorial analysis shows that optimized LLM input configurations for pathology whole-slide images (WSIs) raise GPT-5 performance from 15.1% to 39.5% on TCGA cancer classification and from 38.1% to 62.9% on GTEx organ classification, with generalization to held-out data.
PathPocket constructs a 4.55M-entity pathology hypergraph from 110k graded documents and deploys a multi-agent framework that outperforms prior systems on 200k cases while raising pathologist accuracy in user studies.
PathoSage is a three-stage framework using Structured Evidence Deliberation and a Beta-Bernoulli experience system to improve patch-level pathology reasoning by mitigating hallucinations and tool conflicts.
UniReason-Med introduces a unified framework for 2D and 3D medical VQA with shared grounded reasoning, trained on a 220K dataset, claiming that joint 2D+3D supervision improves 3D performance over 3D-only training.
citing papers explorer
Democratising Pathology Co-Pilots: An Open Pipeline and Dataset for Whole-Slide Vision-Language Modelling