Navigating Gigapixel Pathology Images with Large Multimodal Models
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Recent advances in large multimodal models have allowed for the development of interactive chat models that can converse and reason about pathology whole-slide images (WSIs). However, existing slide-level chat systems are often highly specialized, typically compressing WSIs into fixed slide-level embeddings or relying on multi-component pipelines, which can lose multi-scale detail and limit generalizability beyond the target task. We present GIANT (Gigapixel Image Agent for Navigating Tissue), a simple, training-free approach that lets general-purpose multimodal models navigate WSIs on their own, iteratively selecting multi-magnification crops and aggregating evidence over time. To evaluate generalizability in WSI question answering and to promote reproducibility, we introduce MultiPathQA, a benchmark suite spanning five clinical challenges and 934 questions over 868 unique WSIs. This includes a new set of 128 pathologist-authored multiple-choice questions designed to mirror real diagnostic search and multi-scale reasoning. Using GPT-5, GIANT outperforms models specialized for pathology question answering, achieving state-of-the-art performance on four out of five benchmarks.
fields
cs.CV 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
How Seemingly Inconsequential Design Choices Dictate Performance of LLMs in Pathology
Systematic factorial analysis shows optimized LLM input configurations for pathology WSIs raise GPT-5 performance from 15.1% to 39.5% on TCGA cancer classification and 38.1% to 62.9% on GTEx organ classification, with generalization to held-out data.