Navigating Gigapixel Pathology Images with Large Multimodal Models

2025 · cs.CV · arXiv 2511.19652

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Recent advances in large multimodal models have allowed for the development of interactive chat models that can converse and reason about pathology whole-slide images (WSIs). However, existing slide-level chat systems are often highly specialized, typically compressing WSIs into fixed slide-level embeddings or relying on multi-component pipelines, which can lose multi-scale detail and limit generalizability beyond the target task. We present GIANT (Gigapixel Image Agent for Navigating Tissue), a simple, training-free approach that lets general-purpose multimodal models navigate WSIs on their own, iteratively selecting multi-magnification crops and aggregating evidence over time. To evaluate generalizability in WSI question answering and to promote reproducibility, we introduce MultiPathQA, a benchmark suite spanning five clinical challenges and 934 questions over 868 unique WSIs. This includes a new set of 128 pathologist-authored multiple-choice questions designed to mirror real diagnostic search and multi-scale reasoning. Using GPT-5, GIANT outperforms models specialized for pathology question answering, achieving state-of-the-art performance on four out of five benchmarks.
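The iterative navigation described in the abstract (start from a low-magnification view, repeatedly pick a crop to inspect, then answer) can be illustrated with a minimal sketch. This is not the authors' implementation: `Region`, `downsample_for`, `navigate`, and `center_zoom_policy` are hypothetical names, the policy is a toy stand-in for a multimodal model, and no image data is read. It only shows the control flow of selecting crops and keeping a history of what was viewed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """Axis-aligned crop in full-resolution slide pixels (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def downsample_for(region: Region, max_side: int = 1024) -> float:
    """Downsample factor so the crop's longer side fits in max_side pixels."""
    return max(1.0, max(region.w, region.h) / max_side)


# A policy sees the crops viewed so far and returns (next_region, answer).
# In a real system this would be a call to a multimodal model (assumption).
Policy = Callable[[List[Region]], Tuple[Optional[Region], Optional[str]]]


def navigate(slide_w: int, slide_h: int, policy: Policy, max_steps: int = 8):
    """Run the crop-selection loop; return (answer or None, viewed regions)."""
    history = [Region(0, 0, slide_w, slide_h)]  # step 0: whole-slide thumbnail
    for _ in range(max_steps):
        region, answer = policy(history)
        if answer is not None:
            return answer, history
        if region is None:
            break  # policy gave neither a crop nor an answer
        # Clamp the requested crop to the slide bounds before recording it.
        x = min(max(region.x, 0), slide_w - 1)
        y = min(max(region.y, 0), slide_h - 1)
        w = max(1, min(region.w, slide_w - x))
        h = max(1, min(region.h, slide_h - y))
        history.append(Region(x, y, w, h))
    return None, history


def center_zoom_policy(history: List[Region]):
    """Toy policy: zoom into the central quarter until the crop is small."""
    last = history[-1]
    if max(last.w, last.h) <= 4096:
        return None, "stub answer"
    half = Region(last.x + last.w // 4, last.y + last.h // 4,
                  last.w // 2, last.h // 2)
    return half, None


answer, viewed = navigate(100_000, 80_000, center_zoom_policy)
```

In this sketch the step budget (`max_steps`) bounds how many crops the policy may request, and `downsample_for` shows how a fixed pixel budget per crop translates into an effective magnification for each region.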

representative citing papers

How Seemingly Inconsequential Design Choices Dictate Performance of LLMs in Pathology

cs.CV · 2026-06-10 · unverdicted · novelty 6.0

Systematic factorial analysis shows optimized LLM input configurations for pathology WSIs raise GPT-5 performance from 15.1% to 39.5% on TCGA cancer classification and 38.1% to 62.9% on GTEx organ classification, with generalization to held-out data.