Revisiting 2D Foundation Models for Scalable 3D Medical Image Classification

Bogdan Georgescu; Dorin Comaniciu; Eli Gibson; Gengyan Zhao; Han Liu; Jianing Wang; Michael Baumgartner; Riqiang Gao; Sasa Grbic; Yanbo Zhang

arxiv: 2512.12887 · v3 · pith:HLYTUYZQnew · submitted 2025-12-15 · 💻 cs.CV

Han Liu, Bogdan Georgescu, Yanbo Zhang, Youngjin Yoo, Michael Baumgartner, Riqiang Gao, Jianing Wang, Gengyan Zhao, Eli Gibson, Dorin Comaniciu, Sasa Grbic

keywords: classification, medical, models, scalable, tasks, adaptation, adapted, diverse

3D medical image classification is essential for modern clinical workflows. Medical foundation models (FMs) have emerged as a promising approach for scaling to new tasks, yet current research suffers from three critical pitfalls: data-regime bias, suboptimal adaptation, and insufficient task coverage. In this paper, we address these pitfalls and introduce AnyMC3D, a scalable 3D classifier adapted from 2D FMs. Our method scales efficiently to new tasks by adding only lightweight plugins (about 1M parameters per task) on top of a single frozen backbone. This versatile framework also supports multi-view inputs, auxiliary pixel-level supervision, and interpretable heatmap generation. We establish a comprehensive benchmark of 12 tasks covering diverse pathologies, anatomies, and modalities, and systematically analyze state-of-the-art 3D classification techniques. Our analysis reveals key insights: (1) effective adaptation is essential to unlock FM potential, (2) general-purpose FMs can match medical-specific FMs if properly adapted, and (3) 2D-based methods surpass 3D architectures for 3D classification. For the first time, we demonstrate the feasibility of achieving state-of-the-art performance across diverse applications using a single scalable framework (including 1st place in the VLM3D challenge), eliminating the need for separate task-specific models.
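
The scaling claim in the abstract (one frozen backbone plus about 1M plugin parameters per task, across 12 tasks) can be made concrete with some simple parameter accounting. The sketch below is illustrative only: the backbone size is an assumed placeholder, not a figure reported by the authors, and the "separate models" baseline is a hypothetical comparison in which every task trains its own backbone-sized model.

```python
# Illustrative parameter accounting for the shared-backbone setup.
# ASSUMED_BACKBONE_PARAMS is a hypothetical placeholder (roughly ViT-B scale),
# not a number taken from the paper.
ASSUMED_BACKBONE_PARAMS = 86_000_000
PLUGIN_PARAMS_PER_TASK = 1_000_000   # "about 1M parameters per task" (abstract)
NUM_TASKS = 12                       # benchmark size stated in the abstract


def shared_backbone_total(num_tasks: int) -> int:
    """One frozen backbone plus one lightweight plugin per task."""
    return ASSUMED_BACKBONE_PARAMS + num_tasks * PLUGIN_PARAMS_PER_TASK


def separate_models_total(num_tasks: int) -> int:
    """Hypothetical baseline: a full backbone-sized model for every task."""
    return num_tasks * (ASSUMED_BACKBONE_PARAMS + PLUGIN_PARAMS_PER_TASK)


if __name__ == "__main__":
    shared = shared_backbone_total(NUM_TASKS)
    separate = separate_models_total(NUM_TASKS)
    print(f"shared backbone + plugins: {shared / 1e6:.0f}M parameters")
    print(f"separate per-task models:  {separate / 1e6:.0f}M parameters")
    print(f"ratio: {separate / shared:.1f}x")
```

Under these assumptions, adding a thirteenth task costs only the plugin's parameters, whereas the per-task baseline grows by a full model each time.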

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance
cs.CV 2026-04 unverdicted novelty 6.0

DCP-PD improves macro F1 scores on CT report generation benchmarks and introduces a hierarchical location-aware evaluation protocol that reveals ongoing challenges in pathology spatial grounding.

MultiMedVision: Multi-Modal Medical Vision Framework
cs.CV 2026-05 unverdicted novelty 5.0

A unified Sparse Vision Transformer learns joint 2D/3D medical image representations via self-supervision and achieves competitive AUROC on chest X-ray and CT benchmarks with 5x less data than modality-specific models.