Revisiting 2D Foundation Models for Scalable 3D Medical Image Classification
read the original abstract
3D medical image classification is essential for modern clinical workflows. Medical foundation models (FMs) have emerged as a promising approach for scaling to new tasks, yet current research suffers from three critical pitfalls: data-regime bias, suboptimal adaptation, and insufficient task coverage. In this paper, we address these pitfalls and introduce AnyMC3D, a scalable 3D classifier adapted from 2D FMs. Our method scales efficiently to new tasks by adding only lightweight plugins (about 1M parameters per task) on top of a single frozen backbone. This versatile framework also supports multi-view inputs, auxiliary pixel-level supervision, and interpretable heatmap generation. We establish a comprehensive benchmark of 12 tasks covering diverse pathologies, anatomies, and modalities, and systematically analyze state-of-the-art 3D classification techniques. Our analysis reveals key insights: (1) effective adaptation is essential to unlock FM potential, (2) general-purpose FMs can match medical-specific FMs if properly adapted, and (3) 2D-based methods surpass 3D architectures for 3D classification. For the first time, we demonstrate the feasibility of achieving state-of-the-art performance across diverse applications using a single scalable framework (including 1st place in the VLM3D challenge), eliminating the need for separate task-specific models.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance
DCP-PD improves macro F1 scores on CT report generation benchmarks and introduces a hierarchical location-aware evaluation protocol that reveals ongoing challenges in pathology spatial grounding.
-
MultiMedVision: Multi-Modal Medical Vision Framework
A unified Sparse Vision Transformer learns joint 2D/3D medical image representations via self-supervision and achieves competitive AUROC on chest X-ray and CT benchmarks with 5x less data than modality-specific models.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.