A Benchmark of (MRI-) Foundation Models to Predict IDH Mutational Status in Glioma

Efthymios Georgiou; Ekin Ermis; Elise Robinson; Nathan Hollet; Sarah Brüningk; Uri Nahum

arxiv: 2606.23172 · v1 · pith:N2PILTKKnew · submitted 2026-06-22 · 📡 eess.IV

Nathan Hollet, Elise Robinson, Efthymios Georgiou, Ekin Ermis, Uri Nahum, Sarah Brüningk

Pith reviewed 2026-06-26 06:28 UTC · model grok-4.3

classification 📡 eess.IV

keywords glioma · IDH mutation · MRI · foundation models · radiomics · TabPFN · molecular prediction · model benchmarking

The pith

Tabular foundation models on radiomic features match or exceed image foundation models for IDH prediction from glioma MRI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks four image-based foundation models against radiomics-based tabular models for non-invasive prediction of IDH mutational status in glioma using FLAIR and post-contrast T1 MRI. Within individual cohorts, the tabular model TabPFN on radiomic features achieved the highest AUROC of 0.92 and best calibration. Among image encoders, BiomedCLIP performed best at 0.85 AUROC while MRI-specific models lagged. Cross-cohort and external post-treatment evaluation revealed performance drops with varying sensitivity to distribution shifts, where image models sometimes complemented the tabular baseline. The results highlight that representation type and clinical context together determine which approach works best for reliable molecular status prediction from routine scans.

Core claim

Representation modality and evaluation context critically influence foundation-model performance in MRI-based molecular prediction. Tabular foundation models on radiomic features provide a strong, well-calibrated baseline, while image foundation models may offer complementary value under clinically distinct distribution shifts.

What carries the argument

Benchmark of image foundation models (BrainIAC, MRI-CORE, BiomedCLIP, BrainDINO) versus radiomics TabPFN and logistic regression for IDH mutation prediction across four public glioma cohorts plus one external post-treatment cohort, measuring AUROC, AUPRC, and calibration error.

If this is right

TabPFN on radiomics delivers 0.92 mean AUROC within cohorts and an expected calibration error (ECE) of 0.07 on the external cohort, establishing it as the strongest baseline.
BiomedCLIP reaches the highest external-cohort AUROC of 0.74, suggesting image encoders can retain utility when prevalence or treatment status changes.
AUPRC degrades more than AUROC under cross-cohort prevalence shifts, indicating prevalence-aware evaluation is required.
MRI-specific pre-trained encoders (BrainIAC, MRI-CORE) consistently underperform the broader biomedical vision-language model BiomedCLIP on this task.
Calibration remains superior for the tabular model even when AUROC is comparable, affecting downstream clinical probability use.
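Since several of the points above turn on calibration, a minimal sketch of how an expected calibration error can be computed may help. It assumes the common equal-width-bin variant on positive-class probabilities with 10 bins; the bin count and variant are assumptions, not details taken from the paper.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE for binary predictions.

    probs  : predicted probability of the positive class, in [0, 1]
    labels : 0/1 ground-truth labels
    n_bins : number of equal-width bins (assumed, not taken from the paper)
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # p == 1.0 would index past the last bin, so clamp it
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        # weight each bin's confidence/frequency gap by its share of samples
        ece += len(members) / len(probs) * abs(mean_conf - frac_pos)
    return ece


# Toy inputs for illustration only: ten predictions of 0.9
well_calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
```

In this toy case the first call returns essentially zero (nine of ten cases predicted at 0.9 are positive), while the second returns 0.4, because only half of the cases are positive. Two models with similar AUROC can differ in exactly this way.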

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hybrid models that fuse radiomic tabular features with image embeddings could capture both the calibration strength and shift robustness observed here.
The observed underperformance of MRI-specific encoders may indicate that current pre-training objectives or data scales are insufficient for molecular-status tasks and warrant targeted re-examination.
Because calibration differs markedly by modality, clinical deployment pipelines may need modality-specific uncertainty thresholds rather than a single model selection rule.
Extending the benchmark to include longitudinal or multi-sequence inputs could test whether the current modality ranking persists when more imaging context is available.

Load-bearing premise

The public glioma cohorts and external post-treatment cohort are representative enough of clinical distributions to support conclusions about generalization and the relative value of different model types.

What would settle it

On a new large multi-center prospective clinical dataset, if image foundation models show no AUROC advantage or complementary value over TabPFN under any measured distribution shift, the claim that they can offer value in distinct clinical contexts would be refuted.

Figures

Figures reproduced from arXiv: 2606.23172 by Efthymios Georgiou, Ekin Ermis, Elise Robinson, Nathan Hollet, Sarah Brüningk, Uri Nahum.

**Figure 1.** Schematic overview. Each scan (FLAIR and post-contrast T1) feeds three parallel feature extraction pipelines. (A) 2D image foundation models (MRI-CORE, BiomedCLIP, BrainDINO): three axial slices at the 25th/50th/75th percentiles of the tumor mask pass independently through a frozen encoder, with CLS tokens concatenated. (B) 3D image foundation model (BrainIAC): the full volume passes through a frozen encod…
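The 2D pipeline in panel (A) can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: it assumes the percentiles are taken over the ordered list of axial slices that contain any tumor, and `encode_slice` is a hypothetical stand-in for a frozen encoder that returns a CLS-token embedding.

```python
def percentile_slices(voxels_per_slice, percentiles=(0.25, 0.50, 0.75)):
    """Pick axial slice indices at given percentiles of the tumor's axial extent.

    voxels_per_slice : tumor-mask voxel count for each axial slice
    Assumption: percentiles index the ordered list of slices that contain
    any tumor; the paper may define them differently.
    """
    tumor_slices = [z for z, n in enumerate(voxels_per_slice) if n > 0]
    if not tumor_slices:
        raise ValueError("tumor mask is empty")
    last = len(tumor_slices) - 1
    # int(x + 0.5) rounds half up, avoiding Python's round-half-to-even
    return [tumor_slices[int(p * last + 0.5)] for p in percentiles]


def embed_scan(volume, voxels_per_slice, encode_slice):
    """Concatenate the per-slice embeddings into one feature vector.

    encode_slice is a hypothetical callable standing in for a frozen encoder.
    """
    features = []
    for z in percentile_slices(voxels_per_slice):
        features.extend(encode_slice(volume[z]))
    return features


def toy_encoder(slice_values):
    """Placeholder 'encoder' for illustration; not a real model."""
    return [sum(slice_values), max(slice_values)]


counts = [0, 0, 3, 5, 8, 9, 7, 4, 1, 2, 1, 0]           # toy mask voxel counts
volume = [[float(z)] * 4 for z in range(len(counts))]   # toy "slices"
chosen = percentile_slices(counts)
features = embed_scan(volume, counts, toy_encoder)
```

With the toy counts, the tumor spans slices 2 to 10, so the selected slices are 4, 6 and 8, and the feature vector is three times the per-slice embedding length.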

**Figure 2.** Cross-cohort AUROC (mean ± std). Rows are training cohorts, columns evaluation cohorts; the boxed diagonal cells are the within-cohort numbers. We report performance across the six approaches using K = 5 paired runs per cohort, as mean ± std throughout. We report AUROC throughout; AUPRC was evaluated identically and follows the same model ordering except where noted. Within-cohort performance. The boxed d…
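A train-by-evaluation grid of this kind can be assembled as in the sketch below. The cohort names `A` and `B` and all scores are placeholders for illustration, not values from the paper, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def summarize(runs):
    """Format K per-run scores as 'mean ± std' (sample std, an assumption)."""
    return f"{mean(runs):.2f} ± {stdev(runs):.2f}"


def cross_cohort_table(scores):
    """scores maps (train_cohort, eval_cohort) -> list of per-run AUROC values."""
    cohorts = sorted({c for pair in scores for c in pair})
    table = {}
    for train in cohorts:
        for evaluated in cohorts:
            cell = summarize(scores[(train, evaluated)])
            # bracket the diagonal: these are the within-cohort numbers
            table[(train, evaluated)] = f"[{cell}]" if train == evaluated else cell
    return table


# Placeholder scores for K = 5 runs per cell (hypothetical, not from the paper)
scores = {
    ("A", "A"): [0.80, 0.82, 0.84, 0.81, 0.83],
    ("A", "B"): [0.70, 0.72, 0.74, 0.71, 0.73],
    ("B", "A"): [0.60, 0.62, 0.64, 0.61, 0.63],
    ("B", "B"): [0.75, 0.77, 0.79, 0.76, 0.78],
}
table = cross_cohort_table(scores)
for train in ("A", "B"):
    print(train, " | ".join(table[(train, evaluated)] for evaluated in ("A", "B")))
```

Keeping the runs paired, with the same splits and seeds for every model, is what makes the per-cell standard deviations comparable across approaches.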

Original abstract

Non-invasive prediction of glioma molecular status from routine magnetic resonance imaging (MRI) has shown promising performance, but model generalization remains challenging given small-scale matched imaging-genomic datasets. Foundation models may address this bottleneck, but a comprehensive benchmark is needed to establish the impact of diverse architectures, pre-training domains, and objectives. Given the use case of isocitrate dehydrogenase (IDH) mutation prediction from FLAIR and post-contrast T1 MRIs, we compared four image-based foundation models, BrainIAC, MRI-CORE, BiomedCLIP, and BrainDINO, against radiomics-based TabPFN and logistic regression baselines. Prediction performance and calibration were assessed across four public adult glioma cohorts and an external post-treatment cohort. Within-cohort, TabPFN matched or outperformed all visual encoders, achieving 0.92 (0.03) AUROC and 0.74 (0.17) AUPRC (mean (SD) across all datasets). Among visual encoders, BiomedCLIP performed best (0.85 (0.08) AUROC), with BrainDINO competitive (0.82 (0.09) AUROC), while MRI-specific encoders (BrainIAC, MRI-CORE) consistently underperformed. Cross-cohort transfer showed moderate AUROC degradation but stronger AUPRC sensitivity to prevalence shifts. On the external cohort, BiomedCLIP achieved the highest AUROC (0.74 (0.07)), whereas TabPFN provided superior calibration (Expected Calibration Error 0.07 (0.01)). These results indicate that representation modality and evaluation context critically influence foundation-model performance in MRI-based molecular prediction. Tabular foundation models on radiomic features provide a strong, well-calibrated baseline, while image foundation models may offer complementary value under clinically distinct distribution shifts. Code available at https://github.com/nathanhollet/idh-status-prediction
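The abstract's observation that AUPRC is more sensitive than AUROC to prevalence shifts can be reproduced on toy data. The sketch below uses made-up scores, not the paper's data: it keeps the score distributions fixed and only lowers the prevalence by replicating the negatives, then computes both metrics in plain Python.

```python
def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def average_precision(pos_scores, neg_scores):
    """Step-wise area under the precision-recall curve over distinct thresholds."""
    ap, prev_recall = 0.0, 0.0
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tp = sum(s >= t for s in pos_scores)
        fp = sum(s >= t for s in neg_scores)
        recall = tp / len(pos_scores)
        # precision at this threshold, weighted by the recall gained
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


pos = [0.9, 0.8, 0.6, 0.4]   # toy scores for mutant cases
neg = [0.7, 0.5, 0.3, 0.2]   # toy scores for wild-type cases
neg_low_prev = neg * 4       # same score distribution, four times as many negatives

auroc_a, auroc_b = auroc(pos, neg), auroc(pos, neg_low_prev)
ap_a, ap_b = average_precision(pos, neg), average_precision(pos, neg_low_prev)
```

Here the AUROC stays at 0.8125 in both settings, while the average precision falls from roughly 0.85 to roughly 0.69. AUPRC values from cohorts with different IDH-mutation prevalence are therefore not directly comparable.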

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TabPFN on radiomics matches or beats the tested image foundation models for IDH prediction within cohorts, with a partial reversal on the external set that the paper attributes to distribution shift.

The main point is that this benchmark finds radiomic features fed to TabPFN competitive with or better than four image foundation models on IDH status prediction from FLAIR and post-contrast T1 MRI, reaching 0.92 AUROC within the public cohorts while BiomedCLIP leads the visual encoders at 0.85. On the external post-treatment cohort the ranking flips on AUROC, but TabPFN keeps the calibration edge.

What the paper actually delivers is a head-to-head comparison on four public adult glioma datasets plus one external set, with AUROC, AUPRC, and expected calibration error reported for BrainIAC, MRI-CORE, BiomedCLIP, BrainDINO, TabPFN, and logistic regression. The observation that the MRI-specific encoders lag behind a broader biomedical model like BiomedCLIP is a concrete result, and the cross-cohort drop plus the external numbers give some evidence on transfer. The code release is helpful.

The soft spot is the leap from these cohorts to claims about clinically distinct distribution shifts. The abstract notes moderate AUROC degradation and stronger AUPRC sensitivity, but does not quantify scanner, field strength, slice thickness, or treatment timing differences between the public sets and the external one. Without that, the inference that image models supply complementary value under real-world shifts stays suggestive rather than demonstrated.

This is useful for groups already working on non-invasive molecular prediction in glioma who need a current snapshot of these particular encoders. It is not a methods paper and does not claim to be. A serious editor should send it to review; the empirical comparison is clear enough to be worth referee time even if the generalization discussion needs tightening.

Referee Report

1 major / 0 minor

Summary. The paper benchmarks four image foundation models (BrainIAC, MRI-CORE, BiomedCLIP, BrainDINO) against radiomics-based TabPFN and logistic regression baselines for predicting IDH mutational status from FLAIR and post-contrast T1 MRI. It reports within-cohort performance (TabPFN AUROC 0.92, BiomedCLIP 0.85), cross-cohort transfer degradation, and external post-treatment cohort results (BiomedCLIP AUROC 0.74, TabPFN better calibration), concluding that tabular models provide a strong baseline while image models may complement under distribution shifts. Code is released.

Significance. If the tabulated results and calibration metrics hold after verification of methods, the benchmark supplies concrete empirical comparisons of modality-specific foundation models on a clinically relevant molecular prediction task. The public code release and focus on both AUROC/AUPRC and calibration are strengths that allow direct reuse and extension by the community.

major comments (1)

[Abstract] The claim that image foundation models 'may offer complementary value under clinically distinct distribution shifts' is not supported by any quantitative characterization of how the external post-treatment cohort differs from the four public cohorts or from routine clinical distributions (scanner vendor, field strength, slice thickness, treatment timing, or demographics). Without this, the reversal in ranking (BiomedCLIP AUROC advantage vs. TabPFN calibration) cannot be interpreted as evidence of complementary value under shifts.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thoughtful review and the opportunity to strengthen the manuscript. We address the major comment below.

Referee: [Abstract] The claim that image foundation models 'may offer complementary value under clinically distinct distribution shifts' is not supported by any quantitative characterization of how the external post-treatment cohort differs from the four public cohorts or from routine clinical distributions (scanner vendor, field strength, slice thickness, treatment timing, or demographics). Without this, the reversal in ranking (BiomedCLIP AUROC advantage vs. TabPFN calibration) cannot be interpreted as evidence of complementary value under shifts.

Authors: We agree that the abstract claim would be more robust with explicit cohort characterization. The external cohort is explicitly described as post-treatment (distinct in treatment timing from the primarily pre-treatment public cohorts), and the observed reversal (BiomedCLIP AUROC 0.74 vs. TabPFN superior calibration) is presented as suggestive rather than definitive evidence. However, we did not include a consolidated table of scanner, field strength, slice thickness, or demographic metadata across cohorts. In revision we will (1) add a table or paragraph in Methods summarizing all available cohort metadata and (2) revise the abstract sentence to read: 'On the external post-treatment cohort, image encoders showed an AUROC advantage while the tabular baseline remained better calibrated, indicating that representation modality and evaluation context influence performance under distribution shift.' This directly incorporates the referee's point without overstating the evidence.

revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical benchmark with no derivations or self-referential predictions.

This paper is a data-driven benchmark study that trains and evaluates multiple models (image foundation models, TabPFN, logistic regression) on public glioma cohorts and one external set, reporting AUROC, AUPRC, and calibration metrics. No equations, derivations, or fitted parameters are presented that reduce any reported performance number to an input by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes appear in the provided text. The central claims rest on direct empirical comparisons rather than any closed logical loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is an empirical benchmark relying on pre-existing foundation models, public datasets, and standard performance metrics without introducing new free parameters, axioms beyond basic statistical assumptions, or invented entities.

axioms (1)

standard math: Standard assumptions underlying AUROC, AUPRC, and expected calibration error calculations hold for the evaluated datasets and models. Performance reporting depends on these metric definitions being applicable without violation.

