Probing, Fusion, and Trustworthiness: A Systematic Evaluation of Foundation Model Representations for Multimodal Cancer Analysis

Giuseppe Tripodi; Jingyu Hu; Reed Naidoo; Sarah F. McGough; Tapabrata Chakraborti

arxiv: 2606.17115 · v1 · pith:6AZ7U3IFnew · submitted 2026-06-15 · 💻 cs.LG · cs.AI · q-bio.QM

Pith reviewed 2026-06-27 03:15 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · q-bio.QM

keywords foundation models · multimodal fusion · conformal prediction · computational pathology · cancer analysis · out-of-distribution generalization · trustworthiness · transcriptomics

The pith

Foundation model representations achieve competitive performance on out-of-distribution cancer data from commercial cohorts, with multimodal fusion providing gains primarily when no single modality dominates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates foundation model representations from whole-slide images and transcriptomic profiles on eight classification tasks across two real-world cancer cohorts. It benchmarks five foundation models in unimodal settings and tests three fusion strategies for combining image and omics data. Conformal prediction is used to assess trustworthiness by checking if prediction sets recover the true label when point predictions fail. The results indicate that these representations hold up under distribution shifts and that fusion is conditionally useful. This matters because it informs when to rely on single modalities versus combined ones in clinical computational pathology.

Core claim

Foundation model representations achieve competitive performance on out-of-distribution data and multimodal fusion helps mainly when no single modality dominates the signal. Conformal prediction reveals that in the majority of cases where a point prediction fails, the true diagnosis remains recoverable within the prediction set.
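
The recovery claim can be made concrete with a minimal split conformal sketch. This is an illustration only, not the paper's implementation: the score `1 - p(true label)`, the function names, and the clipping of the quantile rank are all assumptions, since the page does not say which conformal variant the authors used.

```python
import math

def conformal_quantile(cal_probs, cal_labels, alpha):
    """Threshold on the score s = 1 - p(true label), from a held-out calibration set."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)); clipped to n as a
    # simplification (strictly, a rank above n means "include every class").
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[k - 1]

def prediction_set(probs, qhat):
    """All classes whose score 1 - p is at most the calibrated threshold."""
    return {c for c, p in enumerate(probs) if 1.0 - p <= qhat}

def recovery_rate(test_probs, test_labels, qhat):
    """Among top-1 errors, the fraction whose true label is still in the set."""
    errors = recovered = 0
    for p, y in zip(test_probs, test_labels):
        top1 = max(range(len(p)), key=p.__getitem__)
        if top1 != y:
            errors += 1
            if y in prediction_set(p, qhat):
                recovered += 1
    return recovered / errors if errors else float("nan")
```

A high `recovery_rate` is only informative alongside the average set size: a set that contains every class recovers every error trivially.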

What carries the argument

The systematic evaluation pipeline consisting of unimodal probing of five foundation models, three image-omics fusion strategies on paired representations, and conformal prediction for uncertainty assessment on two commercial oncology cohorts.

If this is right

FM representations from images and transcriptomics carry complementary predictive signals.
Multimodal fusion yields additional gains over unimodal baselines primarily when neither modality dominates.
Conformal prediction sets recover the true diagnosis in most cases where the point prediction is incorrect.
Uncertainty-aware inference adds value for clinical support in computational pathology.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar evaluations could be extended to additional modalities like genomics or radiology to test broader applicability.
The conditional benefit of fusion suggests prioritizing modality selection based on signal strength rather than always fusing.
Conformal methods may enable safer deployment by providing recoverable sets instead of single risky predictions.
Results on commercial cohorts point toward the need for testing on more diverse real-world data sources.

Load-bearing premise

The two commercial cohorts sufficiently represent real-world distribution shifts in cancer data and the three fusion strategies adequately cover when fusion adds value.

What would settle it

The claims would be challenged if additional held-out cohorts showed significantly lower accuracy than reported, or if fusion failed to show the same conditional gains on them.

Figures

Figures reproduced from arXiv: 2606.17115 by Giuseppe Tripodi, Jingyu Hu, Reed Naidoo, Sarah F. McGough, Tapabrata Chakraborti.

**Figure 2.** ROC Comparison on BC-LOH Task. [Plot labels recovered from the extraction: CONCH+PCA and UNI+PCA, reported as ACC and AUC on BC-Subtype, BC-LOH, NSCLC-TMB, and NSCLC-Biopsy Site, for GeneMLP, HEMIL, MCAT, CONTACT, and LateMIL.]

**Figure 3.** Performance Comparisons of Unimodal (GeneMLP, HEMIL) and Multimodal (MCAT, CONTACT, LateMIL) Methods. Adjacent body text captured with the caption: …ance across FMs. All three omics representation methods (UCE, PCA, and scVI) outperform direct modeling on the raw full gene expression profile on BC-LOH. However, the foundation model UCE underperforms the non-foundation-model approaches scVI and PCA. The pattern observed in the industrial IH dataset align…

**Figure 4.** The Detailed Workflow. Adjacent body text captured with the caption: GeneMLP is an omics-based multilayer perceptron that receives the omics representation z_{i,omics} and has widths ⟨d_omics, 512, 256, 128, |Y|⟩, with LayerNorm, ReLU, and dropout 0.2 after each hidden linear layer. GeneMLP is trained with weight decay = 1 × 10^-4 for 200 epochs. We apply early stopping with patience = 20 on the validation set. CONTACT is a multimodal fusion strategy that concate…
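
The GeneMLP training schedule in the Figure 4 text can be sketched in plain Python. This is a hedged illustration, not the authors' code: the caption gives the layer widths, the 200-epoch budget, and patience 20, but it does not say which validation quantity is monitored, so validation loss is assumed here, and `val_losses` is a hypothetical stand-in for a real training loop.

```python
def gene_mlp_widths(d_omics, n_classes):
    """Layer widths <d_omics, 512, 256, 128, |Y|> as stated in the caption."""
    return [d_omics, 512, 256, 128, n_classes]

def train_with_early_stopping(val_losses, max_epochs=200, patience=20):
    """Return (best_epoch, stop_epoch) for a stream of per-epoch validation losses.

    Assumption: the monitored quantity is validation loss (lower is better).
    `val_losses` is a hypothetical stand-in: item i is the loss after epoch i.
    """
    best_loss, best_epoch, stop_epoch = float("inf"), -1, -1
    for epoch, loss in enumerate(val_losses):
        if epoch >= max_epochs:
            break  # epoch budget exhausted
        stop_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, stop_epoch
```

In a real run the weights from `best_epoch` would be restored before evaluation; the page does not state whether the authors do this.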

**Figure 5.** ROC curves of unimodal baselines across different tasks (BC-Subtype, BC-LOH, NSCLC-TMB, NSCLC-Biopsy). Here 'RAW' refers to directly feeding the full set of gene TPM values as omics features. [Plot labels recovered from the extraction: empirical coverage at α = 0.05 (target 0.95) for BC LOH, BC Subtype, NSCLC Biopsy Site, and NSCLC Tumor Site.]

**Figure 6.** Empirical coverage across models and target coverage levels α ∈ {0.05, 0.10, 0.20}. Each point is the mean coverage averaged over encoder configurations for a given model and task. Adjacent body text captured with the caption: …of coverage level. CONTACT and MCAT are consistently well calibrated, maintaining ECE values in the range 0.04–0.07 across all four tasks, whereas LateMIL and GeneMLP are notably less reliable, with ECE reaching 0.144 and 0.16…
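
For readers unfamiliar with the ECE figures quoted in the Figure 6 text, a minimal sketch of expected calibration error follows. The equal-width binning and the default of 10 bins are assumptions made for illustration; the page does not state the binning scheme the authors used.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE: sample-weighted mean of |accuracy - confidence| per bin.

    confidences: top-1 predicted probabilities in [0, 1]
    correct: booleans, True where the top-1 prediction matched the label
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece
```

Lower is better: a value of 0 means that, within every bin, the stated confidence matches the observed accuracy.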

**Figure 7.** Error decomposition by task and model family. Each panel shows one of the four multiclass tasks; within each panel, bars are grouped by model family (colour). Solid bars represent the top-1 classification error (1 − accuracy); hatched bars represent the conformal miss rate (1 − coverage) at α = 0.10. The gap between the two bars quantifies the uncertainty absorbed by the prediction set, samples misclassified…

Original abstract

Foundation models (FMs) have emerged as powerful representation extractors for medical data, yet their generalizability to datasets under distribution shift remains underexplored. This work systematically evaluates FM-based representations on a suite of computational pathology tasks across two real-world commercial cohorts, IH-BC and IH-NSCLC, drawn from the licensed in-house (IH) oncology dataset. The analysis focuses on two modalities, whole-slide images and transcriptomic profiles, drawn from the IH multimodal data. We first benchmark unimodal probing performance across five FMs on eight downstream classification tasks, and find that image and omics representations carry complementary predictive signals. Then we investigate whether multimodal fusion can yield additional gains over unimodal baselines by comparing three image-omics fusion strategies built on paired representations. The trustworthiness of selected unimodal and multimodal pipelines is further assessed through conformal prediction. Our results show that FM representations achieve competitive performance on out-of-distribution data and that multimodal fusion helps mainly when no single modality dominates the signal. Conformal prediction reveals that in the majority of cases where a point prediction fails, the true diagnosis remains recoverable within the prediction set, reinforcing the value of uncertainty-aware inference for clinical support.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is an incremental benchmarking study on foundation models for cancer pathology that applies known methods to new cohorts but does not quantify the claimed distribution shifts.

This paper is a systematic evaluation that applies existing foundation model probing, image-omics fusion, and conformal prediction techniques to two new in-house oncology cohorts, IH-BC and IH-NSCLC. It reports complementary signals between modalities, fusion benefits mainly when neither dominates, and conformal sets recovering the true label in many point-prediction failures. The central limitation is that the out-of-distribution performance claims are not supported by any shift quantification.

The contribution is empirical rather than methodological. The authors benchmark five foundation models on eight classification tasks using whole-slide images and transcriptomic profiles from their licensed dataset. They test three fusion strategies on paired representations and assess trustworthiness with conformal prediction. Nothing here is new in terms of techniques, but the application to these specific commercial cohorts is the fresh element.

The paper does a decent job of structuring the experiments clearly and highlighting when fusion adds value. The finding that fusion helps conditionally is useful for practitioners deciding on multimodal setups. The conformal results also provide a practical takeaway about uncertainty quantification in clinical settings.

The main soft spot is the handling of distribution shift. The abstract states that the representations achieve competitive performance on out-of-distribution data, yet there are no reported metrics like maximum mean discrepancy or comparisons to public benchmarks such as TCGA to establish the nature or magnitude of the shift. The two cohorts are presented as real-world examples under shift, but without that evidence it is difficult to separate true generalization from performance on additional in-house data. This weakens both the generalizability and the fusion claims, as the latter depend on the modality dominance patterns observed in these particular datasets.
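
As an illustration of the shift evidence this paragraph asks for, the sketch below estimates squared maximum mean discrepancy (MMD) between two sets of embeddings. It is a sketch under stated assumptions rather than anything the paper reports: the RBF kernel, the bandwidth `gamma`, and the idea of comparing embeddings from a reference cohort against an in-house cohort are all hypothetical choices.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) on two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mmd_squared(source, target, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two embedding samples.

    source, target: lists of equal-length float vectors, e.g. slide-level
    embeddings from a reference cohort and from an in-house cohort (assumed setup).
    """
    def mean_kernel(a, b):
        total = sum(rbf_kernel(x, y, gamma) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))
```

The estimate is zero when both samples are identical and grows as the two embedding distributions separate. In practice `gamma` is often set from the median pairwise distance, and a permutation test is needed to judge whether a nonzero value is significant.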

This work is aimed at researchers and practitioners in computational pathology and multimodal medical AI who are interested in how foundation models perform on real clinical cohorts. Readers looking for new algorithms or theoretical insights will not find much here.

I would recommend sending it for peer review. The evaluation is structured and relevant enough to warrant referee input, particularly on strengthening the shift analysis and clarifying the scope of the conclusions.

Referee Report

2 major / 1 minor

Summary. The manuscript systematically evaluates foundation model (FM) representations for multimodal cancer analysis using whole-slide images and transcriptomic profiles from two commercial cohorts (IH-BC and IH-NSCLC). It benchmarks unimodal probing performance of five FMs across eight downstream classification tasks, compares three image-omics fusion strategies on paired representations, and assesses trustworthiness via conformal prediction. The central claims are that FM representations achieve competitive performance on out-of-distribution data, that multimodal fusion yields gains primarily when no single modality dominates the signal, and that conformal prediction sets recover the true diagnosis in most cases where point predictions fail.

Significance. If the empirical results hold after addressing the OOD framing, the work would provide useful benchmarking insights into modality complementarity and the practical value of uncertainty quantification in computational pathology pipelines. The conditional fusion finding and conformal recovery observation could guide model selection in clinical settings, though the current lack of shift quantification limits broader claims about generalizability.

major comments (2)

[Abstract] Abstract and evaluation framing: the claim that 'FM representations achieve competitive performance on out-of-distribution data' is load-bearing but unsupported by any explicit quantification of distribution shift (e.g., MMD, Wasserstein distance, or covariate-shift statistics) between the FMs' pretraining data and the IH-BC/IH-NSCLC cohorts, nor by comparison to public benchmarks with known shifts such as TCGA or CPTAC. This prevents distinguishing OOD performance from performance on additional in-house data.
[Fusion evaluation] Fusion strategies section: the claim that 'multimodal fusion helps mainly when no single modality dominates the signal' depends on the two cohorts spanning relevant modality-dominance regimes, yet no metrics or ablation are provided to characterize dominance vs. balanced-signal cases in IH-BC and IH-NSCLC, weakening the conditional conclusion.

minor comments (1)

[Methods] The abstract references eight downstream tasks and three fusion strategies without listing them; the methods section should explicitly enumerate the tasks, fusion implementations, and any statistical tests used for performance comparisons to support reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We agree that strengthening the evaluation framing is important and will make revisions accordingly. We address each major comment below.

Referee: [Abstract] Abstract and evaluation framing: the claim that 'FM representations achieve competitive performance on out-of-distribution data' is load-bearing but unsupported by any explicit quantification of distribution shift (e.g., MMD, Wasserstein distance, or covariate-shift statistics) between the FMs' pretraining data and the IH-BC/IH-NSCLC cohorts, nor by comparison to public benchmarks with known shifts such as TCGA or CPTAC. This prevents distinguishing OOD performance from performance on additional in-house data.

Authors: We concur that without explicit quantification of the distribution shift, the OOD claim is not fully supported. The manuscript relies on the in-house nature of the cohorts as evidence of shift from typical pretraining data, but we did not compute metrics such as MMD. In revision, we will attempt to add such quantifications using the available representations and reframe the language in the abstract and introduction to reflect performance on commercial in-house data with potential shift, while acknowledging the limitation. Direct comparison to TCGA may not be feasible due to data licensing, but we will note this.
revision: partial

Referee: [Fusion evaluation] Fusion strategies section: the claim that 'multimodal fusion helps mainly when no single modality dominates the signal' depends on the two cohorts spanning relevant modality-dominance regimes, yet no metrics or ablation are provided to characterize dominance vs. balanced-signal cases in IH-BC and IH-NSCLC, weakening the conditional conclusion.

Authors: This is a fair observation. The two cohorts were chosen to potentially represent different scenarios, but we did not explicitly measure modality dominance (e.g., via unimodal accuracy differences or signal balance metrics). We will add an analysis or table in the revised manuscript to characterize the dominance in each cohort based on unimodal probing results, and if necessary, adjust the claim to be more precise based on the observed regimes.
revision: yes
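
The dominance analysis promised in this response could start from something as simple as the sketch below, which labels each task by the gap between unimodal scores. Everything here is hypothetical: the function name, the 0.05 margin, and the input layout are illustrative choices, and no numbers from the paper are used.

```python
def dominance_report(unimodal_scores, margin=0.05):
    """Label each task by which modality dominates, given unimodal scores.

    unimodal_scores: {task: {"image": score, "omics": score}}, same metric for both
    margin: hypothetical threshold on the absolute gap; not taken from the paper
    """
    report = {}
    for task, scores in unimodal_scores.items():
        gap = scores["image"] - scores["omics"]
        if abs(gap) < margin:
            regime = "balanced"
        elif gap > 0:
            regime = "image-dominant"
        else:
            regime = "omics-dominant"
        report[task] = (round(gap, 4), regime)
    return report
```

Under the paper's conditional-fusion claim, fusion gains would be expected mainly on tasks labelled "balanced"; checking that correlation across all eight tasks is the ablation the referee asks for.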

Circularity Check

0 steps flagged

No circularity: pure empirical benchmarking without derivations or self-referential quantities

The paper reports direct performance measurements of foundation-model representations on eight classification tasks across two in-house cohorts, compares three fusion strategies, and applies conformal prediction to selected pipelines. No equations, fitted parameters, or derivation chains appear in the abstract or described content. Claims about competitive OOD performance and conditional fusion gains are grounded in observed metrics on the evaluated data rather than any reduction to inputs by construction. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz. This is standard empirical evaluation and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

No free parameters, invented entities, or non-standard axioms identified from abstract; relies on standard ML evaluation assumptions.

axioms (1)

domain assumption: Foundation models extract transferable representations useful for downstream medical tasks
Core premise enabling the probing and fusion experiments.

Reference graph

Works this paper leans on

63 extracted references · 17 canonical work pages · 5 internal anchors

[1]

Wiley interdisciplinary reviews: computational statistics2(4), 433–459 (2010)

Abdi, H., Williams, L.J.: Principal component analysis. Wiley interdisciplinary reviews: computational statistics2(4), 433–459 (2010)

2010
[2]

Angelopoulos, A.N., Bates, S.: A gentle introduction to conformal prediction and distribution-free uncertainty quantification (2021).https://doi.org/10.48550/ ARXIV.2107.07511,https://arxiv.org/abs/2107.07511

work page internal anchor Pith review Pith/arXiv arXiv 2021
[3]

arXiv preprint arXiv:2208.02814 (2022)

Angelopoulos, A.N., Bates, S., Fisch, A., Lei, L., Schuster, T.: Conformal risk control. arXiv preprint arXiv:2208.02814 (2022)

work page arXiv 2022
[4]

Nature medicine25(6), 954–961 (2019)

Ardila, D., Kiraly, A.P., Bharadwaj, S., Choi, B., Reicher, J.J., Peng, L., Tse, D., Etemadi, M., Ye, W., Corrado, G., et al.: End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nature medicine25(6), 954–961 (2019)

2019
[5]

Breast Cancer: Targets and TherapyV olume 14, 113–123 (Apr 2022).https: //doi.org/10.2147/bctt.s293597,http://dx.doi.org/10.2147/BCTT.S293597

Bagegni, N.A., Davis, A.A., Clifton, K.K., Ademuyiwa, F.O.: Targeted treatment for high-risk early-stage triple-negative breast cancer: Spotlight on pembrolizumab. Breast Cancer: Targets and TherapyV olume 14, 113–123 (Apr 2022).https: //doi.org/10.2147/bctt.s293597,http://dx.doi.org/10.2147/BCTT.S293597

work page doi:10.2147/bctt.s293597 2022
[6]

Nature Machine Intelli- gence1(1), 20–23 (Jan 2019).https://doi.org/10.1038/s42256-018-0004-1, http://dx.doi.org/10.1038/s42256-018-0004-1

Begoli, E., Bhattacharya, T., Kusnezov, D.: The need for uncertainty quan- tification in machine-assisted medical decision making. Nature Machine Intelli- gence1(1), 20–23 (Jan 2019).https://doi.org/10.1038/s42256-018-0004-1, http://dx.doi.org/10.1038/s42256-018-0004-1

work page doi:10.1038/s42256-018-0004-1 2019
[7]

Nature Machine Intelligence1(1), 20– 23 (2019)

Begoli,E.,Bhattacharya,T.,Kusnezov,D.:Theneedforuncertaintyquantification in machine-learning for clinical applications. Nature Machine Intelligence1(1), 20– 23 (2019)

2019
[8]

2024 , month =

Bendidi, I., Whitfield, S., Kenyon-Dean, K., Yedder, H.B., Mesbahi, Y.E., Noutahi, E., Denton, A.K.: Benchmarking transcriptomics foundation models for perturba- tion analysis: one pca still rules them all. arXiv preprint arXiv:2410.13956 (2024)

work page arXiv 2024
[9]

Nature medicine25(8), 1301–1309 (2019) Probing, Fusion, and Trustworthiness of FM Representation 9

Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Werneck Krauss Silva, V., Busam, K.J., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J.: Clinical- grade computational pathology using weakly supervised deep learning on whole slide images. Nature medicine25(8), 1301–1309 (2019) Probing, Fusion, and Trustworthiness of FM Representation 9

2019
[10]

Nature555(7697), 469–474 (2018)

Capper, D., Jones, D.T., Sill, M., Hovestadt, V., Schrimpf, D., Sturm, D., Koelsche, C., Sahm, F., Chavez, L., Reuss, D.E., et al.: Dna methylation-based classification of central nervous system tumours. Nature555(7697), 469–474 (2018)

2018
[11]

carislifesciences.com,https://www

Caris Life Sciences: Mi cancer seek. carislifesciences.com,https://www. carislifesciences.com/physicians/physician-tests/mi-cancer-seek/, accessed May 31, 2026

2026
[12]

carislifesciences.com,https://www

Caris Life Sciences: Mi profile. carislifesciences.com,https://www. carislifesciences.com/physicians/physician-tests/mi-profile/, accessed May 31, 2026

2026
[13]

carislifesciences.com, https://www.carislifesciences.com/physicians/physician-tests/ mi-tumor-seek-hybrid/, accessed May 31, 2026

Caris Life Sciences: Mi tumor seek hybrid. carislifesciences.com, https://www.carislifesciences.com/physicians/physician-tests/ mi-tumor-seek-hybrid/, accessed May 31, 2026

2026
[14]

Bioinformatics35(14), i446–i454 (2019)

Cheerla, A., Gevaert, O.: Deep learning with multimodal representation for pan- cancer prognosis prediction. Bioinformatics35(14), i446–i454 (2019)

2019
[15]

Nature Computational Science pp

Chen, H., Venkatesh, M.S., Gómez Ortega, J., Mahesh, S.V., Nandi, T.N., Mad- duri, R.K., Pelka, K., Theodoris, C.V.: Scaling and quantization of large-scale foundation model enables resource-efficient predictions in network biology. Nature Computational Science pp. 1–14 (2026)

2026
[16]

Nature medicine30(3), 850–862 (2024)

Chen, R.J., Ding, T., Lu, M.Y., Williamson, D.F., Jaume, G., Song, A.H., Chen, B., Zhang, A., Shao, D., Shaban, M., et al.: Towards a general-purpose foundation model for computational pathology. Nature medicine30(3), 850–862 (2024)

2024
[17]

IEEE transactions on medical imaging41(4), 757–770 (2020)

Chen, R.J., Lu, M.Y., Wang, J., Williamson, D.F., Rodig, S.J., Lindeman, N.I., Mahmood, F.: Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE transactions on medical imaging41(4), 757–770 (2020)

2020
[18]

In: Proceedings of the IEEE/CVF Inter- national Conference on Computer Vision

Chen, R.J., Lu, M.Y., Weng, W.H., Chen, T.Y., Williamson, D.F., Manz, T., Shady, M., Mahmood, F.: Multimodal co-attention transformer for survival pre- diction in gigapixel whole slide images. In: Proceedings of the IEEE/CVF Inter- national Conference on Computer Vision. pp. 4015–4025 (2021)

2021
[19]

Cancer cell40(8), 865–878 (2022)

Chen, R.J., Lu, M.Y., Williamson, D.F., Chen, T.Y., Lipkova, J., Noor, Z., Shaban, M., Shady, M., Williams, M., Joo, B., et al.: Pan-cancer integrative histology- genomic analysis via multimodal deep learning. Cancer cell40(8), 865–878 (2022)

2022
[20]

Nature methods21(8), 1470–1480 (2024)

Cui, H., Wang, C., Maan, H., Pang, K., Luo, F., Duan, N., Wang, B.: scgpt: toward building a foundation model for single-cell multi-omics using generative ai. Nature methods21(8), 1470–1480 (2024)

2024
[21]

Medical Image Analysis88, 102861 (Aug 2023).https://doi.org/10.1016/j.media.2023.102861,http://dx.doi

Dawood, T., Chen, C., Sidhu, B.S., Ruijsink, B., Gould, J., Porter, B., El- liott, M.K., Mehta, V., Rinaldi, C.A., Puyol-Antón, E., Razavi, R., King, A.P.: Uncertainty aware training to improve deep learning model calibration for classification of cardiac mr images. Medical Image Analysis88, 102861 (Aug 2023).https://doi.org/10.1016/j.media.2023.102861,ht...

work page doi:10.1016/j.media.2023.102861 2023
[22]

arXiv preprint arXiv:2603.27460 (2026)

Deng, Z., Tang, C., Huang, Z., Lin, J., Chen, Y., Ning, J., Ma, C., Liu, J., Li, W., Zhu, Y., et al.: Project imaging-x: A survey of 1000+ open-access medical imag- ing datasets for foundation model development. arXiv preprint arXiv:2603.27460 (2026)

work page arXiv 2026
[23]

48550/arXiv.2502.00568,http://arxiv.org/abs/2502.00568, arXiv:2502.00568 [cs] 10 J

Dey,S.,Banerji,C.R.S.,Basuchowdhuri,P.,Saha,S.K.,Parashar,D.,Chakraborti, T.: Generating crossmodal gene expression from cancer histopathology improves multimodal ai predictions (arXiv:2502.00568) (Feb 2025).https://doi.org/10. 48550/arXiv.2502.00568,http://arxiv.org/abs/2502.00568, arXiv:2502.00568 [cs] 10 J. Hu, G. Tripodi, R. Naidoo, S. F. McGough, T. ...

work page arXiv 2025
[24]

Nature medicine pp

Ding, T., Wagner, S.J., Song, A.H., Chen, R.J., Lu, M.Y., Zhang, A., Vaidya, A.J., Jaume,G.,Shaban,M.,Kim,A.,etal.:Amultimodalwhole-slidefoundationmodel for pathology. Nature medicine pp. 1–13 (2025)

2025
[25]

In: Annual Conference on Medical Image Understanding and Analysis

Duvieusart, B., Krones, F., Parsons, G., Tarassenko, L., Papież, B.W., Mahdi, A.: Multimodal cardiomegaly classification with image-derived digital biomarkers. In: Annual Conference on Medical Image Understanding and Analysis. pp. 13–27. Springer (2022)

2022
[26]

Ac- cessed May 31, 2026

FlatironHealth:Databasecharacterizationguide.Flatiron.com(Mar2025),https: //flatiron.com/database-characterization, published March 18, 2025. Ac- cessed May 31, 2026

2025
[27]

arXiv preprint arXiv:2205.14204 (2022)

Geng, X., Liu, H., Lee, L., Schuurmans, D., Levine, S., Abbeel, P.: Multi- modal masked autoencoders learn transferable representations. arXiv preprint arXiv:2205.14204 (2022)

work page arXiv 2022
[28]

In: International Conference on Machine Learning

Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neu- ral networks. In: International Conference on Machine Learning. pp. 1321–1330. PMLR (2017)

2017
[29]

On Calibration of Modern Neural Networks

Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. CoRRabs/1706.04599(2017),http://arxiv.org/abs/1706.04599

work page internal anchor Pith review Pith/arXiv arXiv 2017
[30]

Advances in Neural Information Processing Systems37, 64479–64498 (2024)

Hemker, K., Simidjievski, N., Jamnik, M.: Healnet: multimodal fusion for hetero- geneous biomedical data. Advances in Neural Information Processing Systems37, 64479–64498 (2024)

2024
[31]

Signal Processing183, 108036 (2021)

Hermessi, H., Mourali, O., Zagrouba, E.: Multimodal medical image fusion review: Theoretical background and recent advances. Signal Processing183, 108036 (2021)

2021
[32]

NPJ digital medicine3(1), 136 (2020)

Huang, S.C., Pareek, A., Seyyedi, S., Banerjee, I., Lungren, M.P.: Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. NPJ digital medicine3(1), 136 (2020)

2020
[33]

Scientific reports10(1), 22147 (2020)

Huang, S.C., Pareek, A., Zamanian, R., Banerjee, I., Lungren, M.P.: Multimodal fusion with deep neural networks for leveraging ct imaging and electronic health record: a case-study in pulmonary embolism detection. Scientific reports10(1), 22147 (2020)

2020
[34]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Jaume, G., Vaidya, A., Chen, R.J., Williamson, D.F., Liang, P.P., Mahmood, F.: Modeling dense multimodal interactions between biological pathways and histology for survival prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11579–11590 (2024)

2024
[35]

Nature communications11(1), 728 (2020)

Jiao, W., Atwal, G., Polak, P., Karlic, R., Cuppen, E., Danyi, A., de Ridder, J., van Herpen, C., Lolkema, M.P., et al.: A deep learning system accurately clas- sifies primary and metastatic cancers using passenger mutation patterns. Nature communications11(1), 728 (2020)

2020
[36]

In: International Conference on Medical Image Computing and Computer-Assisted Intervention

Karasikov, M., van Doorn, J., Känzig, N., Erdal Cesur, M., Horlings, H.M., Berke, R., Tang, F., Otálora, S.: Training state-of-the-art pathology foundation models with orders of magnitude less data. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 573–583. Springer (2025)

2025
[37]

Genome Biology26(1), 101 (2025)

Kedzierska, K.Z., Crawford, L., Amini, A.P., Lu, A.X.: Zero-shot evaluation reveals limitations of single-cell foundation models. Genome Biology26(1), 101 (2025)

2025
[38]

Information Fusion114, 102690 (2025)

Krones, F., Marikkar, U., Parsons, G., Szmul, A., Mahdi, A.: Review of multimodal machine learning approaches in healthcare. Information Fusion114, 102690 (2025)

2025
[39]

In: Langley, P

Langley, P.: Crafting papers on machine learning. In: Langley, P. (ed.) Proceedings of the 17th International Conference on Machine Learning (ICML 2000). pp. 1207–

2000
[40]

Morgan Kaufmann, Stanford, CA (2000) Probing, Fusion, and Trustworthiness of FM Representation 11

2000
[41]

Scientific Reports 7(1), 17816 (2017)

Leibig, C., Allken, V., Ayhan, M.S., Berens, P., Wahl, S.: Leveraging uncertainty information from deep neural networks for disease detection. Scientific Reports 7(1), 17816 (2017)

2017
[42]

Nature methods15(12), 1053–1058 (2018)

Lopez,R.,Regier,J.,Cole,M.B.,Jordan,M.I.,Yosef,N.:Deepgenerativemodeling for single-cell transcriptomics. Nature methods15(12), 1053–1058 (2018)

2018
[43]

Nature medicine30(3), 863–874 (2024)

Lu, M.Y., Chen, B., Williamson, D.F., Chen, R.J., Liang, I., Ding, T., Jaume, G., Odintsov, I., Le, L.P., Gerber, G., et al.: A visual-language foundation model for computational pathology. Nature medicine30(3), 863–874 (2024)

2024
[44]

Nature634(8033), 466–473 (2024)

Lu, M.Y., Chen, B., Williamson, D.F., Chen, R.J., Zhao, M., Chow, A.K., Ikemura, K., Kim, A., Pouli, D., Patel, A., et al.: A multimodal generative ai copilot for human pathology. Nature634(8033), 466–473 (2024)

2024
[45]

MedRxiv pp

Ma, X., Long, L., Moon, S., Adamson, B.J., Baxi, S.S.: Comparison of population characteristics in real-world clinical oncology databases in the us: Flatiron health, seer, and npcr. MedRxiv pp. 2020–03 (2020)

2020
[46]

Nature577(7788), 89–94 (2020)

McKinney, S.M., Sieniek, M., Godbole, V., Godwin, J., Antropova, N., Ashrafian, H., Back, T., Chesus, M., Corrado, G.S., Darzi, A., et al.: International evaluation of an ai system for breast cancer screening. Nature577(7788), 89–94 (2020)

2020
[47]

Naidoo, R., Fourkioti, O., Vries, M.D., Bakal, C.: Survivmil: A multimodal, multi- ple instance learning pipeline for survival outcome of neuroblastoma patients. In: Ciompi, F., Khalili, N., Studer, L., Poceviciute, M., Khan, A., Veta, M., Jiao, Y., Haj-Hosseini, N., Chen, H., Raza, S., Minhas, FayyazZlobec, I., Burlutskiy, N., Vilaplana, V., Brattoli, B....

2024
[48]

European Radiology34(10), 6639–6651 (2024).https: //doi.org/10.1007/s00330-024-10714-7

Peeters, D., Alves, N., Venkadesh, K.V., Dinnessen, R., Saghir, Z., Scholten, E.T., Schaefer-Prokop, C., Vliegenthart, R., Prokop, M., Jacobs, C.: Enhancing a deep learning model for pulmonary nodule malignancy risk estimation in chest ct with uncertainty estimation. European Radiology34(10), 6639–6651 (2024).https: //doi.org/10.1007/s00330-024-10714-7

work page doi:10.1007/s00330-024-10714-7 2024
[49]

Nature medicine29(5), 1113–1122 (2023)

Placido, D., Yuan, B., Hjaltelin, J.X., Zheng, C., Haue, A.D., Chmura, P.J., Yuan, C., Kim, J., Umeton, R., Antell, G., et al.: A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories. Nature medicine29(5), 1113–1122 (2023)

2023
[50]

BioRxiv pp

Rosen, Y., Roohani, Y., Agarwal, A., Samotorčan, L., Consortium, T.S., Quake, S.R., Leskovec, J.: Universal cell embeddings: A foundation model for cell biology. BioRxiv pp. 2023–11 (2023)

[51]

Roy, A.G., Conjeti, S., Navab, N., Wachinger, C.: Bayesian QuickNAT: Model uncertainty in deep whole-brain segmentation for structure-wise quality control. NeuroImage 195, 11–22 (2019)

[52]

Sellergren, A., Gao, C., Mahvar, F., Kohlberger, T., Jamil, F., Traverse, M., Tono, A., Sadjad, B., Yang, L., Lau, C., et al.: MedGemma 1.5 technical report. arXiv preprint arXiv:2604.05081 (2026)

[53]

Sellergren, A., Kazemzadeh, S., Jaroensri, T., Kiraly, A., Traverse, M., Kohlberger, T., Xu, S., Jamil, F., Hughes, C., Lau, C., et al.: MedGemma technical report. arXiv preprint arXiv:2507.05201 (2025)

[54]

Shaikovski, G., Casson, A., Severson, K., Zimmermann, E., Wang, Y.K., Kunz, J.D., Retamero, J.A., Oakley, G., Klimstra, D., Kanan, C., et al.: PRISM: A multimodal generative foundation model for slide-level histopathology. arXiv preprint arXiv:2405.10254 (2024)

[55]

Theodoris, C.V., Xiao, L., Chopra, A., Chaffin, M.D., Al Sayed, Z.R., Hill, M.C., Mantineo, H., Brydon, E.M., Zeng, Z., Liu, X.S., et al.: Transfer learning enables predictions in network biology. Nature 618(7965), 616–624 (2023)

[56]

Vorontsov, E., Bozkurt, A., Casson, A., Shaikovski, G., Zelechowski, M., Liu, S., Severson, K., Zimmermann, E., Hall, J., Tenenholtz, N., et al.: Virchow: A million-slide digital pathology foundation model. arXiv preprint arXiv:2309.07778 (2023)

[57]

Wang, X., Yang, S., Zhang, J., Wang, M., Zhang, J., Yang, W., Huang, J., Han, X.: Transformer-based unsupervised contrastive learning for histopathological image classification. Medical image analysis 81, 102559 (2022)

[58]

Xiang, J., Wang, X., Zhang, X., Xi, Y., Eweje, F., Chen, Y., Li, Y., Bergstrom, C., Gopaulchan, M., Kim, T., et al.: A vision–language foundation model for precision oncology. Nature 638(8051), 769–778 (2025)

[59]

Zahari, R., Cox, J., Obara, B.: Quantifying the uncertainty in 3D CT lung cancer images classification. In: 2023 IEEE 13th International Conference on Pattern Recognition Systems (ICPRS). pp. 1–7 (2023). https://doi.org/10.1109/ICPRS58416.2023.10179053

[60]

Zhang, C., Yang, Z., He, X., Deng, L.: Multimodal intelligence: Representation learning, information fusion, and applications. IEEE Journal of Selected Topics in Signal Processing 14(3), 478–493 (2020)

[61]

Zhang, Q., Gossai, A., Monroe, S., Nussbaum, N.C., Parrinello, C.M.: Validation analysis of a composite real-world mortality endpoint for patients with cancer in the United States. Health services research 56(6), 1281–1287 (2021)

A Related Work

A.1 Medical Foundation Models

Foundation models (FMs) for computational pathology can be broadly categorized in...

[Figure 4: The Detailed Workflow. Recoverable panel labels: WSI patch extraction; image foundation models (CONCH, UNI, Virchow, MUSK); gene encoders (UCE, SCVI, PCA) applied to gene expressions; unimodal learning; multimodal learning; evaluation. Models shown: LateMIL, CONTACT, GeneMLP, HEMIL, MCAT.]

GeneMLP is an omics-based multilayer perceptron that receives the omics representation $z_{i,\mathrm{omics}}$ and has widths $\langle d_{\mathrm{omics}}, 512, 256, 128, |\mathcal{Y}| \rangle$, with LayerNorm, ReLU, and dropout 0.2 after each hidden linear layer. GeneMLP...
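The GeneMLP description above can be illustrated with a short forward-pass sketch. This is not the authors' implementation: it is a dependency-free Python illustration under several assumptions, namely the order linear, LayerNorm, ReLU, dropout within each hidden block, a LayerNorm without learned scale or shift, a simple uniform weight initialisation, and hypothetical values `d_omics=16` and `n_classes=3` chosen only to keep the example small. Only the widths (512, 256, 128) and the dropout rate (0.2) come from the text.

```python
import math
import random

HIDDEN_WIDTHS = (512, 256, 128)  # hidden widths stated in the text
DROPOUT_P = 0.2                  # dropout rate stated in the text


def init_linear(n_in, n_out, rng):
    """Return (weights, biases) for one linear layer; the uniform init is an assumption."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(x, layer):
    """Apply one affine map: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def layer_norm(x, eps=1e-5):
    """Normalise one vector to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, p, rng):
    """Inverted dropout: zero each unit with probability p and rescale the survivors."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]


def build_gene_mlp(d_omics, n_classes, seed=0):
    """Create linear layers with widths <d_omics, 512, 256, 128, n_classes>."""
    rng = random.Random(seed)
    widths = (d_omics, *HIDDEN_WIDTHS, n_classes)
    return [init_linear(a, b, rng) for a, b in zip(widths[:-1], widths[1:])]


def gene_mlp_forward(z_omics, layers, train=False, rng=None):
    """Map one omics representation to class logits."""
    h = list(z_omics)
    for layer in layers[:-1]:
        # Assumed block order: linear -> LayerNorm -> ReLU -> dropout.
        h = relu(layer_norm(linear(h, layer)))
        if train:
            h = dropout(h, DROPOUT_P, rng or random)
    return linear(h, layers[-1])  # output layer returns raw logits


# Hypothetical sizes: a 16-dimensional omics embedding and 3 classes.
layers = build_gene_mlp(d_omics=16, n_classes=3)
logits = gene_mlp_forward([0.1 * i for i in range(16)], layers)
print(len(logits))
```

Dropout is applied only when `train=True`, so evaluation is deterministic for a fixed set of weights. In practice a deep-learning framework would replace these list-based helpers; the sketch only makes the layer widths and the per-block operations explicit.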

