
arxiv: 2606.17115 · v1 · pith:6AZ7U3IFnew · submitted 2026-06-15 · 💻 cs.LG · cs.AI · q-bio.QM

Probing, Fusion, and Trustworthiness: A Systematic Evaluation of Foundation Model Representations for Multimodal Cancer Analysis

Pith reviewed 2026-06-27 03:15 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · q-bio.QM
keywords foundation models · multimodal fusion · conformal prediction · computational pathology · cancer analysis · out-of-distribution generalization · trustworthiness · transcriptomics

The pith

Foundation model representations achieve competitive performance on out-of-distribution cancer data from commercial cohorts, with multimodal fusion providing gains primarily when no single modality dominates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates foundation model representations from whole-slide images and transcriptomic profiles on eight classification tasks across two real-world cancer cohorts. It benchmarks five foundation models in unimodal settings and tests three fusion strategies for combining image and omics data. Conformal prediction is used to assess trustworthiness by checking if prediction sets recover the true label when point predictions fail. The results indicate that these representations hold up under distribution shifts and that fusion is conditionally useful. This matters because it informs when to rely on single modalities versus combined ones in clinical computational pathology.

Core claim

Foundation model representations achieve competitive performance on out-of-distribution data and multimodal fusion helps mainly when no single modality dominates the signal. Conformal prediction reveals that in the majority of cases where a point prediction fails, the true diagnosis remains recoverable within the prediction set.
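
A minimal sketch of how such prediction sets could be built, assuming split conformal prediction with the common score 1 − p(true class). The paper's actual score function and calibration split are not stated on this page, so the function names `conformal_threshold` and `prediction_set` and all toy probabilities below are hypothetical.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Threshold q_hat from calibration scores s_i = 1 - p_i[y_i]."""
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample rank ceil((n + 1)(1 - alpha)); beyond n, every class is kept.
    k = math.ceil((n + 1) * (1 - alpha))
    return scores[k - 1] if k <= n else float("inf")

def prediction_set(probs, q_hat):
    """Classes whose score 1 - p does not exceed the threshold."""
    return {c for c, p in enumerate(probs) if 1.0 - p <= q_hat}

# Toy calibration data (illustrative values, not from the paper); true class is 0.
cal_probs = [[p, 1.0 - p, 0.0] for p in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.2)]
cal_labels = [0] * len(cal_probs)
q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

# One toy test case whose true label is class 1.
test_probs, true_label = [0.5, 0.4, 0.1], 1
top1 = max(range(len(test_probs)), key=test_probs.__getitem__)
pred_set = prediction_set(test_probs, q_hat)
print(top1 == true_label, true_label in pred_set)
```

In this toy case the top-1 prediction is wrong while the set still contains the true label, which is the kind of recovery the core claim describes.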

What carries the argument

A systematic evaluation pipeline with three parts: unimodal probing of five foundation models, three image-omics fusion strategies built on paired representations, and conformal prediction for uncertainty assessment, all run on two commercial oncology cohorts.

If this is right

  • FM representations from images and transcriptomics carry complementary predictive signals.
  • Multimodal fusion yields additional gains over unimodal baselines primarily when neither modality dominates.
  • Conformal prediction sets recover the true diagnosis in most cases where the point prediction is incorrect.
  • Uncertainty-aware inference adds value for clinical support in computational pathology.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar evaluations could be extended to additional modalities like genomics or radiology to test broader applicability.
  • The conditional benefit of fusion suggests prioritizing modality selection based on signal strength rather than always fusing.
  • Conformal methods may enable safer deployment by providing recoverable sets instead of single risky predictions.
  • Results on commercial cohorts point toward the need for testing on more diverse real-world data sources.

Load-bearing premise

The two commercial cohorts sufficiently represent real-world distribution shifts in cancer data and the three fusion strategies adequately cover when fusion adds value.

What would settle it

The claims would be challenged if additional held-out cohorts showed significantly lower accuracy than reported, or if fusion failed to show the conditional gains described above.

Figures

Figures reproduced from arXiv: 2606.17115 by Giuseppe Tripodi, Jingyu Hu, Reed Naidoo, Sarah F. McGough, Tapabrata Chakraborti.

Figure 1: Overview of the Experimental Setups
Figure 2: ROC Comparison on BC-LOH Task.
[Plot residue omitted. Recoverable labels: panels CONCH+PCA and UNI+PCA, each with ACC and AUC; tasks BC-Subtype, BC-LOH, NSCLC-TMB, NSCLC-Biopsy Site; methods GeneMLP, HEMIL, MCAT, CONTACT, LateMIL.]
Figure 3: Performance Comparisons of Unimodal (GeneMLP, HEMIL) and Multimodal (MCAT, CONTACT, LateMIL) Methods
…ance across FMs. All three omics representation methods (UCE, PCA, and scVI) outperform direct modeling on the raw full gene expression profile on BC-LOH. However, the foundation model UCE underperforms the non-foundation-model approaches scVI and PCA. The pattern observed in the industrial IH dataset align…
Figure 4: The Detailed Workflow
GeneMLP is an omics-based multilayer perceptron that receives the omics representation z_{i,omics} and has widths ⟨d_omics, 512, 256, 128, |Y|⟩, with LayerNorm, ReLU, and dropout 0.2 after each hidden linear layer. GeneMLP is trained with weight decay = 1 × 10^-4 for 200 epochs. We apply early stopping with patience = 20 on validation set. CONTACT is a multimodal fusion strategy that concate…
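
The GeneMLP description in the Figure 4 caption can be sketched as a layer specification plus an early-stopping rule. This is a framework-free illustration under stated assumptions: the Linear, LayerNorm, ReLU, Dropout ordering follows the order in which the caption lists them, biases and affine LayerNorm parameters are assumed, validation loss is assumed to be the monitored quantity, and `gene_mlp_layers`, `count_parameters`, `early_stop_epoch`, and the example sizes (50 input features, 4 classes) are hypothetical.

```python
def gene_mlp_layers(d_omics, n_classes, hidden=(512, 256, 128), dropout=0.2):
    """Ordered (layer, config) list for widths <d_omics, 512, 256, 128, |Y|>."""
    layers, d_in = [], d_omics
    for d_out in hidden:
        layers += [("Linear", (d_in, d_out)), ("LayerNorm", d_out),
                   ("ReLU", None), ("Dropout", dropout)]
        d_in = d_out
    layers.append(("Linear", (d_in, n_classes)))  # output layer, no activation
    return layers

def count_parameters(layers):
    """Trainable parameters, assuming biased Linear and affine LayerNorm."""
    total = 0
    for name, cfg in layers:
        if name == "Linear":
            total += cfg[0] * cfg[1] + cfg[1]
        elif name == "LayerNorm":
            total += 2 * cfg
    return total

def early_stop_epoch(val_losses, patience=20, max_epochs=200):
    """Index of the epoch at which training stops (assumed: monitor val loss)."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return min(len(val_losses), max_epochs) - 1

# Settings quoted in the caption; the optimiser and learning rate are not given there.
TRAIN_CONFIG = {"weight_decay": 1e-4, "epochs": 200, "patience": 20}

layers = gene_mlp_layers(d_omics=50, n_classes=4)  # hypothetical sizes
n_params = count_parameters(layers)
```
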
Figure 5: ROC curves of unimodal baselines across different tasks (BC-Subtype, BC-LOH, NSCLC-TMB, NSCLC-Biopsy). Here ‘RAW’ refers to directly feeding the full set of gene TPM values as omics features.
[Plot residue omitted. Recoverable labels: y-axis "Empirical coverage"; tasks BC LOH, BC Subtype, NSCLC Biopsy Site, NSCLC Tumor Site; first panel at 0.05 (target 0.95).]
Figure 6: Empirical coverage across models and target coverage levels α ∈ {0.05, 0.10, 0.20}. Each point is the mean coverage averaged over encoder configurations for a given model and task.
…of coverage level. CONTACT and MCAT are consistently well calibrated, maintaining ECE values in the range 0.04–0.07 across all four tasks, whereas LateMIL and GeneMLP are notably less reliable, with ECE reaching 0.144 and 0.16…
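
For readers unfamiliar with the two quantities named around Figure 6, the sketch below shows one common way to compute expected calibration error (ECE) and empirical coverage. It assumes equal-width confidence bins with 10 bins; the paper's binning scheme is not stated on this page, and the function names and toy inputs are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes to last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(o for _, o in members) / len(members)
            ece += len(members) / n * abs(accuracy - mean_conf)
    return ece

def empirical_coverage(pred_sets, labels):
    """Fraction of samples whose true label lies inside the prediction set."""
    return sum(y in s for s, y in zip(pred_sets, labels)) / len(labels)

# Toy inputs (illustrative only, not results from the paper).
toy_ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
toy_cov = empirical_coverage([{0, 1}, {2}, {1}], [1, 0, 1])
```

A well-calibrated conformal procedure at level α should give empirical coverage close to 1 − α, which is what the figure compares against its targets.
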
Figure 7: Error decomposition by task and model family. Each panel shows one of the four multiclass tasks; within each panel, bars are grouped by model family (colour). Solid bars represent the top-1 classification error (1 − accuracy); hatched bars represent the conformal miss rate (1 − coverage) at α = 0.10. The gap between the two bars quantifies the uncertainty absorbed by the prediction set, samples misclassified…
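
The decomposition in the Figure 7 caption can be written down directly from its definitions. The sketch below is an illustration, not the authors' code: `error_decomposition` and the toy predictions are hypothetical, and the gap equals the share of misclassified-but-covered samples only when every miss is also a top-1 error.

```python
def error_decomposition(top1_preds, pred_sets, labels):
    """Top-1 error, conformal miss rate, and how many errors the sets recover."""
    n = len(labels)
    wrong = [p != y for p, y in zip(top1_preds, labels)]
    missed = [y not in s for s, y in zip(pred_sets, labels)]
    top1_error = sum(wrong) / n   # 1 - accuracy
    miss_rate = sum(missed) / n   # 1 - coverage
    recovered = sum(w and not m for w, m in zip(wrong, missed))
    return {
        "top1_error": top1_error,
        "miss_rate": miss_rate,
        "absorbed_gap": top1_error - miss_rate,
        # Share of point-prediction failures whose true label is still in the set.
        "recovered_fraction": recovered / sum(wrong) if any(wrong) else None,
    }

# Toy example (illustrative only): four samples, three classes.
toy = error_decomposition(
    top1_preds=[0, 1, 2, 0],
    pred_sets=[{0}, {1, 2}, {0, 2}, {0}],
    labels=[0, 2, 1, 0],
)
```
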

Original abstract

Foundation models (FMs) have emerged as powerful representation extractors for medical data, yet their generalizability to datasets under distribution shift remains underexplored. This work systematically evaluates FM-based representations on a suite of computational pathology tasks across two real-world commercial cohorts, IH-BC and IH-NSCLC, drawn from the licensed in-house (IH) oncology dataset. The analysis focuses on two modalities, whole-slide images and transcriptomic profiles, drawn from the IH multimodal data. We first benchmark unimodal probing performance across five FMs on eight downstream classification tasks, and find that image and omics representations carry complementary predictive signals. Then we investigate whether multimodal fusion can yield additional gains over unimodal baselines by comparing three image-omics fusion strategies built on paired representations. The trustworthiness of selected unimodal and multimodal pipelines is further assessed through conformal prediction. Our results show that FM representations achieve competitive performance on out-of-distribution data and that multimodal fusion helps mainly when no single modality dominates the signal. Conformal prediction reveals that in the majority of cases where a point prediction fails, the true diagnosis remains recoverable within the prediction set, reinforcing the value of uncertainty-aware inference for clinical support.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript systematically evaluates foundation model (FM) representations for multimodal cancer analysis using whole-slide images and transcriptomic profiles from two commercial cohorts (IH-BC and IH-NSCLC). It benchmarks unimodal probing performance of five FMs across eight downstream classification tasks, compares three image-omics fusion strategies on paired representations, and assesses trustworthiness via conformal prediction. The central claims are that FM representations achieve competitive performance on out-of-distribution data, that multimodal fusion yields gains primarily when no single modality dominates the signal, and that conformal prediction sets recover the true diagnosis in most cases where point predictions fail.

Significance. If the empirical results hold after addressing the OOD framing, the work would provide useful benchmarking insights into modality complementarity and the practical value of uncertainty quantification in computational pathology pipelines. The conditional fusion finding and conformal recovery observation could guide model selection in clinical settings, though the current lack of shift quantification limits broader claims about generalizability.

major comments (2)
  1. [Abstract] Abstract and evaluation framing: the claim that 'FM representations achieve competitive performance on out-of-distribution data' is load-bearing but unsupported by any explicit quantification of distribution shift (e.g., MMD, Wasserstein distance, or covariate-shift statistics) between the FMs' pretraining data and the IH-BC/IH-NSCLC cohorts, nor by comparison to public benchmarks with known shifts such as TCGA or CPTAC. This prevents distinguishing OOD performance from performance on additional in-house data.
  2. [Fusion evaluation] Fusion strategies section: the claim that 'multimodal fusion helps mainly when no single modality dominates the signal' depends on the two cohorts spanning relevant modality-dominance regimes, yet no metrics or ablation are provided to characterize dominance vs. balanced-signal cases in IH-BC and IH-NSCLC, weakening the conditional conclusion.
minor comments (1)
  1. [Methods] The abstract references eight downstream tasks and three fusion strategies without listing them; the methods section should explicitly enumerate the tasks, fusion implementations, and any statistical tests used for performance comparisons to support reproducibility.
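
The shift quantification requested in major comment 1 could be approached with a kernel two-sample statistic on extracted embeddings. The sketch below computes a biased estimate of squared MMD with an RBF kernel in plain Python. It is an illustration only: the paper reports no such analysis, the bandwidth `gamma` is an arbitrary assumption, the function names are hypothetical, and the toy vectors stand in for embeddings from a reference set and an in-house cohort.

```python
import math

def rbf_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2_biased(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y."""
    k_xx = sum(rbf_kernel(a, b, gamma) for a in X for b in X) / len(X) ** 2
    k_yy = sum(rbf_kernel(a, b, gamma) for a in Y for b in Y) / len(Y) ** 2
    k_xy = sum(rbf_kernel(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return k_xx + k_yy - 2.0 * k_xy

# Toy 2-D "embeddings" (illustrative only, not data from the paper).
reference = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]

same = mmd2_biased(reference, reference)
far = mmd2_biased(reference, shifted)
```

Identical samples give a value of zero up to rounding, while the shifted sample gives a clearly positive value. In practice a permutation test would be needed to judge whether an observed value is significant.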

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We agree that strengthening the evaluation framing is important and will make revisions accordingly. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract and evaluation framing: the claim that 'FM representations achieve competitive performance on out-of-distribution data' is load-bearing but unsupported by any explicit quantification of distribution shift (e.g., MMD, Wasserstein distance, or covariate-shift statistics) between the FMs' pretraining data and the IH-BC/IH-NSCLC cohorts, nor by comparison to public benchmarks with known shifts such as TCGA or CPTAC. This prevents distinguishing OOD performance from performance on additional in-house data.

    Authors: We concur that without explicit quantification of the distribution shift, the OOD claim is not fully supported. The manuscript relies on the in-house nature of the cohorts as evidence of shift from typical pretraining data, but we did not compute metrics such as MMD. In revision, we will attempt to add such quantifications using the available representations and reframe the language in the abstract and introduction to reflect performance on commercial in-house data with potential shift, while acknowledging the limitation. Direct comparison to TCGA may not be feasible due to data licensing, but we will note this. revision: partial

  2. Referee: [Fusion evaluation] Fusion strategies section: the claim that 'multimodal fusion helps mainly when no single modality dominates the signal' depends on the two cohorts spanning relevant modality-dominance regimes, yet no metrics or ablation are provided to characterize dominance vs. balanced-signal cases in IH-BC and IH-NSCLC, weakening the conditional conclusion.

    Authors: This is a fair observation. The two cohorts were chosen to potentially represent different scenarios, but we did not explicitly measure modality dominance (e.g., via unimodal accuracy differences or signal balance metrics). We will add an analysis or table in the revised manuscript to characterize the dominance in each cohort based on unimodal probing results, and if necessary, adjust the claim to be more precise based on the observed regimes. revision: yes
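
The dominance analysis the authors promise could take a form like the sketch below, which follows their suggestion of using unimodal score differences. Everything here is an assumption for illustration: the function `dominance_report`, the tolerance `balance_tol`, and the placeholder scores are hypothetical and are not results from the paper.

```python
def dominance_report(image_score, omics_score, fused_score, balance_tol=0.05):
    """Summarise which modality dominates and how much fusion adds on one task."""
    gap = abs(image_score - omics_score)
    if gap <= balance_tol:
        regime = "balanced"
    elif image_score > omics_score:
        regime = "image-dominant"
    else:
        regime = "omics-dominant"
    return {
        "dominance_gap": gap,
        "regime": regime,
        # Gain of the fused model over the better of the two unimodal models.
        "fusion_gain": fused_score - max(image_score, omics_score),
    }

# Placeholder scores (for example AUC) as (image, omics, fused) for two
# hypothetical tasks; these are not numbers from the paper.
toy_scores = {"task_a": (0.70, 0.72, 0.76), "task_b": (0.90, 0.60, 0.89)}
reports = {task: dominance_report(*scores) for task, scores in toy_scores.items()}
```

Tabulating `regime` against `fusion_gain` across all tasks would show directly whether gains concentrate in the balanced regime, which is the conditional claim under review.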

Circularity Check

0 steps flagged

No circularity: pure empirical benchmarking without derivations or self-referential quantities

full rationale

The paper reports direct performance measurements of foundation-model representations on eight classification tasks across two in-house cohorts, compares three fusion strategies, and applies conformal prediction to selected pipelines. No equations, fitted parameters, or derivation chains appear in the abstract or described content. Claims about competitive OOD performance and conditional fusion gains are grounded in observed metrics on the evaluated data rather than any reduction to inputs by construction. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz. This is standard empirical evaluation and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

No free parameters, invented entities, or non-standard axioms identified from abstract; relies on standard ML evaluation assumptions.

axioms (1)
  • domain assumption: Foundation models extract transferable representations useful for downstream medical tasks.
    Core premise enabling the probing and fusion experiments.

