
arxiv: 2606.03644 · v1 · pith:OCTLBLFFnew · submitted 2026-05-29 · 💻 cs.LG

Spatial Transcriptomics-Guided Alignment Enhances Molecular Profiling in Pathology Foundation Model

Pith reviewed 2026-06-28 23:24 UTC · model grok-4.3

classification 💻 cs.LG
keywords spatial transcriptomics · pathology foundation models · molecular profiling · H&E whole-slide images · parameter-efficient fine-tuning · pathway aggregation · precision oncology

The pith

STAMP endows pathology foundation models with molecular awareness by aligning them to spatial transcriptomics via pathway aggregation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces STAMP to give pathology foundation models the ability to infer molecular profiles directly from standard H&E whole-slide images. It does this by curating a dataset of 1.8 million image-transcriptomic pairs and applying a pathway-informed alignment strategy during parameter-efficient fine-tuning. The goal is to connect subtle morphological features in routine slides to underlying genomic alterations that current vision-only or vision-language models miss. This would make comprehensive molecular profiling faster and less dependent on expensive sequencing.

Core claim

STAMP integrates pathway-aggregated spatial transcriptomics into PFMs through parameter-efficient fine-tuning, enriching the representation space and unlocking the capacity to resolve sub-visual molecular signatures from H&E WSIs.
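
The abstract does not say which parameter-efficient method STAMP uses. Purely as an illustration, and assuming a low-rank adapter in the style of LoRA (an assumption, not something the source confirms), the arithmetic below shows why such fine-tuning touches only a small fraction of a layer's weights. The layer width and rank are hypothetical.

```python
# Illustrative arithmetic only: the adapter type, layer size, and rank are
# assumptions and are not taken from the paper.

def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix W is updated."""
    return d_out * d_in


def low_rank_adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for W' = W + B @ A, with B: d_out x rank, A: rank x d_in."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 1280, 1280, 8  # hypothetical projection layer and adapter rank
full = full_finetune_params(d_out, d_in)
adapter = low_rank_adapter_params(d_out, d_in, rank)
fraction = adapter / full

print(f"full: {full}, adapter: {adapter}, fraction trainable: {fraction:.4f}")
```

Under these assumed sizes the adapter trains 20,480 weights instead of 1,638,400, about 1.25% of the layer.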

What carries the argument

The pathway-informed alignment strategy that aggregates raw transcriptomic counts into biologically functional pathways before fine-tuning the PFMs.
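
A minimal sketch of what aggregating counts into pathways could look like for a single spot. The abstract does not specify the aggregation function, so the mean of log-normalized member genes used here is an assumption, and the two gene sets are hypothetical stand-ins for entries from a real pathway database.

```python
import math

# Hypothetical gene sets for illustration; real sets would come from a
# curated pathway database.
PATHWAYS = {
    "HYPOTHETICAL_PROLIFERATION": ["MKI67", "TOP2A", "PCNA"],
    "HYPOTHETICAL_IMMUNE": ["CD3E", "CD8A", "PTPRC"],
}


def log_normalize(counts, scale=10_000.0):
    """Library-size normalize one spot's raw counts, then apply log1p."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(c / total * scale) for gene, c in counts.items()}


def pathway_scores(counts, pathways=PATHWAYS):
    """Average the normalized expression of each pathway's measured genes."""
    norm = log_normalize(counts)
    scores = {}
    for name, genes in pathways.items():
        present = [norm[g] for g in genes if g in norm]
        # A pathway with no measured genes gets a neutral score of 0.0.
        scores[name] = sum(present) / len(present) if present else 0.0
    return scores


# One synthetic spot: gene -> raw count (values invented for the example).
spot = {"MKI67": 12, "TOP2A": 0, "PCNA": 7, "CD3E": 1, "CD8A": 0, "PTPRC": 2, "ACTB": 150}
scores = pathway_scores(spot)
print(scores)
```

Averaging over a gene set means a single dropout (here TOP2A and CD8A read as zero) lowers the pathway score rather than erasing it, which is the noise-reduction intuition the summary describes.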

If this is right

  • PFMs can perform molecular profiling from routine H&E WSIs without direct sequencing.
  • The models resolve sub-visual molecular signatures tied to histology.
  • Clinical utility is shown through multi-tier evaluation on diverse anatomical sites and sequencing platforms.
  • HumanST-1k supplies 1.8 million aligned image-transcriptomic pairs as a reusable training resource.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same alignment method could be applied to other tissue imaging types to extract molecular signals.
  • Improved molecular awareness might boost accuracy on downstream tasks such as treatment response prediction.
  • Wider adoption could lower the frequency of repeat biopsies needed for molecular testing.

Load-bearing premise

Aggregating transcriptomic counts into pathways reduces technical noise without destroying the spatial link between morphological features and genomic alterations.

What would settle it

The claim would be undermined if STAMP-tuned models showed no gain over baseline PFMs when predicting held-out gene expression or pathway activity from H&E images across multiple organs.
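
One way such a check could be scored is sketched below: compute the Pearson correlation between predicted and measured expression for each gene over held-out spots, average across genes, and compare the tuned model with the baseline. The gene names and all numbers are synthetic placeholders, not results from the paper, and the choice of mean per-gene Pearson correlation as the metric is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mean_gene_pearson(truth, pred):
    """Average per-gene correlation; both inputs map gene -> values over held-out spots."""
    return sum(pearson(truth[g], pred[g]) for g in truth) / len(truth)


# Synthetic held-out data (invented for illustration only).
truth = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [4.0, 1.0, 3.0, 2.0]}
baseline_pred = {"GENE_A": [1.5, 1.0, 3.5, 3.0], "GENE_B": [2.0, 2.5, 2.0, 2.5]}
tuned_pred = {"GENE_A": [1.1, 2.2, 2.9, 4.1], "GENE_B": [3.5, 1.5, 3.0, 2.0]}

gain = mean_gene_pearson(truth, tuned_pred) - mean_gene_pearson(truth, baseline_pred)
print(f"gain in mean per-gene Pearson r: {gain:.3f}")
```

A gain at or near zero, repeated across organs, would be the outcome this section describes.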

Figures

Figures reproduced from arXiv: 2606.03644 by Can Yang, Cheng Jin, Chenglong Zhao, Du Cai, Feng Gao, Fengtao Zhou, Hao Chen, Hongyi Wang, Huajun Zhou, Jiabo Ma, Li Liang, Ling Liang, Wenbin Li, Xiuming Zhang, Xi Wang, Yihui Wang, Yingxue Xu, Yu Wang, Zhengrui Guo, Zhengyu Zhang, Zhenhui Li, Zhe Wang, Ziyi Liu.

Figure 1: Establishment and clinical validation of STAMP. (a) Data curation process harmonizing paired H&E WSIs and ST data from public repositories. (b) Demographics of the HumanST-1k dataset, detailing the pan-cancer distribution across organs, detected genes per spot, and spatial sequencing platforms. (c) Pre-training corpus comprising 2.1 million registered H&E patches with paired spatial gene expression profile…
Figure 2: Spatial gene expression inference via linear probing. (a) Data preprocessing workflow illustrating the extraction of 50 highly variable genes (HVGs) from spatial transcriptomics datasets to serve as target molecular variables. (b) Linear probing pipeline for spatial gene expression prediction. The pathology encoder remains frozen. Extracted visual features undergo principal component analysis (PCA) for dim…
Figure 3: Spatial domain recognition via unsupervised clustering. Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Homogeneity (HOM), and Completeness (COM) scores quantifying the spatial concordance between unsupervised clusters derived from model embeddings and pathologist-annotated ground truth across four cancer types. (a and b) Overall performance benchmarking and representative spatial cluster m…
Figure 4: Inference of diagnostic immunohistochemical biomarkers in breast and lung cancer. (a) Summary of the clinical biomarkers and cohort design for breast cancer across biopsy and resection specimens. (b) Overall predictive performance for breast cancer biomarkers across internal testing and external validation cohorts. (c, d) Performance distributions comparing STAMP against the Virchow2 baseline on the interna…
Figure 5: Identification of actionable driver mutations and immunotherapy response indicators. (a) Overall predictive performance for actionable driver mutations (PIK3CA in breast cancer, EGFR and KRAS in lung cancer, BRAF in colorectal cancer, and IDH in brain cancer) across internal and external cohorts. (b, c) Performance distributions of STAMP versus the Virchow2 baseline for individual driver mutations on the int…
Figure 6: Evaluation of molecular prognostic signatures and assessment of prospective clinical utility. (a) Overall predictive performance for molecular prognostic signatures, encompassing TP53 mutation status, Ki-67 proliferation index, and molecular subtyping, across internal and external cohorts. (b) Detailed performance benchmarking of STAMP versus baseline models for individual prognostic targets on the interna…
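
The Figure 3 caption reports the Adjusted Rand Index (ARI) between unsupervised clusters and pathologist annotations. For readers unfamiliar with the metric, the sketch below implements the standard pair-counting ARI formula in plain Python. The label lists are toy inputs, and nothing here reproduces the paper's clustering pipeline.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI: 1.0 for identical partitions, about 0 for chance agreement."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Pairs of spots that share a cluster in both, in truth only, in prediction only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both partitions
    return (sum_cells - expected) / (max_index - expected)


annotation = ["tumor", "tumor", "stroma", "stroma"]  # toy ground-truth labels
same_partition = [1, 1, 0, 0]                        # same grouping, different names
crossed_partition = [0, 1, 0, 1]                     # splits every annotated region

print(adjusted_rand_index(annotation, same_partition))
print(adjusted_rand_index(annotation, crossed_partition))
```

The score depends only on which spots are grouped together, so cluster names do not need to match the annotation labels.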
Abstract

Comprehensive molecular profiling is essential for modern precision oncology but remains hindered by prohibitive costs, specimen exhaustion, and protracted turnaround times. While pathology foundation models (PFMs) have demonstrated potential for inferring molecular phenotypes from routine hematoxylin and eosin (H&E) whole-slide images (WSIs), current architectures primarily rely on vision-centric self-supervised learning or vision-language alignment, lacking the spatially resolved molecular supervision required to connect subtle morphological features with underlying genomic alterations. Spatial transcriptomics (ST) emerges as a transformative technology that enables transcriptomic quantification within intact tissue sections, thereby preserving the precise spatial link between histology and molecular profiles. In this study, we present a Spatial Transcriptomics-guided Alignment framework for Molecular Profiling (STAMP), which endows PFMs with intrinsic molecular awareness. To support this paradigm, we curated HumanST-1k, a human ST dataset spanning diverse anatomical organs and sequencing platforms. This atlas yields 1.8 million pairs of H&E patches and corresponding transcriptomic profiles, providing a corpus that links histological structures with their molecular states. To mitigate the technical noise inherent to raw transcriptomics, STAMP applies a pathway-informed alignment strategy that aggregates transcriptomic data into biologically functional pathways, which are subsequently integrated into PFMs via parameter-efficient fine-tuning. This alignment enriches the representation space of PFMs and unlocks their capacity to resolve sub-visual molecular signatures. The clinical utility of these augmented representations was validated through a multi-tier evaluation framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces the STAMP framework, which endows pathology foundation models (PFMs) with intrinsic molecular awareness by curating the HumanST-1k dataset (1.8 million H&E patch–transcriptomic profile pairs across organs and platforms) and applying a pathway-informed alignment strategy. Raw spatial transcriptomics counts are aggregated into biologically functional pathways to mitigate technical noise, then integrated into PFMs via parameter-efficient fine-tuning; the resulting representations are claimed to resolve sub-visual molecular signatures from routine H&E WSIs, with clinical utility assessed via a multi-tier evaluation framework.

Significance. If the empirical results hold, the work would be significant for precision oncology: it directly addresses the lack of spatially resolved molecular supervision in current vision-centric or vision-language PFMs by leveraging ST data to link subtle morphology with genomic alterations. The scale of the HumanST-1k atlas and the practical use of parameter-efficient fine-tuning are clear strengths that could enable broader adoption and serve as a resource for the community.

major comments (1)
  1. [Abstract] The central claim that pathway-informed aggregation 'mitigates the technical noise inherent to raw transcriptomics' while 'preserving the precise spatial link' between histology and molecular profiles is load-bearing for the entire alignment strategy. Aggregation necessarily collapses gene-level spatial variation; without any referenced quantitative check (e.g., patch-level mutual information between aggregated pathways and raw counts, or ablation against non-aggregated ST baselines), it remains unclear whether the procedure retains the sub-visual signatures the model is intended to resolve.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the single major comment below and will incorporate revisions to strengthen the justification for the pathway-informed aggregation strategy.

Point-by-point responses
  1. Referee: [Abstract] The central claim that pathway-informed aggregation 'mitigates the technical noise inherent to raw transcriptomics' while 'preserving the precise spatial link' between histology and molecular profiles is load-bearing for the entire alignment strategy. Aggregation necessarily collapses gene-level spatial variation; without any referenced quantitative check (e.g., patch-level mutual information between aggregated pathways and raw counts, or ablation against non-aggregated ST baselines), it remains unclear whether the procedure retains the sub-visual signatures the model is intended to resolve.

    Authors: We agree that the pathway aggregation step is central to the STAMP framework and that explicit quantitative validation would strengthen the manuscript. The aggregation into Hallmark and Reactome pathways is motivated by established biological priors that reduce technical noise (e.g., dropout, batch effects) while retaining functional signals, but we acknowledge the absence of a direct patch-level comparison in the current version. In the revised manuscript we will add: (1) a supplementary analysis computing patch-level Pearson correlation and mutual information between raw gene counts and pathway aggregates across the HumanST-1k atlas; (2) an ablation study training the alignment module on raw counts versus pathway aggregates and reporting downstream performance on molecular phenotype prediction tasks. These additions will quantify whether the procedure preserves spatially relevant molecular variation.

    Revision: yes
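
The first promised analysis could be sketched as follows, assuming one raw gene and one pathway score measured over the same patches. The equal-width binning used for the mutual information estimate and the bin count are assumptions (the rebuttal names no estimator), and the values are synthetic.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=4):
    """Plug-in mutual information estimate in nats from a binned joint histogram."""
    bx, by = equal_width_bins(x, n_bins), equal_width_bins(y, n_bins)
    n = len(x)
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in Counter(zip(bx, by)).items()
    )


# Synthetic per-patch values (invented for illustration).
raw_gene = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
pathway_score = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2, 5.9, 7.3]

r = pearson(raw_gene, pathway_score)
mi = mutual_information(raw_gene, pathway_score, n_bins=4)
print(f"Pearson r = {r:.3f}, MI = {mi:.3f} nats")
```

Binned estimates of mutual information are sensitive to the bin count and sample size, so a real analysis would need to report both alongside the estimate.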

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline is self-contained

Full rationale

The paper describes a data curation step (HumanST-1k atlas yielding 1.8M H&E-transcript pairs), followed by pathway-informed aggregation of raw counts and parameter-efficient fine-tuning of existing PFMs. No equations, uniqueness theorems, fitted parameters renamed as predictions, or self-citation chains are present in the provided text. The central claim rests on empirical validation of the resulting representations rather than any derivation that reduces to its own inputs by construction. This is the expected non-finding for a methods-oriented empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view supplies no explicit free parameters, mathematical axioms, or invented entities; the pathway aggregation step implicitly relies on external biological pathway databases whose selection criteria are not stated.


