OpenTME: An Open Dataset of AI-powered H&E Tumor Microenvironment Profiles from TCGA
Pith reviewed 2026-05-10 15:31 UTC · model grok-4.3
The pith
OpenTME releases pre-computed tumor microenvironment profiles with over 4,500 quantitative readouts per slide from 3,634 TCGA H&E images across five cancers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce OpenTME, an open-access dataset of pre-computed TME profiles derived from 3,634 H&E-stained whole-slide images across five cancer types (bladder, breast, colorectal, liver, and lung cancer) from TCGA. All outputs were generated using Atlas H&E-TME, an AI-powered application built on the Atlas family of pathology foundation models, which performs tissue quality control, tissue segmentation, cell detection and classification, and spatial neighborhood analysis, yielding over 4,500 quantitative readouts per slide at cell-level resolution.
What carries the argument
Atlas H&E-TME, the AI application that runs tissue quality control, segmentation, cell classification, and spatial neighborhood measurements to produce the per-slide quantitative profiles.
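The summary does not say how Atlas H&E-TME computes its spatial neighborhood readouts. For orientation only, here is one common way to turn cell detections into neighborhood features: count the cell types within a fixed radius of each cell, then average per slide. This is a minimal sketch under stated assumptions. Cells are given as hypothetical (x, y, cell_type) tuples, the radius is arbitrary, and the names `neighborhood_composition` and `mean_neighbour_fraction` are illustrative rather than part of the paper or any Atlas API.

```python
import math
from collections import Counter


def neighborhood_composition(cells, radius):
    """Return, for each cell, the fraction of each cell type among its
    neighbours within `radius` (same units as the coordinates).

    `cells` is a list of (x, y, cell_type) tuples. This brute-force version
    is O(n^2); whole-slide data would need a spatial index (e.g. a k-d tree).
    """
    compositions = []
    for i, (xi, yi, _) in enumerate(cells):
        counts = Counter()
        for j, (xj, yj, tj) in enumerate(cells):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                counts[tj] += 1
        total = sum(counts.values())
        # Cells with no neighbours inside the radius get an empty composition.
        compositions.append({t: c / total for t, c in counts.items()} if total else {})
    return compositions


def mean_neighbour_fraction(cells, compositions, centre_type, neighbour_type):
    """Slide-level summary: average fraction of `neighbour_type` cells around
    cells of `centre_type` (NaN if the slide has no such centre cells)."""
    values = [
        comp.get(neighbour_type, 0.0)
        for (_, _, t), comp in zip(cells, compositions)
        if t == centre_type
    ]
    return sum(values) / len(values) if values else float("nan")


# Toy example with made-up coordinates (e.g. micrometres) and labels.
cells = [(0, 0, "tumor"), (10, 0, "lymphocyte"), (0, 10, "tumor"), (100, 100, "stroma")]
comps = neighborhood_composition(cells, radius=20)
tumor_lymphocyte_score = mean_neighbour_fraction(cells, comps, "tumor", "lymphocyte")
```

Repeating such summaries over many center/neighbor type pairs, radii, and tissue regions is one plausible way a pipeline could reach thousands of per-slide readouts. The source does not confirm that this is how the 4,500+ figure arises.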
If this is right
- Researchers gain immediate access to standardized, large-scale TME data without running their own image-analysis pipelines.
- The dataset supports direct comparisons of microenvironment features across five distinct cancer types from the same source archive.
- Integration with existing TCGA genomic and clinical records becomes straightforward for multimodal studies.
- Computational groups can use the profiles to prototype and benchmark new spatial-analysis algorithms.
- Continued expansion will increase the number of slides and cancer types covered over time.
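The claim that integration with TCGA genomic and clinical records is straightforward rests on TCGA's barcode scheme. A slide barcode begins with the participant barcode: its first three dash-separated fields (project, tissue source site, participant). The sketch below shows such a join in plain Python. The readout name `lymphocyte_density`, the clinical field `vital_status`, the dictionary layouts, and the second barcode are hypothetical illustrations, not the dataset's actual schema.

```python
def patient_barcode(slide_barcode: str) -> str:
    """Return the TCGA participant barcode: the first three dash-separated
    fields (project, tissue source site, participant) of a slide barcode."""
    parts = slide_barcode.split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {slide_barcode!r}")
    return "-".join(parts[:3])


def join_with_clinical(slide_readouts, clinical):
    """Merge per-slide readouts with per-patient clinical fields.

    slide_readouts: {slide_barcode: {readout_name: value}}
    clinical:       {patient_barcode: {field_name: value}}
    Slides whose patient has no clinical record are skipped. A patient with
    several slides yields several rows, so analyses may need to aggregate
    per patient first.
    """
    rows = []
    for slide, readouts in slide_readouts.items():
        pid = patient_barcode(slide)
        if pid in clinical:
            rows.append({"slide": slide, "patient": pid, **readouts, **clinical[pid]})
    return rows


# Hypothetical inputs; the second barcode is deliberately made up.
slide_readouts = {
    "TCGA-A1-A0SB-01Z-00-DX1": {"lymphocyte_density": 812.0},
    "TCGA-XX-0000-01Z-00-DX1": {"lymphocyte_density": 95.5},
}
clinical = {"TCGA-A1-A0SB": {"vital_status": "Alive"}}
rows = join_with_clinical(slide_readouts, clinical)
```

In practice a dataframe library would do the same join. The key design point is that the participant barcode, not the full slide barcode, is the shared key across TCGA data types.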
Where Pith is reading between the lines
- The collection could function as a public benchmark for evaluating future AI models on pathology images.
- Aggregating the profiles might surface previously unseen cross-cancer patterns in cell neighborhoods or tissue architecture.
- Downstream models trained on these features could be tested for their ability to predict treatment response or survival from H&E alone.
- The fixed set of 4,500+ readouts per slide could reduce variability when different labs attempt to reproduce TME findings.
Load-bearing premise
The AI pipeline produces accurate cell classifications, tissue segmentations, and spatial measurements without systematic errors or biases that would need separate confirmation by pathologists or clinical data.
What would settle it
If the AI cell-type labels and neighborhood statistics showed low agreement with pathologist annotations on a random subset of the same slides, the load-bearing premise would fail.
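"Agreement" can be quantified in several ways. One standard choice for categorical labels is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes AI and pathologist labels have already been paired cell by cell. That matching step is not shown and is nontrivial in practice. The label names are hypothetical, and the source sets no threshold for what counts as "low".

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two raters' paired labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical paired labels for six cells.
ai_labels = ["tumor", "tumor", "lymphocyte", "stroma", "lymphocyte", "tumor"]
pathologist_labels = ["tumor", "lymphocyte", "lymphocyte", "stroma", "lymphocyte", "tumor"]
kappa = cohens_kappa(ai_labels, pathologist_labels)  # 17/23, about 0.739
```

Here raw agreement is 5/6, but kappa is lower (17/23) because some of that agreement would be expected by chance given each rater's label frequencies.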
Original abstract
The tumor microenvironment (TME) plays a central role in cancer progression, treatment response, and patient outcomes, yet large-scale, consistent, and quantitative TME characterization from routine hematoxylin and eosin (H&E)-stained histopathology remains scarce. We introduce OpenTME, an open-access dataset of pre-computed TME profiles derived from 3,634 H&E-stained whole-slide images across five cancer types (bladder, breast, colorectal, liver, and lung cancer) from The Cancer Genome Atlas (TCGA). All outputs were generated using Atlas H&E-TME, an AI-powered application built on the Atlas family of pathology foundation models, which performs tissue quality control, tissue segmentation, cell detection and classification, and spatial neighborhood analysis, yielding over 4,500 quantitative readouts per slide at cell-level resolution. OpenTME is available for non-commercial academic research on Hugging Face. We will continue to expand OpenTME over time and anticipate it will serve as a resource for biomarker discovery, spatial biology research, and the development of computational methods for TME analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces OpenTME, an open-access dataset of pre-computed TME profiles derived from 3,634 H&E-stained whole-slide images across five TCGA cancer types (bladder, breast, colorectal, liver, and lung). All profiles were generated by the Atlas H&E-TME AI application (built on the Atlas family of pathology foundation models), which performs tissue quality control, segmentation, cell detection/classification, and spatial neighborhood analysis to produce over 4,500 quantitative readouts per slide at cell-level resolution. The dataset is released on Hugging Face for non-commercial academic use, with plans for future expansion.
Significance. If the underlying AI pipeline is shown to be accurate, OpenTME would be a valuable large-scale resource for TME biomarker discovery, spatial biology, and computational pathology method development, as it supplies consistent, high-resolution quantitative features that are otherwise computationally expensive to generate from raw TCGA slides. The open release and multi-cancer scope add to its potential utility.
major comments (2)
- [Abstract] The central claim that Atlas H&E-TME produces reliable quantitative TME readouts (tissue segmentation, cell classification, spatial neighborhoods) is unsupported because the manuscript reports no validation metrics whatsoever: no cell-classification F1 scores, no segmentation Dice coefficients versus expert annotations, no spatial feature agreement checks, and no correlation with genomic or clinical endpoints. This directly undermines the utility of every one of the >4,500 readouts per slide.
- [Methods (Atlas H&E-TME pipeline description)] No information is given on training data, fine-tuning, or performance of the foundation models on the specific TCGA cohorts, nor on any internal or external validation of the TME pipeline outputs. Without these, potential systematic biases in cell typing or neighborhood statistics for the five cancer types cannot be assessed, and any such biases would affect all downstream use of the dataset.
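The referee names per-class F1 for cell classification and Dice for segmentation as the missing metrics. The sketch below gives the standard definitions so readers can see what such a validation would report. Labels and masks are hypothetical toy inputs. Masks are represented as sets of pixel coordinates, which is a simplification; real masks are usually arrays.

```python
def f1_for_class(y_true, y_pred, cls):
    """Per-class F1 = 2TP / (2TP + FP + FN) for a multi-class labelling."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have equal length")
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else float("nan")


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def dice(mask_a, mask_b):
    """Dice = 2|A & B| / (|A| + |B|) for masks given as pixel-coordinate sets."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks are treated as perfect agreement
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical cell labels (expert vs. model) and two small masks.
y_true = ["tumor", "tumor", "lymphocyte", "stroma"]
y_pred = ["tumor", "lymphocyte", "lymphocyte", "tumor"]
tumor_f1 = f1_for_class(y_true, y_pred, "tumor")  # 0.5
overall = macro_f1(y_true, y_pred)                # 7/18
expert_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
model_mask = {(0, 1), (1, 1), (2, 1)}
overlap = dice(expert_mask, model_mask)           # 4/7
```

Dice on a pixel mask is the same formula as F1 with pixels as the items, which is why the two metrics are often reported side by side.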
minor comments (2)
- The manuscript would benefit from a brief table summarizing the exact 4,500+ readout categories and their definitions to improve usability for potential users.
- Clarify the exact license and any usage restrictions beyond 'non-commercial academic research' on the Hugging Face page and in the text.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight important considerations for a data descriptor manuscript. OpenTME is released as a resource of pre-computed profiles generated by the Atlas H&E-TME pipeline; this paper does not constitute a primary validation study of the underlying models. We address each point below and will revise the manuscript accordingly.
- Referee: [Abstract] The central claim that Atlas H&E-TME produces reliable quantitative TME readouts (tissue segmentation, cell classification, spatial neighborhoods) is unsupported because the manuscript reports no validation metrics whatsoever: no cell-classification F1 scores, no segmentation Dice coefficients versus expert annotations, no spatial feature agreement checks, and no correlation with genomic or clinical endpoints. This directly undermines the utility of every one of the >4,500 readouts per slide.
Authors: We agree that the abstract should avoid implying comprehensive validation within this manuscript. OpenTME is a data release paper, and the reliability claims rest on the prior characterization of the Atlas foundation models and the Atlas H&E-TME application in separate publications. In the revised version we will (1) tone down the abstract language to emphasize that the profiles are generated by a published pipeline and (2) add a concise Methods/Discussion paragraph with explicit citations to the relevant validation studies, including reported F1 scores for cell classification, Dice coefficients for segmentation, and any available spatial or clinical correlation results. This will allow readers to locate the supporting evidence without misrepresenting the current work as a validation study. revision: yes
- Referee: [Methods (Atlas H&E-TME pipeline description)] No information is given on training data, fine-tuning, or performance of the foundation models on the specific TCGA cohorts, nor on any internal or external validation of the TME pipeline outputs. Without these, potential systematic biases in cell typing or neighborhood statistics for the five cancer types cannot be assessed, and any such biases would affect all downstream use of the dataset.
Authors: We accept this criticism. The Atlas models are general-purpose foundation models trained on large, multi-institutional pathology corpora and were applied to the TCGA slides without cohort-specific fine-tuning, which is an intentional design choice for broad applicability. In the revised manuscript we will expand the Methods section to (a) summarize the training data and reported performance metrics of the foundation models with citations, (b) state that no TCGA-specific fine-tuning or per-cohort validation was performed for this release, and (c) explicitly discuss the possibility of systematic biases and the value of community re-evaluation. We will also add a limitations paragraph addressing how users may assess or mitigate such biases when using the >4,500 readouts. revision: yes
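The rebuttal promises guidance on how users might assess systematic biases but gives no specifics. One simple first check a user could run is to test whether a readout shifts by TCGA tissue source site, which is encoded in the second field of the barcode. The sketch below is a hypothetical illustration, not a method from the paper. It flags sites whose median readout lies far from the median of all site medians, measured in units of the median absolute deviation (MAD). The barcodes and values are invented, and the threshold of 3 is an arbitrary assumption.

```python
import statistics
from collections import defaultdict


def tissue_source_site(slide_barcode):
    """Second dash-separated field of a TCGA barcode, e.g. 'A1'."""
    return slide_barcode.split("-")[1]


def site_medians(readout_by_slide):
    """Median readout per tissue source site."""
    groups = defaultdict(list)
    for slide, value in readout_by_slide.items():
        groups[tissue_source_site(slide)].append(value)
    return {site: statistics.median(vals) for site, vals in groups.items()}


def flag_outlier_sites(readout_by_slide, threshold=3.0):
    """Sites whose median deviates from the median of site medians by more
    than `threshold` MADs (an illustrative rule, not a validated test)."""
    medians = site_medians(readout_by_slide)
    centre = statistics.median(medians.values())
    mad = statistics.median(abs(m - centre) for m in medians.values())
    if mad == 0:
        return []
    return sorted(s for s, m in medians.items() if abs(m - centre) / mad > threshold)


# Invented barcodes and readout values for illustration only.
readouts = {
    "TCGA-A1-0001-01Z-00-DX1": 10.0,
    "TCGA-A1-0002-01Z-00-DX1": 12.0,
    "TCGA-A2-0003-01Z-00-DX1": 11.5,
    "TCGA-B6-0004-01Z-00-DX1": 10.5,
    "TCGA-C8-0005-01Z-00-DX1": 12.0,
    "TCGA-D8-0006-01Z-00-DX1": 40.0,
}
flagged = flag_outlier_sites(readouts)
```

A flagged site is a prompt for inspection, not proof of pipeline error. Sites also differ in case mix and tumor stage, so genuine biology can produce the same signal as scanner or staining artifacts.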
Circularity Check
No derivation chain present; dataset release is self-contained
The paper introduces OpenTME as a direct release of pre-computed TME profiles from TCGA slides processed by the Atlas H&E-TME application. No mathematical derivations, equations, parameter fittings, predictions, or uniqueness theorems are claimed or walked through. The contribution consists solely of data generation and public availability on Hugging Face, with no load-bearing steps that reduce to self-citations or inputs by construction. This is a standard data-release paper with no circularity risk.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The Atlas family of pathology foundation models performs accurate tissue quality control, segmentation, cell detection and classification, and spatial neighborhood analysis on H&E slides.
Reference graph
Works this paper leans on
- [1] Benjamin Adjadj, Pierre-Antoine Bannier, Guillaume Horent, Sebastien Mandela, Aurore Lyon, Kathryn Schutte, Ulysse Marteau, Valentin Gaury, Laura Dumont, Thomas Mathieu, Reda Belbahri, Benoît Schmauch, Eric Durand, Katharina von Loga, and Lucie Gillet. Towards comprehensive cellular characterisation of H&E slides. arXiv preprint arXiv:2508.09926, 2025.
- [2] Maximilian Alber, Timo Milbich, Alexandra Carpen-Amarie, Stephan Tietz, Jonas Dippel, Lukas Muttenthaler, Beatriz Perez Cancer, Alessandro Benetti, Panos Korfiatis, Elias Eulig, Jérôme Lüscher, Jiasen Wu, Sayed Abid Hashimi, Gabriel Dernbach, Simon Schallenberg, ... Atlas 2 – foundation models for clinical deployment. arXiv preprint arXiv:2601.05148, 2026.
- [3] Maximilian Alber, Stephan Tietz, Jonas Dippel, Timo Milbich, Timothée Lesort, Panos Korfiatis, Moritz Krügener, Beatriz Perez Cancer, Neelay Shah, Alexander Möllers, Philipp Seegerer, Alexandra Carpen-Amarie, Kai Standvoss, Gabriel Dernbach, Edwin de Jong, Simon Schallenberg, Andreas Kunft, Helmut Hoffer von Ankershoffen, Gavin Schaeferle, Patrick Duffy...
- [4] Salim Arslan, Disha Mehta, Alexei Gusev, Eric J. Topol, and Guergana K. Savova. A systematic pan-cancer study on deep learning-based prediction of multi-omic biomarkers from routine pathology images. Communications Medicine, 4:48, 2024.
- [5] Richard J. Chen, Tong Ding, Ming Y. Lu, Drew F. K. Williamson, Guillaume Jaume, Andrew H. Song, Bowen Chen, Andrew Zhang, Daniel Shao, Muhammad Shaban, Mane Williams, Lukas Oldenburg, Luca L. Weishaupt, Judy J. Wang, Anurag Vaidya, Long Phi Le, Georg Gerber, Sharifa Sahai, Walt Williams, and Faisal Mahmood. Towards a general-purpose foundation model for ... 2024.
- [6] James A. Diao, Jason K. Wang, Wan Fung Chui, Victoria Mountain, Sai Chowdary Gullapally, Ramprakash Srinivasan, Richard N. Mitchell, Benjamin Glass, Sara Hoffman, Sudha K. Rao, et al. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nature Communications, 12(1):1613, 2021.
- [7] Jonas Dippel, Barbara Feulner, Tobias Winterhoff, Timo Milbich, Stephan Tietz, Simon Schallenberg, Gabriel Dernbach, Andreas Kunft, Simon Heinke, Marie-Lisa Eich, Julika Ribbat-Idel, Rosemarie Krupar, Philipp Anders, Niklas Prenißl, Philipp Jurmeister, David Horst, Lukas Ruff, Klaus-Robert Müller, Frederick Klauschen, and Maximilian Alber. RudolfV: A Foundation Model by Pathologists for Pathologists. arXiv preprint arXiv:2401.04079, 2024.
- [8] Jevgenij Gamper, Navid Alemi Koohbanani, Ksenija Benet, Ali Khuram, and Nasir Rajpoot. PanNuke: An open pan-cancer histology dataset for nuclei instance segmentation and classification. In European Congress on Digital Pathology, volume 11435 of Lecture Notes in Computer Science, pages 11–19. Springer, 2019.
- [9]
- [10] Simon Graham, Quoc Dang Vu, Shan E Ahmed Raza, Ayesha Azam, Yee Wah Tsang, Jin Tae Kwak, and Nasir Rajpoot. Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis, 58:101563, 2019.
- [11] Fabian Hörst, Moritz Rempe, Lukas Heine, Constantin Seibold, Julius Keyl, Giulia Baldini, Selma Ugurel, Jens Siveke, Barbara Grünwald, Jan Egger, and Jens Kleesiek. CellViT: Vision transformers for precise cell segmentation and classification. Medical Image Analysis, 94:103143, 2024.
- [12] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.
- [13] Yongju Lee, Jun Hyeong Park, Seonwook Oh, Kyoungseob Shin, Jiyu Sun, Mingu Jung, Changho Lee, Hyunjin Kim, Jin-Haeng Chung, Kyung Chul Moon, and Donggeun Yoo. Derivation of prognostic contextual histopathological features from whole-slide images of tumours via graph deep learning. Nature Biomedical Engineering, 6:1395–1406, 2022.
- [14] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L... 2024.
- [15] Alicja Rączkowska, Iwona Paśnik, Michał Kukiełka, Marcin Nicoś, Magdalena A. Budzinska, Tomasz Kucharczyk, Justyna Szumiło, Paweł Krawczyk, Nicola Crosetto, and Ewa Szczurek. Deep learning-based tumor microenvironment segmentation is predictive of tumor mutations and patient survival in non-small-cell lung cancer. BMC Cancer, 22(1):1001, 2022.
- [16] Joel Saltz, Rajarsi Gupta, Le Hou, Tahsin Kurc, Pankaj Singh, Vu Nguyen, Dimitris Samaras, Kenneth R. Shroyer, Tianhao Zhao, Rebecca Batiste, John Van Arnam, The Cancer Genome Atlas Research Network, Ilya Shmulevich, Arvind U. K. Rao, Alexander J. Lazar, Ashish Sharma, and Vésteinn Thorsson. Spatial organization and molecular correlation of tumor-infiltr... 2018.
- [17] Uwe Schmidt, Martin Weigert, Coleman Broaddus, and Gene Myers. Cell detection with star-convex polygons. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 265–273. Springer, 2018.
- [18] Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie... arXiv, 2025.
- [19] Shidan Wang, Ruichen Rong, Donghan M. Yang, Junya Fujimoto, Faliu Yan, Ling Cai, Ling Yang, Bo Yao, Shengjie Li, Maria Chikina, Yuval Kluger, Ignacio I. Wistuba, John D. Minna, and Guanghua Xiao. Computational staining of pathology images to study the tumor microenvironment in lung cancer. Cancer Research, 80(10):2056–2066, 2020.
- [20] Eric Zimmermann, Eugene Vorontsov, Julian Viret, Adam Casson, Michal Zelechowski, George Shaikovski, Neil Tenenholtz, James Hall, David Klimstra, Razik Yousfi, Thomas Fuchs, Nicolò Fusi, Siqi Liu, and Kristen Severson. Virchow2: Scaling self-supervised mixed magnification models in pathology. arXiv preprint arXiv:2408.00738, 2024.