pith. machine review for the scientific record.

arxiv: 2605.08276 · v1 · submitted 2026-05-08 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 00:53 UTC · model grok-4.3

classification 💻 cs.CV
keywords pathology foundation model · masked diffusion · convolutional backbone · cell-level dense prediction · self-supervised learning · ConvNeXt · limited annotations · histological structure

The pith

A convolutional masked-diffusion model outperforms ViT-based pathology foundation models on cell-level dense prediction tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a self-supervised pretraining approach for pathology images that relies on a fully convolutional network rather than the patch tokenization used in vision transformers. By applying masked diffusion directly in pixel space with a ConvNeXt-UNet backbone, and injecting features from existing foundation models through adaptive normalization, the method learns representations that retain local histological structure and spatial continuity. Experiments across several dense prediction tasks show that this convolutional foundation model surpasses both ViT-based pathology models and many specialized end-to-end segmentation techniques. The gains are largest when only limited annotations are available, indicating stronger generalization under data scarcity. The work argues that convolutional architectures can serve as viable alternatives to the current ViT-dominated paradigm for fine-grained pathology understanding.

Core claim

CMD (ConvNeXt Masked-Diffusion) uses a fully convolutional ConvNeXt-UNet backbone, performs masked-diffusion pretraining in pixel space, and incorporates frozen pathology foundation model features through adaptive normalization. The resulting model consistently outperforms existing ViT-based pathology foundation models and even surpasses state-of-the-art end-to-end segmentation methods while fine-tuning only a small number of task-specific parameters across multiple pathology dense prediction tasks, with the advantage most pronounced under limited-annotation settings.

What carries the argument

The CMD framework: a ConvNeXt-UNet backbone that conducts masked diffusion pretraining directly in pixel space while using adaptive normalization to integrate features from frozen pathology foundation models.
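The pixel-space masked-diffusion objective that carries this argument can be sketched generically. This is not the paper's implementation: the per-pixel masking, the single noise-mixing parameter `t`, and the function name `masked_diffusion_pair` are illustrative assumptions, shown only to make concrete what "masked diffusion directly in pixel space" asks a network to do.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_diffusion_pair(img, mask_ratio=0.6, t=0.5):
    """Build one pixel-space masked-diffusion training pair.

    img: (H, W, C) float array in [0, 1].
    Returns the corrupted input the network would see and the clean
    image it would be trained to reconstruct.
    """
    h, w, _ = img.shape
    # Binary mask over pixels (a real recipe likely masks larger
    # regions; per-pixel masking keeps the sketch short).
    mask = rng.random((h, w, 1)) < mask_ratio
    # Forward diffusion: mix the signal with Gaussian noise.
    noise = rng.standard_normal(img.shape)
    noisy = np.sqrt(1.0 - t) * img + np.sqrt(t) * noise
    # Masked pixels are zeroed; the model must jointly inpaint
    # the hidden regions and denoise the visible ones.
    corrupted = np.where(mask, 0.0, noisy)
    return corrupted, img, mask

img = rng.random((8, 8, 3))
corrupted, target, mask = masked_diffusion_pair(img)
```

Because both corruption operators act on raw pixels rather than on patch tokens, the reconstruction target keeps its full spatial resolution, which is the property the paper credits for preserving local morphology.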

If this is right

  • CMD achieves leading performance on multiple cell-level dense prediction tasks while requiring only minimal task-specific fine-tuning.
  • The performance gap widens under limited annotation regimes, demonstrating improved robustness and generalization.
  • Purely convolutional architectures can function as competitive pathology foundation models within the current ViT-dominated setting.
  • The approach supplies a scalable pretraining recipe that maintains spatial continuity for fine-grained histological understanding.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pixel-space masked diffusion strategy could be tested on other dense-prediction domains where spatial continuity matters, such as electron microscopy or satellite imagery.
  • Hybrid models that combine CMD-style pretraining with selective ViT components might further improve results on tasks that need both local detail and long-range context.
  • If the convolutional advantage holds, future pathology foundation models may shift away from exclusive reliance on transformer tokenization for segmentation-heavy applications.

Load-bearing premise

That masked-diffusion pretraining performed in pixel space with a convolutional backbone preserves histological structural priors and local morphological details better than the patch tokenization used by vision transformers.

What would settle it

On a held-out pathology dataset with fine cell boundaries and strong domain shift, fine-tune both CMD and a comparable ViT model with the same number of task-specific parameters and measure whether CMD still yields higher segmentation accuracy.
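The settling experiment above ultimately reduces to comparing segmentation accuracy under matched fine-tuning budgets. A standard Dice coefficient is the usual measure for cell-level comparisons of this kind (the paper's exact metrics are not reproduced here); a minimal version:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: twice the overlap
    divided by the total mask area, with eps guarding empty masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
score = dice_score(pred, gt)  # 2*1 / (2 + 1), about 0.667
```

Running this per held-out image for both CMD and the matched ViT baseline, then comparing the distributions, is the concrete form of the proposed test.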

Figures

Figures reproduced from arXiv: 2605.08276 by Benyou Wang, Jiawen Li, Tian Guan, Weiming Chen, Xidong Wang, Xitong Ling, Yonghong He, Zhenyang Cai.

Figure 1: Overview of the proposed CMD framework. (A) Large-scale unlabeled pathology patches [PITH_FULL_IMAGE:figures/full_fig_p003_1.png]
Figure 2: Qualitative comparison under the frozen-backbone dense prediction setting. CMD-L [PITH_FULL_IMAGE:figures/full_fig_p008_2.png]
Figure 3: Visualization of cell-level dense representations. For ViT-based pathology foundation [PITH_FULL_IMAGE:figures/full_fig_p009_3.png]
Original abstract

Cell-level dense prediction is central to computational pathology, but remains challenging due to fine-grained histological structures, strong domain shifts, and costly dense annotations. Existing ViT-based pathology foundation models rely on patch tokenization, which can disrupt spatial continuity and weaken local morphological details needed for cell-level prediction. To address this, we propose Masked-Diffusion Convolutional Foundation Models, termed ConvNeXt Masked-Diffusion (CMD), a self-supervised convolutional generative pretraining framework for dense pathology representation learning. CMD uses a fully convolutional ConvNeXt-UNet backbone, performs masked-diffusion pretraining in pixel space, and incorporates frozen pathology foundation model features through adaptive normalization. Experimental results demonstrate that CMD consistently outperforms existing ViT-based pathology foundation models and even surpasses state-of-the-art end-to-end segmentation methods while fine-tuning only a small number of task-specific parameters across multiple pathology dense prediction tasks. The advantage is particularly pronounced under limited annotation settings, where CMD exhibits stronger robustness and generalization ability. Our findings suggest that purely convolutional architectures can also serve as competitive pathology foundation models for cell-level dense prediction, achieving leading performance within the current ViT-dominated paradigm and providing a scalable, high-performance solution that better preserves histological structural priors for fine-grained pathology understanding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes ConvNeXt Masked-Diffusion (CMD), a self-supervised convolutional generative pretraining framework using a fully convolutional ConvNeXt-UNet backbone. It performs masked-diffusion pretraining directly in pixel space and incorporates frozen pathology foundation model features via adaptive normalization layers. The central claim is that CMD consistently outperforms existing ViT-based pathology foundation models and even surpasses state-of-the-art end-to-end segmentation methods across multiple cell-level dense prediction tasks in pathology, with particular advantages under limited annotations, while suggesting that purely convolutional architectures can serve as competitive foundation models that better preserve histological structural priors.

Significance. If the performance gains can be isolated to the proposed convolutional masked-diffusion pretraining, the work would be significant for computational pathology by providing an alternative to the dominant ViT/patch-tokenization paradigm for fine-grained tasks. It highlights potential benefits of pixel-space generative pretraining and convolutional inductive biases for spatial continuity and low-data robustness, offering a scalable path for dense prediction without heavy reliance on patch-based tokenization.

major comments (1)
  1. [Abstract and Methods] Abstract and Methods: The claim that CMD demonstrates 'purely convolutional architectures can also serve as competitive pathology foundation models' is undermined by the explicit incorporation of frozen pathology foundation model features (almost certainly ViT-derived) through adaptive normalization. Without an ablation that removes the ViT-feature injection, retrains the pure ConvNeXt masked-diffusion backbone, and re-evaluates on the dense prediction tasks, it is impossible to attribute the reported outperformance specifically to the masked-diffusion objective and convolutional backbone rather than to distillation of ViT priors. This directly affects the central attribution of gains and the paper's positioning against ViT-based models.
minor comments (1)
  1. [Abstract] The abstract would benefit from including at least one or two key quantitative metrics (e.g., Dice scores or mIoU improvements on specific datasets) to substantiate the claims of consistent outperformance and robustness under limited annotations.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comment point by point below.

Point-by-point responses
  1. Referee: [Abstract and Methods] Abstract and Methods: The claim that CMD demonstrates 'purely convolutional architectures can also serve as competitive pathology foundation models' is undermined by the explicit incorporation of frozen pathology foundation model features (almost certainly ViT-derived) through adaptive normalization. Without an ablation that removes the ViT-feature injection, retrains the pure ConvNeXt masked-diffusion backbone, and re-evaluates on the dense prediction tasks, it is impossible to attribute the reported outperformance specifically to the masked-diffusion objective and convolutional backbone rather than to distillation of ViT priors. This directly affects the central attribution of gains and the paper's positioning against ViT-based models.

    Authors: We appreciate the referee's observation that the incorporation of frozen pathology foundation model features (typically ViT-derived) via adaptive normalization layers means the model is not entirely isolated from ViT priors. The CMD framework is nevertheless built around a fully convolutional ConvNeXt-UNet backbone whose masked-diffusion pretraining occurs directly in pixel space. This choice is motivated by the need to preserve spatial continuity and fine-grained morphological details that patch tokenization can disrupt. The adaptive normalization layers provide a lightweight mechanism for injecting high-level semantic guidance into the convolutional feature maps without replacing the backbone's core representational and predictive pathway. We agree that the current evidence does not fully isolate the contribution of the convolutional masked-diffusion objective from the injected priors. In the revised manuscript we will therefore add the requested ablation: a version of CMD trained without the ViT-feature injection, followed by re-evaluation on the cell-level dense prediction tasks. This will allow clearer attribution of gains to the proposed pretraining and architecture while refining the paper's positioning. revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical claims with no derivation chain

Full rationale

The manuscript presents a new pretraining method (ConvNeXt-UNet with masked diffusion in pixel space plus adaptive normalization from frozen external features) and supports its claims exclusively via experimental comparisons on pathology dense-prediction benchmarks. No equations, parameter-fitting steps, or mathematical derivations appear in the abstract or described text. The central performance claims therefore cannot reduce to self-definitional inputs, fitted quantities renamed as predictions, or load-bearing self-citations. While the hybrid use of frozen ViT-derived features raises separate questions of attribution, that issue lies outside the circularity criteria (no reduction by construction is exhibited). The paper is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work relies on standard assumptions from self-supervised learning and convolutional networks but introduces no new physical entities. Free parameters are typical ML hyperparameters not detailed here.

axioms (2)
  • domain assumption Masked diffusion pretraining in pixel space preserves spatial continuity better than patch-based tokenization for histological structures.
    Invoked in the motivation and method description to justify the convolutional backbone choice.
  • domain assumption Frozen features from existing pathology foundation models can be effectively integrated via adaptive normalization without domain shift issues.
    Used in the pretraining framework description.
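The second assumption, integrating frozen foundation-model features via adaptive normalization, can be made concrete with an AdaLN-style sketch. The shapes, the weight matrices `w_scale`/`w_shift`, and the `(1 + gamma)` modulation are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def adaptive_norm(feat, cond, w_scale, w_shift, eps=1e-5):
    """AdaLN-style injection: normalize a conv feature map per spatial
    position over channels, then modulate it with a scale and shift
    predicted from a frozen foundation-model embedding `cond`.
    Shapes: feat (H, W, C), cond (D,), w_scale / w_shift (D, C)."""
    mu = feat.mean(axis=-1, keepdims=True)
    var = feat.var(axis=-1, keepdims=True)
    normed = (feat - mu) / np.sqrt(var + eps)
    gamma = cond @ w_scale  # per-channel scale from the frozen features
    beta = cond @ w_shift   # per-channel shift
    return (1.0 + gamma) * normed + beta

rng = np.random.default_rng(1)
feat = rng.standard_normal((4, 4, 8))  # conv feature map
cond = rng.standard_normal(16)         # frozen-model embedding
out = adaptive_norm(feat, cond,
                    0.01 * rng.standard_normal((16, 8)),
                    0.01 * rng.standard_normal((16, 8)))
```

The point of the design, as the axiom states it, is that the frozen features only steer per-channel statistics; the convolutional pathway itself remains the representational backbone.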

pith-pipeline@v0.9.0 · 5548 in / 1320 out tokens · 22016 ms · 2026-05-12T00:53:14.628460+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

