pith. machine review for the scientific record.

arxiv: 2604.09169 · v1 · submitted 2026-04-10 · 💻 cs.CV

Recognition: unknown

UniSemAlign: Text-Prototype Alignment with a Foundation Encoder for Semi-Supervised Histopathology Segmentation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:10 UTC · model grok-4.3

classification 💻 cs.CV
keywords semi-supervised segmentation · histopathology · prototype alignment · text alignment · foundation encoder · computational pathology · pseudo-label refinement

The pith

By aligning text prototypes and visual features in a shared space, UniSemAlign generates more reliable pseudo-labels for semi-supervised histopathology segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a dual-modal alignment framework can inject explicit class-level structure into pixel-wise learning to reduce ambiguity when labels are scarce. This matters because histopathology segmentation typically suffers from unreliable pseudo-label supervision on unlabeled images. UniSemAlign uses a pathology-pretrained Transformer encoder with complementary prototype-level and text-level branches whose outputs fuse with visual predictions. The model trains end-to-end on supervised segmentation, cross-view consistency, and cross-modal alignment losses. Experiments on GlaS and CRAG show Dice gains of up to 2.6% and 8.6%, respectively, at 10% labeled data.

Core claim

UniSemAlign introduces complementary prototype-level and text-level alignment branches in a shared embedding space built upon a pathology-pretrained Transformer encoder; the aligned representations are fused with visual predictions to produce more reliable supervision signals for unlabeled images, trained jointly with supervised segmentation, cross-view consistency, and cross-modal alignment objectives.
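The page's extract does not specify the fusion rule, so here is a minimal sketch of the claimed mechanism: per-pixel features are scored against class prototypes and text embeddings in a shared embedding space, and the similarity maps are blended with the visual logits. The function name, tensor shapes, and blending weights are all assumptions, not the paper's implementation:

```python
import numpy as np

def fused_logits(visual_logits, pixel_feats, class_prototypes, text_embeds,
                 alpha=0.5, beta=0.25, tau=0.07):
    """Hypothetical fusion of visual logits with prototype- and text-level
    similarity maps in a shared embedding space.

    visual_logits:    (B, C, H, W) raw scores from the segmentation head
    pixel_feats:      (B, D, H, W) per-pixel embeddings from the encoder
    class_prototypes: (C, D) learned class prototypes
    text_embeds:      (C, D) encoded class-name text prompts
    """
    def unit(x, axis):
        # L2-normalize so dot products become cosine similarities
        return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

    feats = unit(pixel_feats, axis=1)
    proto = unit(class_prototypes, axis=1)
    text = unit(text_embeds, axis=1)
    # temperature-scaled cosine-similarity maps, shape (B, C, H, W)
    proto_sim = np.einsum("bdhw,cd->bchw", feats, proto) / tau
    text_sim = np.einsum("bdhw,cd->bchw", feats, text) / tau
    # convex combination of the three score maps (weights are assumptions)
    return alpha * visual_logits + beta * proto_sim + (1 - alpha - beta) * text_sim
```

Pseudo-labels for unlabeled images would then be taken from the fused scores rather than the visual head alone, which is where the claimed reduction in class ambiguity would enter.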

What carries the argument

Dual-modal semantic alignment with prototype-level and text-level branches operating in a shared embedding space that stabilizes pseudo-label refinement.

If this is right

  • More reliable pseudo-labels are generated for unlabeled histopathology images through fusion of aligned representations.
  • Performance improves substantially over recent semi-supervised baselines at 10% and 20% labeled data on GlaS and CRAG.
  • End-to-end training with supervised, consistency, and alignment objectives stabilizes refinement across limited supervision regimes.
  • The shared embedding space provides structured guidance that directly addresses class ambiguity in pixel-wise predictions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The method may extend to other medical imaging tasks where class boundaries are ambiguous and foundation encoders are available.
  • Reducing reliance on pixel-level annotations could accelerate adoption in clinical pathology workflows.
  • The alignment mechanism might support adaptation to new tissue types with minimal additional labeling.

Load-bearing premise

The prototype and text alignment branches will consistently reduce class ambiguity and improve pseudo-label quality without introducing new errors on unlabeled histopathology images.

What would settle it

Running the model on GlaS or CRAG at 10% labeled data with the alignment branches removed and observing no Dice improvement, or a drop, compared to the full UniSemAlign.

Figures

Figures reproduced from arXiv: 2604.09169 by Duy-Dong Nguyen, Hoai Nhan Pham, Lan Anh Dinh Thi, Le-Van Thai, Ngoc Lam Quang Bui, Tien Dat Nguyen.

Figure 1: Overview of UniSemAlign. An input image is encoded by UNI ViT-B/16 and decoded by DeepLabV3+ to produce visual logits.
Figure 2: Qualitative results for different semi-supervised methods under the 10% labeling setting on GlaS-2017 and CRAG-2019.
Figure 3: Qualitative comparison of the dual semantic alignment.
Original abstract

Semi-supervised semantic segmentation in computational pathology remains challenging due to scarce pixel-level annotations and unreliable pseudo-label supervision. We propose UniSemAlign, a dual-modal semantic alignment framework that enhances visual segmentation by injecting explicit class-level structure into pixel-wise learning. Built upon a pathology-pretrained Transformer encoder, UniSemAlign introduces complementary prototype-level and text-level alignment branches in a shared embedding space, providing structured guidance that reduces class ambiguity and stabilizes pseudo-label refinement. The aligned representations are fused with visual predictions to generate more reliable supervision for unlabeled histopathology images. The framework is trained end-to-end with supervised segmentation, cross-view consistency, and cross-modal alignment objectives. Extensive experiments on the GlaS and CRAG datasets demonstrate that UniSemAlign substantially outperforms recent semi-supervised baselines under limited supervision, achieving Dice improvements of up to 2.6% on GlaS and 8.6% on CRAG with only 10% labeled data, and strong improvements at 20% supervision. Code is available at: https://github.com/thailevann/UniSemAlign

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes UniSemAlign, a semi-supervised segmentation framework for histopathology images that builds on a pathology-pretrained Transformer encoder. It adds complementary prototype-level and text-level alignment branches operating in a shared embedding space; the aligned representations are fused with visual predictions to produce higher-quality pseudo-labels for unlabeled data. Training combines a supervised segmentation loss with cross-view consistency and cross-modal alignment objectives. On the GlaS and CRAG datasets the method reports Dice gains of up to 2.6% and 8.6%, respectively, at 10% labeled data, with further gains at 20% supervision.

Significance. If the performance claims are substantiated, the work offers a practical way to inject class-level semantic structure into pixel-wise semi-supervised learning via text and prototype alignment, which is relevant for computational pathology where glandular morphology varies and pixel annotations are expensive. The use of an external pathology-pretrained encoder plus publicly released code are positive elements that aid reproducibility and potential adoption.

major comments (2)
  1. [Experiments] Experiments section: the reported Dice improvements (2.6% on GlaS, 8.6% on CRAG at 10% labels) are presented without accompanying details on data splits, statistical significance testing, ablation studies isolating the prototype-level versus text-level branches, or the procedure used to select pseudo-label thresholds. These omissions prevent a reader from determining whether the gains are robust or attributable to the proposed dual-alignment mechanism.
  2. [Method] Method / Experiments: no direct quantitative evaluation of pseudo-label accuracy or cross-modal alignment fidelity on the unlabeled set is provided (e.g., no per-class pseudo-label precision/recall or alignment-error metrics). Because the central claim rests on the assertion that the shared-embedding alignments reduce class ambiguity and stabilize pseudo-label refinement, the absence of such isolating measurements leaves the mechanistic contribution unverified.
minor comments (2)
  1. The abstract states improvements “up to” specific percentages; reporting the exact per-setting Dice values and standard deviations in the main results table would improve clarity.
  2. [Method] Notation for the fused prediction used to generate pseudo-labels should be defined explicitly (e.g., an equation showing how prototype and text embeddings are combined with the visual head output).
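One hypothetical form the requested notation could take (an illustrative reconstruction, not the paper's equation; the weights $\lambda_v, \lambda_p, \lambda_t$, temperature $\tau$, and confidence threshold $\theta$ are assumptions):

```latex
\hat{p}_{i} = \operatorname{softmax}\!\Big(
    \lambda_v\, z_i
    \;+\; \lambda_p\, \tfrac{1}{\tau}\, F_i^{\top} M
    \;+\; \lambda_t\, \tfrac{1}{\tau}\, F_i^{\top} T
\Big),
\qquad
\tilde{y}_i = \arg\max_c \hat{p}_{i,c}
\ \text{ only if } \max_c \hat{p}_{i,c} \ge \theta ,
```

where $z_i \in \mathbb{R}^{C}$ are the visual logits at pixel $i$, $F_i \in \mathbb{R}^{D}$ is the pixel embedding, $M, T \in \mathbb{R}^{D \times C}$ stack the class prototypes and text embeddings, and $\tilde{y}_i$ is the pseudo-label retained only above confidence $\theta$.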

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed the major comments and provide point-by-point responses below. Where the comments identify gaps in experimental detail and mechanistic verification, we have revised the manuscript accordingly to strengthen the presentation and substantiate our claims.

Point-by-point responses
  1. Referee: Experiments section: the reported Dice improvements (2.6% on GlaS, 8.6% on CRAG at 10% labels) are presented without accompanying details on data splits, statistical significance testing, ablation studies isolating the prototype-level versus text-level branches, or the procedure used to select pseudo-label thresholds. These omissions prevent a reader from determining whether the gains are robust or attributable to the proposed dual-alignment mechanism.

    Authors: We agree that these details are necessary for readers to evaluate robustness and attribute the gains to the dual-alignment mechanism. In the revised manuscript, we have expanded the Experiments section with: (i) explicit description of the data splits, including random sampling of the 10% and 20% labeled subsets using fixed seeds for reproducibility across runs; (ii) results averaged over three independent runs, reported as mean ± standard deviation, together with paired t-test p-values against baselines to establish statistical significance; (iii) dedicated ablation tables that isolate the prototype-level branch, the text-level branch, and their combination; and (iv) clarification that the pseudo-label threshold is fixed at 0.7 after selection on a small labeled validation split. These additions directly address the concern and confirm that the reported improvements stem from the proposed components. revision: yes
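The reporting protocol described in (ii), mean ± standard deviation over seeds with a paired t-test against a baseline, can be sketched with the standard library. The Dice values below are placeholders for illustration, not results from the paper:

```python
from statistics import mean, stdev
from math import sqrt

def paired_t(xs, ys):
    """Paired t-statistic for per-seed scores of two methods on the same
    splits (degrees of freedom = len(xs) - 1)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Placeholder per-seed Dice scores over three runs, NOT values from the paper.
ours = [0.871, 0.865, 0.869]
baseline = [0.842, 0.850, 0.845]
print(f"ours: {mean(ours):.3f} ± {stdev(ours):.3f}")
print(f"paired t-statistic vs. baseline: {paired_t(ours, baseline):.2f}")
```

The t-statistic would then be compared against the t-distribution with two degrees of freedom to obtain the p-values the rebuttal mentions.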

  2. Referee: Method / Experiments: no direct quantitative evaluation of pseudo-label accuracy or cross-modal alignment fidelity on the unlabeled set is provided (e.g., no per-class pseudo-label precision/recall or alignment-error metrics). Because the central claim rests on the assertion that the shared-embedding alignments reduce class ambiguity and stabilize pseudo-label refinement, the absence of such isolating measurements leaves the mechanistic contribution unverified.

    Authors: We acknowledge that direct, isolating measurements would provide stronger verification of the claimed mechanism. In the revised manuscript we have added a new subsection under Experiments that reports quantitative pseudo-label evaluation on the unlabeled data. Using a small held-out fully annotated subset (excluded from all training), we compute per-class precision and recall for pseudo-labels generated with and without the alignment branches. We also introduce alignment-fidelity metrics (mean cosine similarity between visual features and the corresponding aligned text/prototype embeddings, computed only on pixels whose pseudo-label matches the held-out ground truth). These results show measurable improvements in pseudo-label quality attributable to the shared-embedding alignments, thereby substantiating the central claim. revision: yes
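A minimal sketch of the proposed measurements, per-class pseudo-label precision/recall against a held-out annotated subset plus a cosine alignment-fidelity score, assuming NumPy label maps and feature arrays; all function and variable names are hypothetical:

```python
import numpy as np

def pseudo_label_quality(pseudo, truth, num_classes):
    """Per-class precision/recall of pseudo-labels against held-out ground
    truth. pseudo, truth: integer label maps of identical shape."""
    stats = {}
    for c in range(num_classes):
        pred_c, true_c = pseudo == c, truth == c
        tp = np.logical_and(pred_c, true_c).sum()
        prec = tp / max(pred_c.sum(), 1)
        rec = tp / max(true_c.sum(), 1)
        stats[c] = (float(prec), float(rec))
    return stats

def alignment_fidelity(pixel_feats, class_embeds, pseudo, truth):
    """Mean cosine similarity between pixel features (H, W, D) and the
    embedding of the pseudo-labelled class, restricted to pixels whose
    pseudo-label matches the held-out ground truth."""
    correct = pseudo == truth
    if not correct.any():
        return 0.0
    feats = pixel_feats[correct]            # (N, D) features at correct pixels
    embeds = class_embeds[pseudo[correct]]  # (N, D) matching class embeddings
    cos = (feats * embeds).sum(1) / (
        np.linalg.norm(feats, axis=1) * np.linalg.norm(embeds, axis=1) + 1e-8)
    return float(cos.mean())
```

Comparing these numbers with and without the alignment branches is the isolating measurement the referee asks for.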

Circularity Check

0 steps flagged

No significant circularity; empirical framework with external components

full rationale

The paper describes a semi-supervised segmentation framework using a pathology-pretrained Transformer encoder plus prototype-level and text-level alignment branches trained with standard supervised, consistency, and cross-modal losses. No equations or derivations are shown that reduce the reported Dice gains or pseudo-label improvements to quantities fitted from the same data by construction, nor do any self-citations form a load-bearing chain that tautologically defines the central claims. Performance is validated empirically on GlaS and CRAG datasets under limited supervision, making the results falsifiable against external benchmarks rather than self-referential.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on a pathology-pretrained Transformer (external) and standard supervised plus alignment losses whose details are not supplied.

pith-pipeline@v0.9.0 · 5511 in / 1105 out tokens · 34234 ms · 2026-05-10T17:10:12.457729+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

60 extracted references · 9 canonical work pages · 1 internal anchor

  1. [1]

    Multi-scale domain-adversarial multiple-instance cnn for cancer subtype classification with unannotated histopathological images

    Noriaki Hashimoto, Daisuke Fukushima, Ryoichi Koga, Yusuke Takagi, Kaho Ko, Kei Kohno, Masato Nakaguro, Shigeo Nakamura, Hidekata Hontani, and Ichiro Takeuchi. Multi-scale domain-adversarial multiple-instance cnn for cancer subtype classification with unannotated histopathological images. InProceedings of the IEEE/CVF conference on computer vision and pat...

  2. [2]

    Pixelseg: Pixel-by-pixel stochastic semantic segmentation for ambiguous medical images

    Wei Zhang, Xiaohong Zhang, Sheng Huang, Yuting Lu, and Kun Wang. Pixelseg: Pixel-by-pixel stochastic semantic segmentation for ambiguous medical images. InProceedings of the 30th ACM International Conference on Multimedia, pages 4742–4750, 2022. 1

  3. [3]

    Accurate diagnostic tissue segmentation and concurrent disease subtyping with small datasets.Journal of Pathology Informatics, 14:100174, 2023

    Steven J Frank. Accurate diagnostic tissue segmentation and concurrent disease subtyping with small datasets.Journal of Pathology Informatics, 14:100174, 2023. 1

  4. [4]

    Scribblevc: Scribble-supervised medical image segmentation with vision-class embedding

    Zihan Li, Yuan Zheng, Xiangde Luo, Dandan Shan, and Qingqi Hong. Scribblevc: Scribble-supervised medical image segmentation with vision-class embedding. In Proceedings of the 31st ACM International Conference on Multimedia, pages 3384–3393, 2023. 1

  5. [5]

    Fully convolutional networks for semantic segmentation

    Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015. 1, 2

  6. [6]

    U-net: Convolutional networks for biomedical image segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. InInternational Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015. 2

  7. [7]

    Encoder-decoder with atrous separable convolution for semantic image segmentation,

    Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation,

  8. [8]

    TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

    Jieneng Chen, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan L Yuille, and Yuyin Zhou. Transunet: Transformers make strong encoders for medical image segmentation.arXiv preprint arXiv:2102.04306,

  9. [9]

    Swin-unet: Unet-like pure transformer for medical image segmentation

    Hu Cao, Yueyue Wang, Joy Chen, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian, and Manning Wang. Swin-unet: Unet-like pure transformer for medical image segmentation. InEuropean conference on computer vision, pages 205–218. Springer, 2022. 1

  10. [10]

    Annotation-efficient deep learning for automatic medical image segmentation.Nature communications, 12(1):5915, 2021

    Shanshan Wang, Cheng Li, Rongpin Wang, Zaiyi Liu, Meiyun Wang, Hongna Tan, Yaping Wu, Xinfeng Liu, Hui Sun, Rui Yang, et al. Annotation-efficient deep learning for automatic medical image segmentation.Nature communications, 12(1):5915, 2021. 1

  11. [11]

    Co-training with high-confidence pseudo labels for semi-supervised medical image segmentation. arXiv preprint arXiv:2301.04465, 2023

    Zhiqiang Shen, Peng Cao, Hua Yang, Xiaoli Liu, Jinzhu Yang, and Osmar R Zaiane. Co-training with high-confidence pseudo labels for semi-supervised medical image segmentation. arXiv preprint arXiv:2301.04465, 2023

  12. [12]

    Dslsm: Dual-kernel-induced statistic level set model for image segmentation.Expert Systems with Applications, 242: 122772, 2024

    Fan Zhang, Huiying Liu, Xiaojun Duan, Binglu Wang, Qing Cai, Huafeng Li, Junyu Dong, and David Zhang. Dslsm: Dual-kernel-induced statistic level set model for image segmentation.Expert Systems with Applications, 242: 122772, 2024. 1

  13. [13]

    SI-MIL: Taming deep MIL for self-interpretability in gigapixel histopathology

    Saarthak Kapse, Pushpak Pati, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi R Gupta, and Prateek Prasanna. SI-MIL: Taming deep MIL for self-interpretability in gigapixel histopathology. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11226–11237, 2024. 1

  14. [14]

    Boundary-aware uncertainty suppression for semi-supervised medical image segmentation.IEEE Transactions on Artificial Intelligence, 5(8):4074–4086, 2024

    Congcong Li, Jinshuo Zhang, Dongmei Niu, Xiuyang Zhao, Bo Yang, and Caiming Zhang. Boundary-aware uncertainty suppression for semi-supervised medical image segmentation.IEEE Transactions on Artificial Intelligence, 5(8):4074–4086, 2024

  15. [15]

    Learning heterogeneous tissues with mixture of experts for gigapixel whole slide images

    Junxian Wu, Minheng Chen, Xinyi Ke, Tianwang Xun, Xiaoming Jiang, Hongyu Zhou, Lizhi Shao, and Youyong Kong. Learning heterogeneous tissues with mixture of experts for gigapixel whole slide images. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 5144–5153, 2025. 1

  16. [16]

    Deep semi-supervised learning for medical image segmentation: A review.Expert Systems with Applications, 245:123052, 2024

    Kai Han, Victor S Sheng, Yuqing Song, Yi Liu, Chengjian Qiu, Siqi Ma, and Zhe Liu. Deep semi-supervised learning for medical image segmentation: A review.Expert Systems with Applications, 245:123052, 2024. 1, 2

  17. [17]

    Urca: Uncertainty-based region clipping algorithm for semi-supervised medical image segmentation. Computer Methods and Programs in Biomedicine, 254:108278, 2024

    Chendong Qin, Yongxiong Wang, and Jiapeng Zhang. Urca: Uncertainty-based region clipping algorithm for semi-supervised medical image segmentation. Computer Methods and Programs in Biomedicine, 254:108278, 2024

  18. [18]

    Segmenting visuals with querying words: Language anchors for semi-supervised image segmentation

    Numair Nadeem, Saeed Anwar, Muhammad Hamza Asad, and Abdul Bais. Segmenting visuals with querying words: Language anchors for semi-supervised image segmentation. arXiv preprint arXiv:2506.13925, 2025. 1

  19. [19]

    Fixmatch: Simplifying semi-supervised learning with consistency and confidence

    Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin A Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Advances in neural information processing systems, 33:596– 608, 2020. 1, 2, 6

  20. [20]

    Unimatch: Revisiting weak-to-strong consistency in semi-supervised semantic segmentation

    Lihe Yang, Lei Qi, Litong Feng, Wayne Zhang, and Yinghuan Shi. Unimatch: Revisiting weak-to-strong consistency in semi-supervised semantic segmentation. In CVPR, pages 7236–7246, 2023. 1, 2

  21. [21]

    Bidirectional copy-paste for semi-supervised medical image segmentation

    Yunhao Bai, Duowen Chen, Qingli Li, Wei Shen, and Yan Wang. Bidirectional copy-paste for semi-supervised medical image segmentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11514–11524, 2023. 1

  22. [22]

    Clims: Cross language image matching for weakly supervised semantic segmentation

    Jinheng Xie, Xianxu Hou, Kai Ye, and Linlin Shen. Clims: Cross language image matching for weakly supervised semantic segmentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4483–4492, 2022. 1

  23. [23]

    Trustmatch: Mitigating pseudo-label bias in semi-supervised learning with trust-aware refinement

    Hongyang He and Yundi Hong. Trustmatch: mitigating pseudo-label bias in semi-supervised learning with trust-aware refinement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 594–603, 2025. 1

  24. [24]

    Pseudo-label refinement using superpixels for semi-supervised brain tumour segmentation

    Bethany H Thompson, Gaetano Di Caterina, and Jeremy P Voisey. Pseudo-label refinement using superpixels for semi-supervised brain tumour segmentation. In 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), pages 1–5. IEEE, 2022. 1

  25. [25]

    Ssa-net: Spatial self-attention network for covid-19 pneumonia infection segmentation with semi-supervised few-shot learning. Medical Image Analysis, 79:102459, 2022

    Xiaoyan Wang, Yiwen Yuan, Dongyan Guo, Xiaojie Huang, Ying Cui, Ming Xia, Zhenhua Wang, Cong Bai, and Shengyong Chen. Ssa-net: Spatial self-attention network for covid-19 pneumonia infection segmentation with semi-supervised few-shot learning. Medical Image Analysis, 79:102459, 2022. 1

  26. [26]

    Semi-MoE: Mixture-of-experts meets semi-supervised histopathology segmentation. arXiv preprint arXiv:2509.13834, 2025

    Nguyen Lan Vi Vu, Thanh-Huy Nguyen, Thien Nguyen, Daisuke Kihara, Tianyang Wang, Xingjian Li, and Min Xu. Semi-MoE: Mixture-of-experts meets semi-supervised histopathology segmentation. arXiv preprint arXiv:2509.13834, 2025. 1

  27. [27]

    Multi-granularity cross-modal alignment for generalized medical visual representation learning.Advances in neural information processing systems, 35:33536–33549, 2022

    Fuying Wang, Yuyin Zhou, Shujun Wang, Varut Vardhanabhuti, and Lequan Yu. Multi-granularity cross-modal alignment for generalized medical visual representation learning.Advances in neural information processing systems, 35:33536–33549, 2022. 1

  28. [28]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021

  29. [29]

    Medclip: Contrastive learning from unpaired medical images and text

    Zifeng Wang, Zhenbang Wu, Dinesh Agarwal, and Jimeng Sun. Medclip: Contrastive learning from unpaired medical images and text. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3876–3887, 2022

  30. [30]

    A simple framework for text-supervised semantic segmentation

    Muyang Yi, Quan Cui, Hao Wu, Cheng Yang, Osamu Yoshie, and Hongtao Lu. A simple framework for text-supervised semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7071–7080, 2023. 1

  31. [31]

    A visual–language foundation model for pathology image analysis using medical twitter

    Zhi Huang, Federico Bianchi, Mert Yuksekgonul, Thomas J Montine, and James Zou. A visual–language foundation model for pathology image analysis using medical twitter. Nature medicine, 29(9):2307–2316, 2023. 1

  32. [32]

    A visual-language foundation model for computational pathology

    Ming Y Lu, Bowen Chen, Drew FK Williamson, Richard J Chen, Ivy Liang, Tong Ding, Guillaume Jaume, Igor Odintsov, Long Phi Le, Georg Gerber, et al. A visual-language foundation model for computational pathology. Nature Medicine, 30(3):863–874, 2024. 1, 2, 3

  33. [33]

    Towards a general-purpose foundation model for computational pathology.Nature medicine, 30(3):850–862,

    Richard J Chen, Tong Ding, Ming Y Lu, Drew FK Williamson, Guillaume Jaume, Andrew H Song, Bowen Chen, Andrew Zhang, Daniel Shao, Muhammad Shaban, et al. Towards a general-purpose foundation model for computational pathology.Nature medicine, 30(3):850–862,

  34. [34]

    Knowledge-enhanced visual-language pretraining for computational pathology

    Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, and Yanfeng Wang. Knowledge-enhanced visual-language pretraining for computational pathology. In European Conference on Computer Vision, pages 345–362. Springer, 2024. 1

  35. [35]

    Pathvlm-eval: Evaluation of open vision language models in histopathology.Journal of Pathology Informatics, 18:100455, 2025

    Nauman Ullah Gilal, Rachida Zegour, Khaled Al-Thelaya, Erdener Özer, Marco Agus, Jens Schneider, and Sabri Boughorbel. Pathvlm-eval: Evaluation of open vision language models in histopathology.Journal of Pathology Informatics, 18:100455, 2025. 1

  36. [36]

    Generalization of vision pre-trained models for histopathology.Scientific reports, 13(1):6065, 2023

    Milad Sikaroudi, Maryam Hosseini, Ricardo Gonzalez, Shahryar Rahnamayan, and HR Tizhoosh. Generalization of vision pre-trained models for histopathology.Scientific reports, 13(1):6065, 2023. 1

  37. [37]

    Multimodal prototype alignment for semi-supervised pathology image segmentation.arXiv preprint arXiv:2508.19574, 2025

    Mingxi Fu, Fanglei Fu, Xitong Ling, Huaitian Yuan, Tian Guan, Yonghong He, and Lianghui Zhu. Multimodal prototype alignment for semi-supervised pathology image segmentation.arXiv preprint arXiv:2508.19574, 2025. 1, 2

  38. [38]

    Semi-supervised segmentation of histopathology images with noise-aware topological consistency

    Meilong Xu, Xiaoling Hu, Saumya Gupta, Shahira Abousamra, and Chao Chen. Semi-supervised segmentation of histopathology images with noise-aware topological consistency. InEuropean Conference on Computer Vision, pages 271–289. Springer, 2024. 1

  39. [39]

    Dusss: Dual semantic similarity-supervised vision-language model for semi-supervised medical image segmentation.arXiv preprint arXiv:2412.12492, 2024

    Qingtao Pan, Wenhao Qiao, Jingjiao Lou, Bing Ji, and Shuo Li. Dusss: Dual semantic similarity-supervised vision-language model for semi-supervised medical image segmentation.arXiv preprint arXiv:2412.12492, 2024. 1, 2, 6

  40. [40]

    Learning to prompt for vision-language models

    Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Learning to prompt for vision-language models. International journal of computer vision, 130(9):2337–2348,

  41. [41]

    Corrmatch: Label propagation via correlation matching for semi-supervised semantic segmentation, 2023

    Boyuan Sun, Yuqi Yang, Le Zhang, Ming-Ming Cheng, and Qibin Hou. Corrmatch: Label propagation via correlation matching for semi-supervised semantic segmentation, 2023. 2, 5, 6

  42. [42]

    Semi-supervised learning literature survey

    Xiaojin Jerry Zhu. Semi-supervised learning literature survey. 2005. 2

  43. [43]

    Semi-supervised semantic segmentation with cross pseudo supervision

    Xiaokang Chen, Yuhui Yuan, Gang Zeng, and Jingdong Wang. Semi-supervised semantic segmentation with cross pseudo supervision. InCVPR, pages 2613–2622, 2021. 2

  44. [44]

    Semi-supervised semantic segmentation with cross-consistency training

    Yassine Ouali, Céline Hudelot, and Myriam Tami. Semi-supervised semantic segmentation with cross-consistency training. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12674–12684, 2020. 2, 6

  45. [45]

    Semi-supervised semantic segmentation needs strong, varied perturbations

    Geoffrey French, Samuli Laine, Timo Aila, Michal Mackiewicz, and Graham Finlayson. Semi-supervised semantic segmentation needs strong, varied perturbations. British Machine Vision Conference (BMVC), 2020. 2

  46. [46]

    Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results.Advances in neural information processing systems, 30, 2017

    Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results.Advances in neural information processing systems, 30, 2017. 2

  47. [47]

    Semi-supervised semantic segmentation using unreliable pseudo-labels.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4248–4257, 2022

    Yuchao Wang, Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, and Xinyi Le. Semi-supervised semantic segmentation using unreliable pseudo-labels.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4248–4257, 2022. 2

  48. [48]

    Learning disentangled stain and structural representations for semi-supervised histopathology segmentation.arXiv preprint arXiv:2507.03923, 2025

    Ha-Hieu Pham, Nguyen Lan Vi Vu, Thanh-Huy Nguyen, Ulas Bagci, Min Xu, Trung-Nghia Le, and Huy-Hieu Pham. Learning disentangled stain and structural representations for semi-supervised histopathology segmentation.arXiv preprint arXiv:2507.03923, 2025. 2, 6

  49. [49]

    Uncertainty-aware self-ensembling model for semi-supervised 3d left atrium segmentation

    Lequan Yu, Shujun Wang, Xiaomeng Li, Chi-Wing Fu, and Pheng-Ann Heng. Uncertainty-aware self-ensembling model for semi-supervised 3d left atrium segmentation. In International conference on medical image computing and computer-assisted intervention, pages 605–613. Springer,

  50. [50]

    Lile: Look in-depth before looking elsewhere – a dual attention network using transformers for cross-modal information retrieval in histopathology archives

    David Maleki and H.R. Tizhoosh. Lile: Look in-depth before looking elsewhere – a dual attention network using transformers for cross-modal information retrieval in histopathology archives. In International Conference on Medical Imaging with Deep Learning (MIDL), pages 879–

  51. [51]

    Quilt-1m: One million image-text pairs for histopathology, 2023

    Wisdom Oluchi Ikezogwo, Mehmet Saygin Seyfioglu, Fatemeh Ghezloo, Dylan Stefan Chan Geva, Fatwir Sheikh Mohammed, Pavan Kumar Anand, Ranjay Krishna, and Linda Shapiro. Quilt-1m: One million image-text pairs for histopathology, 2023. 2

  52. [52]

    Clinical-grade computational pathology using weakly supervised deep learning on whole slide images.Nature medicine, 25(8):1301–1309, 2019

    Gabriele Campanella, Matthew G Hanna, Luke Geneslaw, Allen Miraflor, Vitor Werneck Krauss Silva, Klaus J Busam, Edi Brogi, Victor E Reuter, David S Klimstra, and Thomas J Fuchs. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images.Nature medicine, 25(8):1301–1309, 2019. 2

  53. [53]

    Visual language pretrained multiple instance zero-shot transfer for histopathology images

    Ming Y Lu, Bowen Chen, Andrew Zhang, Drew FK Williamson, Richard J Chen, Tong Ding, Long Phi Le, Yung-Sung Chuang, and Faisal Mahmood. Visual language pretrained multiple instance zero-shot transfer for histopathology images. InCVPR, pages 19764–19775, 2023. 2

  54. [54]

    Text-driven multiplanar visual interaction for semi-supervised medical image segmentation,

    Kaiwen Huang, Yi Zhou, Huazhu Fu, Yizhe Zhang, Chen Gong, and Tao Zhou. Text-driven multiplanar visual interaction for semi-supervised medical image segmentation,

  55. [55]

    Mild-net: Minimal information loss dilated network for gland instance segmentation in colon histology images

    S. Graham, H. Chen, J. Gamper, Q. Dou, P. A. Heng, D. Snead, Y. W. Tsang, and N. Rajpoot. Mild-net: Minimal information loss dilated network for gland instance segmentation in colon histology images. Medical Image Analysis, 52:199–211, 2019. 5

  56. [56]

    Gland segmentation in colon histology images: The GlaS challenge contest

    K. Sirinukunwattana, J. P. W. Pluim, H. Chen, X. Qi, P. A. Heng, Y. B. Guo, L. Y. Wang, B. J. Matuszewski, E. Bruni, U. Sanchez, et al. Gland segmentation in colon histology images: The GlaS challenge contest. Medical Image Analysis, 35:489–502, 2017. 5

  57. [57]

    Semi-supervised medical image segmentation via cross teaching between cnn and transformer.arXiv preprint arXiv:2112.04894, 2021

    Xiangde Luo, Minhao Hu, Tao Song, Guotai Wang, and Shaoting Zhang. Semi-supervised medical image segmentation via cross teaching between cnn and transformer.arXiv preprint arXiv:2112.04894, 2021. 6

  58. [58]

    Xnet v2: Fewer limitations, better results and greater universality.arXiv preprint arXiv:2409.00947,

    Yanfeng Zhou, Lingrui Li, Zichen Wang, Guole Liu, Ziwen Liu, and Ge Yang. Xnet v2: Fewer limitations, better results and greater universality.arXiv preprint arXiv:2409.00947,

  59. [59]

    Cutmix: Regularization strategy to train strong classifiers with localizable features, 2019

    Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features, 2019. 6

  60. [60]

    Deep residual learning for image recognition, 2015

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015. 7