HAPS: Rethinking Image Similarity for Virtual Staining
Pith reviewed 2026-05-21 07:24 UTC · model grok-4.3
The pith
HAPS scores histology image pairs with a pretrained encoder and linear head to match expert judgments and improve virtual staining by filtering poor training data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HAPS computes distances in the feature space of a frozen encoder pretrained on histopathology data, adding a linear head to aggregate feature-level differences into a final score that aligns with expert assessments. Filtering training pairs in the MIST dataset by these HAPS scores produces a cleaner set on which virtual staining models achieve higher performance than models trained on the unfiltered dataset.
What carries the argument
The Histology-Aware Perceptual Similarity (HAPS) metric, which measures differences in the feature space of a frozen histopathology-pretrained encoder and aggregates them with a linear head to produce an expert-aligned score.
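To make the encoder-plus-linear-head construction concrete, here is a minimal sketch under stated assumptions. It assumes the frozen encoder has already produced one feature vector per patch (the encoder itself is not shown), that the feature-level difference is a per-channel squared difference, and that the linear head is a weighted sum plus a bias. The name `haps_style_score` and all numeric values are hypothetical; the paper's actual distance, normalization, and score orientation may differ.

```python
from typing import Sequence


def haps_style_score(
    feat_a: Sequence[float],
    feat_b: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Aggregate per-channel squared feature differences with a linear head.

    feat_a / feat_b: feature vectors assumed to come from a frozen encoder.
    weights / bias:  linear-head parameters, assumed to be fitted elsewhere
                     to expert similarity scores.
    """
    if not (len(feat_a) == len(feat_b) == len(weights)):
        raise ValueError("feature vectors and weights must have the same length")
    # Feature-level differences (assumption: squared difference per channel).
    diffs = [(a - b) ** 2 for a, b in zip(feat_a, feat_b)]
    # Linear head: weighted sum of the differences plus a bias term.
    return bias + sum(w * d for w, d in zip(weights, diffs))


# Toy example with made-up three-channel features.
feat_he = [0.2, 0.5, 0.1]
feat_ihc = [0.1, 0.5, 0.4]
head_weights = [1.0, 2.0, 0.5]
score = haps_style_score(feat_he, feat_ihc, head_weights)
print(round(score, 6))
```

Because only `weights` and `bias` are trainable in this sketch, the encoder can stay frozen while the head is refitted to a new set of expert ratings.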
If this is right
- Virtual staining models trained after HAPS-based filtering outperform models trained on the original unfiltered dataset.
- HAPS scores remain more consistent with expert judgments than generic metrics when patches undergo shifts, rotations, or non-rigid deformations.
- The same encoder-plus-linear-head construction can be reused to quantify similarity across other histological stain pairs.
- Data-cleaning pipelines for virtual staining can now incorporate an automated, domain-specific quality filter instead of manual review.
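The filtering step implied by the points above can be sketched as a simple threshold over precomputed scores. This is an illustration only: the function name, the threshold value, and the convention that a higher score means a better-matched pair are assumptions, since this summary does not state how the cutoff was chosen.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def filter_pairs_by_score(
    pairs: Sequence[T],
    scores: Sequence[float],
    threshold: float,
    higher_is_better: bool = True,
) -> List[T]:
    """Keep only the training pairs whose score passes the threshold."""
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    kept = []
    for pair, score in zip(pairs, scores):
        # Flip the comparison if the score is a distance rather than a similarity.
        passes = score >= threshold if higher_is_better else score <= threshold
        if passes:
            kept.append(pair)
    return kept


# Hypothetical pair identifiers and scores.
pair_ids = ["pair_a", "pair_b", "pair_c", "pair_d"]
pair_scores = [0.9, 0.2, 0.6, 0.4]
clean_set = filter_pairs_by_score(pair_ids, pair_scores, threshold=0.5)
print(clean_set)  # ['pair_a', 'pair_c']
```

The `higher_is_better` flag is there because a feature-space distance and an expert similarity rating usually run in opposite directions, and the orientation of the final score should be checked before filtering.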
Where Pith is reading between the lines
- HAPS-style filtering could be tested on additional virtual-staining datasets to measure how much performance gain is retained across different scanners and tissue types.
- The linear head might be replaced by a small MLP or attention module to capture higher-order interactions among feature channels without retraining the encoder.
- Because the encoder is frozen, HAPS can be computed on new modalities once a modest set of expert ratings is collected to fit the head.
Load-bearing premise
Expert similarity scores on the collected H&E-IHC patch pairs constitute reliable ground truth that the linear head can generalize to new images and registration conditions.
What would settle it
The data-filtering claim would be refuted if virtual staining models trained on HAPS-filtered pairs showed no improvement in expert visual ratings or standard metrics over models trained on the original, unfiltered MIST dataset.
Original abstract
Virtual staining of histopathology images (e.g., H&E-IHC) is an emerging tool in digital pathology, enabling faster and cheaper workflows by synthesizing target stains from routinely acquired slides. Yet, the quality of virtual staining models is still predominantly assessed with generic metrics such as SSIM, PSNR, and LPIPS. Originally developed for natural images, these metrics are inherently misaligned with the domain-specific characteristics of histological data, failing to capture tissue morphology preservation and biomarker expression patterns. Consequently, a robust, domain-specific standard for quantifying similarity across diverse histological modalities remains a critical gap in the field. In this work, we formalize histology image similarity as a standalone problem and systematically evaluate a broad set of full-reference metrics against a dataset of H&E-IHC patch pairs annotated with expert similarity scores. We further analyze metric sensitivity to controlled geometric distortions (shifts, rotations, and non-rigid deformations) that mimic realistic registration errors between serial sections. Guided by these observations, we propose the Histology-Aware Perceptual Similarity (HAPS) metric. HAPS computes distances in the feature space of a frozen encoder pretrained on histopathology data, adding a linear head to aggregate feature-level differences into a final score that aligns with expert assessments. Finally, we demonstrate the practical value of HAPS for quality control of training data. By quantifying the similarity of training pairs in the MIST dataset and filtering low-scoring samples, we create a cleaner training set. Virtual staining models trained on this refined data outperform those trained on the original, unfiltered dataset.
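One common way to evaluate a metric against expert similarity scores is a rank correlation, sketched below in plain Python. Whether the paper uses Spearman correlation or another agreement measure is not stated in this summary, so treat that choice, the function names, and the toy numbers as assumptions.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(metric_scores: Sequence[float], expert_scores: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(metric_scores) != len(expert_scores):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(metric_scores), average_ranks(expert_scores))


# Made-up example: a metric that orders pairs almost like the experts do.
metric = [0.10, 0.35, 0.30, 0.80, 0.95]
expert = [1, 2, 3, 4, 5]
print(round(spearman(metric, expert), 3))
```

A rank-based measure only asks whether the metric orders patch pairs the way experts do, which makes it insensitive to the metric's scale. For distance-like metrics such as LPIPS, the expected sign of the correlation with similarity ratings is negative.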
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formalizes histology image similarity as a distinct problem, evaluates a range of full-reference metrics against expert similarity scores on H&E-IHC patch pairs, analyzes metric sensitivity to synthetic geometric distortions mimicking registration errors, proposes HAPS (distances in a frozen histopathology-pretrained encoder feature space aggregated by a learned linear head), and demonstrates that filtering low-HAPS-score pairs from the MIST dataset yields virtual staining models with improved performance over the unfiltered baseline.
Significance. If the empirical results hold, the work supplies a domain-adapted perceptual metric that better captures tissue morphology and biomarker patterns than generic measures such as SSIM or LPIPS. The data-filtering experiment provides a concrete, practical use case for training-set curation in virtual staining, which could improve model robustness in digital pathology. The reliance on a frozen pretrained encoder and explicit alignment to expert judgments are notable strengths.
major comments (1)
- [Section 5] Section 5 (data-filtering experiment): HAPS scores are produced by a linear head fitted to expert annotations on one collection of H&E-IHC pairs and then applied to filter pairs from the MIST dataset; the manuscript provides no explicit analysis of distribution shift (tissue composition, staining intensity, registration statistics, or biomarker patterns) between the annotation set and MIST, leaving open whether the observed performance gains are driven by reliable similarity ranking or by incidental removal of outliers.
minor comments (2)
- [Abstract] The abstract states that quantitative tables and sensitivity results exist but does not report any numerical values (e.g., correlation coefficients with expert scores or exact performance deltas after filtering); the main text should ensure these appear early and with statistical details.
- [Section 4] Training details for the linear head (loss, regularization, number of expert pairs, cross-validation procedure) are referenced but not fully specified in the provided description; these should be stated explicitly to allow reproduction.
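The kind of specification the referee asks for in Section 4 could look like the sketch below: a linear head fitted by gradient descent on a squared-error loss with an L2 penalty, checked by k-fold cross-validation. None of these choices is confirmed by the paper; the loss, the regularization, the fold scheme, the function names, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], b: float, row: Sequence[float]) -> float:
    """Linear head output for one vector of feature differences."""
    return b + sum(wj * xj for wj, xj in zip(w, row))


def fit_linear_head(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    l2: float = 0.0,
    lr: float = 0.1,
    steps: int = 5000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by gradient descent on mean squared error + L2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        errs = [predict(w, b, row) - t for row, t in zip(X, y)]
        grad_w = [
            2.0 * sum(e * row[j] for e, row in zip(errs, X)) / n + 2.0 * l2 * w[j]
            for j in range(d)
        ]
        grad_b = 2.0 * sum(errs) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def kfold_mse(X, y, k: int = 4, **fit_kwargs) -> float:
    """Mean held-out squared error over k folds (fold = index modulo k)."""
    fold_errors = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        w, b = fit_linear_head(
            [X[i] for i in train], [y[i] for i in train], **fit_kwargs
        )
        sq = [(predict(w, b, X[i]) - y[i]) ** 2 for i in held_out]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / k


# Synthetic stand-in for (feature differences, expert scores): y = 1 + 2*x0 + 3*x1.
X_toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
         [0.5, 0.0], [0.0, 0.5], [0.5, 1.0], [1.0, 0.5]]
y_toy = [1.0 + 2.0 * a + 3.0 * c for a, c in X_toy]
w_fit, b_fit = fit_linear_head(X_toy, y_toy)
print([round(v, 3) for v in w_fit], round(b_fit, 3))
```

Reporting the held-out error from a routine like `kfold_mse`, together with the number of annotated pairs, would let readers judge how well the head generalizes beyond the pairs it was fitted on.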
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and detailed summary of our work. We address the single major comment below and commit to revisions that directly respond to the concern about distribution shift in the data-filtering experiment.
Point-by-point responses
- Referee: [Section 5] Section 5 (data-filtering experiment): HAPS scores are produced by a linear head fitted to expert annotations on one collection of H&E-IHC pairs and then applied to filter pairs from the MIST dataset; the manuscript provides no explicit analysis of distribution shift (tissue composition, staining intensity, registration statistics, or biomarker patterns) between the annotation set and MIST, leaving open whether the observed performance gains are driven by reliable similarity ranking or by incidental removal of outliers.
Authors: We acknowledge this is a valid concern. The expert-annotated collection and MIST both consist of H&E-IHC patch pairs from histopathology slides, and the frozen encoder is pretrained on a broad histopathology corpus, which we expect confers robustness. Nevertheless, we agree that an explicit comparison would strengthen the claim that gains arise from reliable similarity ranking rather than incidental outlier removal. In the revised manuscript we will add a dedicated paragraph (and supporting figure) that quantifies and compares tissue-type distributions, mean staining intensity histograms, and estimated registration error statistics between the annotation set and MIST. We will also report an ablation that removes the same number of samples uniformly at random and shows that random removal yields smaller gains than HAPS-based filtering. These additions will clarify the role of the learned similarity ranking.
revision: yes
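The random-removal control described in the rebuttal can be sketched as follows: build one subset by dropping the lowest-scoring samples and a second subset of identical size by dropping samples uniformly at random. The function name, the seed, and the assumption that higher scores indicate better-matched pairs are illustrative and not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def ablation_subsets(
    scores: Sequence[float], n_remove: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (kept_by_score, kept_by_random) index lists of equal size."""
    n = len(scores)
    if not 0 <= n_remove <= n:
        raise ValueError("n_remove must be between 0 and the number of samples")
    # Score-based filtering: drop the n_remove lowest-scoring samples.
    ascending = sorted(range(n), key=lambda i: scores[i])
    kept_by_score = sorted(ascending[n_remove:])
    # Control: drop the same number of samples uniformly at random.
    rng = random.Random(seed)
    removed_at_random = set(rng.sample(range(n), n_remove))
    kept_by_random = [i for i in range(n) if i not in removed_at_random]
    return kept_by_score, kept_by_random


# Hypothetical scores for five training pairs.
toy_scores = [0.9, 0.1, 0.5, 0.3, 0.7]
by_score, by_random = ablation_subsets(toy_scores, n_remove=2)
print(by_score)        # [0, 2, 4]
print(len(by_random))  # 3
```

Training one model on each subset isolates the effect of the ranking itself, because both subsets have the same size and differ only in which samples were removed. Repeating the random arm over several seeds would give a spread to compare the score-based result against.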
Circularity Check
No circularity: HAPS uses an independent pretrained encoder and externally annotated expert scores
full rationale
The derivation begins with a frozen encoder pretrained on histopathology data (independent of the current annotations or MIST filtering) and fits a linear head explicitly to expert similarity scores collected on a separate set of H&E-IHC patch pairs. This produces a metric that is then applied to score and filter pairs from the distinct MIST dataset. No equation or step reduces by construction to its own inputs; the expert annotations function as external ground truth rather than being regenerated from the filtering outcome, and the paper reports no self-citations or prior-author uniqueness theorems that bear the central claim. The construction is therefore self-contained against the provided external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- linear head weights
axioms (1)
- domain assumption: Features extracted by a frozen encoder pretrained on histopathology data capture tissue morphology and biomarker patterns relevant to expert similarity judgments.