
arxiv: 2605.20362 · v1 · pith:LJHY5RJBnew · submitted 2026-05-19 · 💻 cs.CV

HAPS: Rethinking Image Similarity for Virtual Staining

Pith reviewed 2026-05-21 07:24 UTC · model grok-4.3

classification 💻 cs.CV
keywords virtual staining · histopathology · image similarity · perceptual metrics · H&E to IHC · data filtering · feature space distances

The pith

HAPS scores histology image pairs with a pretrained encoder and linear head to match expert judgments and improve virtual staining by filtering poor training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that generic metrics such as SSIM, PSNR, and LPIPS fail to capture morphology and biomarker patterns in histological images. It systematically compares a range of full-reference metrics on expert-annotated H&E-IHC patch pairs and measures their response to realistic registration distortions. From these evaluations the authors derive HAPS, which extracts features from a frozen histopathology-pretrained encoder and passes the differences through a learned linear head to produce scores aligned with expert similarity ratings. They then apply HAPS to rank and discard low-scoring pairs from the MIST training set, showing that virtual staining models trained on the filtered subset outperform models trained on the complete original data.
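
The distortion analysis described above can be illustrated with a small sensitivity sweep. This is a minimal sketch under stated assumptions, not the paper's protocol: `shift_horizontal`, `mse`, and `sensitivity_curve` are hypothetical helpers, images are plain nested lists, the shift wraps around at the border (a simplification of a real registration error), and MSE stands in for whichever full-reference metric is being probed.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def shift_horizontal(img: Image, dx: int) -> Image:
    """Circularly shift every row right by dx pixels (wrap-around is a simplification)."""
    width = len(img[0])
    k = dx % width
    return [row[-k:] + row[:-k] if k else list(row) for row in img]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images of identical shape."""
    n = sum(len(row) for row in a)
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / n


def sensitivity_curve(
    metric: Callable[[Image, Image], float],
    img: Image,
    shifts: Sequence[int],
) -> List[float]:
    """Evaluate the metric between an image and shifted copies of itself."""
    return [metric(img, shift_horizontal(img, dx)) for dx in shifts]


if __name__ == "__main__":
    patch = [[0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    print(sensitivity_curve(mse, patch, [0, 1, 2]))
```

A metric that rises steeply for a one-pixel shift penalizes small misalignments between serial sections, which is the behaviour the sensitivity analysis is meant to expose.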

Core claim

HAPS computes distances in the feature space of a frozen encoder pretrained on histopathology data, adding a linear head to aggregate feature-level differences into a final score that aligns with expert assessments. Filtering training pairs in the MIST dataset by these HAPS scores produces a cleaner set on which virtual staining models achieve higher performance than models trained on the unfiltered dataset.

What carries the argument

The Histology-Aware Perceptual Similarity (HAPS) metric, which measures differences in the feature space of a frozen histopathology-pretrained encoder and aggregates them with a linear head to produce an expert-aligned score.
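
The scoring step can be sketched as follows. The summary does not specify how feature differences are formed or how the head is parameterized, so this example assumes per-channel absolute differences followed by a weighted sum plus bias. `haps_like_score` is a hypothetical name, and the feature vectors are assumed to come from some frozen encoder that is not modelled here.

```python
from typing import Sequence


def haps_like_score(
    feat_a: Sequence[float],
    feat_b: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Aggregate per-channel feature differences with a linear head.

    Assumption: differences are absolute values; the paper may use another form.
    """
    if not (len(feat_a) == len(feat_b) == len(weights)):
        raise ValueError("feature vectors and weights must have the same length")
    diffs = [abs(a - b) for a, b in zip(feat_a, feat_b)]
    return bias + sum(w * d for w, d in zip(weights, diffs))


if __name__ == "__main__":
    # Negative weights make larger feature differences lower the similarity score.
    print(haps_like_score([0.0, 0.0], [1.0, 2.0], [-0.5, -0.25], bias=1.0))
```

With negative weights and a positive bias, identical feature vectors receive the maximum score and the score falls as the features diverge. That sign convention is a choice made for this sketch.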

If this is right

  • Virtual staining models trained after HAPS-based filtering outperform models trained on the original unfiltered dataset.
  • HAPS scores remain more consistent with expert judgments than generic metrics when patches undergo shifts, rotations, or non-rigid deformations.
  • The same encoder-plus-linear-head construction can be reused to quantify similarity across other histological stain pairs.
  • Data-cleaning pipelines for virtual staining can now incorporate an automated, domain-specific quality filter instead of manual review.
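
An automated quality filter of the kind listed above could look like the sketch below. It assumes each training pair already has a similarity score and keeps a fixed top fraction. The summary does not state whether the paper uses a fraction or an absolute threshold, so `keep_fraction` and the function name are illustrative.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def filter_by_score(
    pair_ids: Sequence[T],
    scores: Sequence[float],
    keep_fraction: float,
) -> List[T]:
    """Keep the highest-scoring fraction of pairs, preserving their original order."""
    if len(pair_ids) != len(scores):
        raise ValueError("pair_ids and scores must have the same length")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(len(pair_ids) * keep_fraction))
    # Rank indices from highest to lowest score; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [pid for i, pid in enumerate(pair_ids) if i in kept]


if __name__ == "__main__":
    ids = ["a", "b", "c", "d", "e"]
    print(filter_by_score(ids, [0.9, 0.1, 0.5, 0.7, 0.3], keep_fraction=0.6))
```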

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • HAPS-style filtering could be tested on additional virtual-staining datasets to measure how much performance gain is retained across different scanners and tissue types.
  • The linear head might be replaced by a small MLP or attention module to capture higher-order interactions among feature channels without retraining the encoder.
  • Because the encoder is frozen, HAPS can be computed on new modalities once a modest set of expert ratings is collected to fit the head.

Load-bearing premise

Expert similarity scores on the collected H&E-IHC patch pairs constitute reliable ground truth that the linear head can generalize to new images and registration conditions.

What would settle it

Virtual staining models trained on HAPS-filtered pairs show no improvement in expert visual ratings or standard metrics compared with models trained on the original unfiltered MIST dataset.

Original abstract

Virtual staining of histopathology images (e.g., H&E-IHC) is an emerging tool in digital pathology, enabling faster and cheaper workflows by synthesizing target stains from routinely acquired slides. Yet, the quality of virtual staining models is still predominantly assessed with generic metrics such as SSIM, PSNR, and LPIPS. Originally developed for natural images, these metrics are inherently misaligned with the domain-specific characteristics of histological data, failing to capture tissue morphology preservation and biomarker expression patterns. Consequently, a robust, domain-specific standard for quantifying similarity across diverse histological modalities remains a critical gap in the field. In this work, we formalize histology image similarity as a standalone problem and systematically evaluate a broad set of full-reference metrics against a dataset of H&E-IHC patch pairs annotated with expert similarity scores. We further analyze metrics sensitivity to controlled geometric distortions (shifts, rotations and non-rigid deformations) that mimic realistic registration errors between serial sections. Guided by these observations, we propose the Histology-Aware Perceptual Similarity (HAPS) metric. HAPS computes distances in the feature space of a frozen encoder pretrained on histopathology data, adding a linear head to aggregate feature-level differences into a final score that aligns with expert assessments. Finally, we demonstrate the practical value of HAPS for quality control of training data. By quantifying the similarity of training pairs in the MIST dataset and filtering low-scoring samples, we create a cleaner training set. Virtual staining models trained on this refined data outperform those trained on the original, unfiltered dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper formalizes histology image similarity as a distinct problem, evaluates a range of full-reference metrics against expert similarity scores on H&E-IHC patch pairs, analyzes metric sensitivity to synthetic geometric distortions mimicking registration errors, proposes HAPS (distances in a frozen histopathology-pretrained encoder feature space aggregated by a learned linear head), and demonstrates that filtering low-HAPS-score pairs from the MIST dataset yields virtual staining models with improved performance over the unfiltered baseline.

Significance. If the empirical results hold, the work supplies a domain-adapted perceptual metric that better captures tissue morphology and biomarker patterns than generic measures such as SSIM or LPIPS. The data-filtering experiment provides a concrete, practical use case for training-set curation in virtual staining, which could improve model robustness in digital pathology. The reliance on a frozen pretrained encoder and explicit alignment to expert judgments are notable strengths.

major comments (1)
  1. [Section 5] Section 5 (data-filtering experiment): HAPS scores are produced by a linear head fitted to expert annotations on one collection of H&E-IHC pairs and then applied to filter pairs from the MIST dataset; the manuscript provides no explicit analysis of distribution shift (tissue composition, staining intensity, registration statistics, or biomarker patterns) between the annotation set and MIST, leaving open whether the observed performance gains are driven by reliable similarity ranking or by incidental removal of outliers.
minor comments (2)
  1. [Abstract] The abstract states that quantitative tables and sensitivity results exist but does not report any numerical values (e.g., correlation coefficients with expert scores or exact performance deltas after filtering); the main text should ensure these appear early and with statistical details.
  2. [Section 4] Training details for the linear head (loss, regularization, number of expert pairs, cross-validation procedure) are referenced but not fully specified in the provided description; these should be stated explicitly to allow reproduction.
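
Agreement between a metric and expert scores, as requested in the first minor comment, is commonly summarized with a rank correlation. The sketch below computes Spearman's coefficient with average ranks for ties. Choosing Spearman is an assumption, since the description provided does not name the statistic the paper reports, and the helper names are illustrative.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(metric_scores: Sequence[float], expert_scores: Sequence[float]) -> float:
    """Spearman rank correlation between metric outputs and expert ratings."""
    if len(metric_scores) != len(expert_scores):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(metric_scores), average_ranks(expert_scores))
```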

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and detailed summary of our work. We address the single major comment below and commit to revisions that directly respond to the concern about distribution shift in the data-filtering experiment.

Point-by-point responses
  1. Referee: [Section 5] Section 5 (data-filtering experiment): HAPS scores are produced by a linear head fitted to expert annotations on one collection of H&E-IHC pairs and then applied to filter pairs from the MIST dataset; the manuscript provides no explicit analysis of distribution shift (tissue composition, staining intensity, registration statistics, or biomarker patterns) between the annotation set and MIST, leaving open whether the observed performance gains are driven by reliable similarity ranking or by incidental removal of outliers.

    Authors: We acknowledge this is a valid concern. The expert-annotated collection and MIST both consist of H&E-IHC patch pairs from histopathology slides, and the frozen encoder is pretrained on a broad histopathology corpus, which we expect confers robustness. Nevertheless, we agree that an explicit comparison would strengthen the claim that gains arise from reliable similarity ranking rather than incidental outlier removal. In the revised manuscript we will add a dedicated paragraph (and supporting figure) that quantifies and compares tissue-type distributions, mean staining intensity histograms, and estimated registration error statistics between the annotation set and MIST. We will also report an ablation that removes the same number of samples uniformly at random and shows that random removal yields smaller gains than HAPS-based filtering. These additions will clarify the role of the learned similarity ranking. revision: yes
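
The random-removal control described in this response can be sketched as follows: draw a subset of the same size as the filtered set, uniformly at random and with a fixed seed, so that both training sets differ only in how pairs were selected. `random_subset` and its parameters are hypothetical and not taken from the manuscript.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_subset(pair_ids: Sequence[T], n_keep: int, seed: int = 0) -> List[T]:
    """Keep n_keep pairs chosen uniformly at random, preserving original order."""
    if not 0 <= n_keep <= len(pair_ids):
        raise ValueError("n_keep must be between 0 and the number of pairs")
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    kept = set(rng.sample(range(len(pair_ids)), n_keep))
    return [pid for i, pid in enumerate(pair_ids) if i in kept]


if __name__ == "__main__":
    ids = [f"pair_{i}" for i in range(10)]
    print(random_subset(ids, n_keep=6, seed=1))
```

Repeating the draw over several seeds would give a spread for the random baseline, which makes the comparison with score-based filtering easier to interpret.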

Circularity Check

0 steps flagged

No circularity: HAPS uses an independent pretrained encoder and externally annotated expert scores

Full rationale

The derivation begins with a frozen encoder pretrained on histopathology data (independent of the current annotations and of the MIST filtering) and fits a linear head explicitly to expert similarity scores collected on a separate set of H&E-IHC patch pairs. The resulting metric is then applied to score and filter pairs from the distinct MIST dataset. No equation or step reduces by construction to its own inputs: the expert annotations function as external ground truth rather than being regenerated from the filtering outcome, and the paper reports no self-citations or prior-author uniqueness theorems that bear on the central claim. The construction is therefore self-contained against the provided external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central construction rests on one fitted component (linear head) and one domain assumption (pretrained encoder features are informative); no new physical entities are postulated.

free parameters (1)
  • linear head weights
    Weights of the linear aggregation head are determined by fitting to expert similarity scores on the annotated H&E-IHC dataset.
axioms (1)
  • domain assumption: Features extracted by a frozen encoder pretrained on histopathology data capture tissue morphology and biomarker patterns relevant to expert similarity judgments.
    This premise is invoked when the paper selects the frozen encoder as the feature extractor for HAPS.
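
The single fitted component in this ledger, the linear head, can be illustrated with a least-squares fit. The loss, optimizer, and regularization used by the authors are not given in the description available here, so this sketch assumes plain mean-squared error minimized by batch gradient descent. `fit_linear_head`, the learning rate, and the toy data are all illustrative.

```python
from typing import List, Sequence, Tuple


def fit_linear_head(
    diffs: Sequence[Sequence[float]],
    targets: Sequence[float],
    lr: float = 0.1,
    steps: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias so that bias + w . diff approximates expert scores.

    Assumption: mean-squared-error loss with batch gradient descent.
    """
    n, dim = len(diffs), len(diffs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(diffs, targets):
            err = b + sum(wi * xi for wi, xi in zip(w, x)) - y
            grad_b += 2 * err / n
            for j in range(dim):
                grad_w[j] += 2 * err * x[j] / n
        b -= lr * grad_b
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
    return w, b


if __name__ == "__main__":
    # Toy feature differences and scores generated as 1 - 0.5*d1 - 0.25*d2.
    toy_diffs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    toy_scores = [1.0, 0.5, 0.75, 0.25, 0.625]
    print(fit_linear_head(toy_diffs, toy_scores))
```

Because only these weights are learned while the encoder stays frozen, the number of fitted parameters equals the feature dimension plus one, which is why a modest set of expert ratings may suffice.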

pith-pipeline@v0.9.0 · 5848 in / 1472 out tokens · 59251 ms · 2026-05-21T07:24:09.766722+00:00 · methodology

