pith.

arxiv: 2606.04301 · v1 · pith:JEYFKOGKnew · submitted 2026-06-03 · 💻 cs.CV

XSSR: Cross-Domain Self-Supervised Representative Selection for Efficient Annotation in Medical Image Segmentation

Pith reviewed 2026-06-28 07:26 UTC · model grok-4.3

classification 💻 cs.CV
keywords cross-domain selection · self-supervised representative selection · medical image segmentation · masked autoencoder · efficient annotation · active learning · domain adaptation

The pith

XSSR trains a Masked Autoencoder on source data to embed target samples and select a tiny representative subset that yields near-full-data segmentation accuracy with only 5% of the labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an embedding space learned from unlabeled source medical images via Masked Autoencoder training can be used to score and greedily select a small number of target-domain samples for labeling. A sympathetic reader would care because medical image annotation is expensive and domain shifts between hospitals or scanners make full labeling impractical. XSSR combines density, novelty, and diversity scores, calibrates the novelty-diversity trade-off parameter automatically, and then trains a U-Net only on the chosen subset. Experiments on Chest X-ray, retinal fundus (RIGA+), and multi-site Prostate MRI datasets under a fixed 5% annotation budget show that the approach reaches 99.3% of full-data performance with 22 samples on Chest X-ray while outperforming random and CoreSet baselines.

Core claim

XSSR trains a Masked Autoencoder solely on unlabeled source data to learn an embedding space, then uses a greedy selection algorithm on unlabeled target samples scored by a composite of density, novelty, and diversity criteria whose novelty-diversity trade-off parameter is auto-calibrated; the selected subset is used to train a U-Net that reaches 99.3% of full-data performance with 5% annotations on Chest X-ray, surpasses random selection by up to 2.5 Dice points on Prostate MRI, and outperforms CoreSet selection by 0.4 to 1.2 Dice points across all three datasets.
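CoreSet [9], the main learned-selection baseline, is commonly implemented as greedy k-center selection in an embedding space. The sketch below illustrates that idea under stated assumptions (Euclidean distance between embedding vectors, an arbitrary first center chosen by index); it is not necessarily the baseline configuration used in the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def k_center_greedy(embeddings: List[Vector], budget: int, first: int = 0) -> List[int]:
    """Greedy k-center selection in the spirit of the CoreSet baseline [9].

    Each step adds the point farthest from its nearest selected center,
    which greedily shrinks the largest point-to-center distance.
    """
    if not 0 < budget <= len(embeddings):
        raise ValueError("budget must be between 1 and len(embeddings)")
    selected = [first]
    # Distance from every point to its nearest selected center so far.
    min_dist = [math.dist(e, embeddings[first]) for e in embeddings]
    while len(selected) < budget:
        farthest = max(range(len(embeddings)), key=min_dist.__getitem__)
        selected.append(farthest)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[farthest]))
    return selected


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0), (5.0, 5.0)]
print(k_center_greedy(points, budget=3))  # [0, 3, 4]
```

Unlike a density-aware selector, k-center only looks at coverage, so it readily picks isolated outliers such as `(5.0, 5.0)` above.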

What carries the argument

The greedy selection algorithm operating in the MAE embedding space, using composite density-novelty-diversity scoring with auto-calibrated alpha to choose representative target samples for annotation.
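The paper's exact score definitions are not given on this page (the referee asks for them below), so the following is only a hypothetical sketch of how a density-novelty-diversity greedy selector could look. The helper names and formulas are assumptions chosen for illustration: density as the inverse mean k-nearest-neighbour distance among target embeddings, novelty as the distance to the nearest source embedding, diversity as the distance to the nearest already-selected sample, and an additive score density + alpha·novelty + (1 − alpha)·diversity.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(values: List[float]) -> List[float]:
    """Scale non-negative scores to [0, 1] by dividing by their maximum."""
    top = max(values)
    return [v / top if top > 0 else 0.0 for v in values]


def density_scores(target: List[Vector], k: int) -> List[float]:
    """Hypothetical density: inverse of the mean distance to the k nearest target neighbours."""
    scores = []
    for i, x in enumerate(target):
        dists = sorted(math.dist(x, y) for j, y in enumerate(target) if j != i)
        knn = dists[:k]
        scores.append(1.0 / (1.0 + sum(knn) / len(knn)))
    return _normalize(scores)


def novelty_scores(target: List[Vector], source: List[Vector]) -> List[float]:
    """Hypothetical novelty: distance from each target embedding to its nearest source embedding."""
    return _normalize([min(math.dist(x, s) for s in source) for x in target])


def greedy_select(target: List[Vector], source: List[Vector], budget: int,
                  alpha: float = 0.5, k: int = 3) -> List[int]:
    """Greedily pick `budget` target indices maximising density + alpha*novelty + (1-alpha)*diversity."""
    if not 0 < budget <= len(target):
        raise ValueError("budget must be between 1 and len(target)")
    dens = density_scores(target, k)
    nov = novelty_scores(target, source)
    selected: List[int] = []
    remaining = list(range(len(target)))
    while len(selected) < budget:
        if selected:
            # Hypothetical diversity: distance to the nearest already-selected sample.
            div = _normalize([min(math.dist(target[i], target[j]) for j in selected)
                              for i in remaining])
        else:
            div = [1.0] * len(remaining)  # nothing selected yet: all candidates equally diverse
        scores = [dens[i] + alpha * nov[i] + (1 - alpha) * d for i, d in zip(remaining, div)]
        best = remaining[max(range(len(remaining)), key=scores.__getitem__)]
        selected.append(best)
        remaining.remove(best)
    return selected


print(greedy_select([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0)], [(0.0, 0.0)],
                    budget=2, alpha=0.5, k=1))  # [1, 2]
```

In the toy call, the first pick is a dense point slightly farther from the source than its neighbour; once something is selected, the diversity term pulls the second pick to the distant outlier.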

If this is right

  • Only 22 labeled samples suffice for 99.3% of full-data Dice on Chest X-ray.
  • Up to a 2.5 Dice-point gain over random selection on Prostate MRI.
  • A consistent 0.4 to 1.2 Dice-point improvement over CoreSet across datasets.
  • The diversity component has the largest influence on selection quality.
  • Per-site performance correlates with scanner similarity between source and target.
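These figures are Dice scores and ratios of Dice scores. As a reminder of the arithmetic, here is a minimal sketch of the Dice coefficient for binary masks and of "percentage of full-data performance". The function names are illustrative, and treating two empty masks as a perfect match is a common convention rather than something the paper states.

```python
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks flattened to 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (assumption): two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def percent_of_full_data(subset_score: float, full_score: float) -> float:
    """Subset-trained score expressed as a percentage of the full-data score."""
    return 100.0 * subset_score / full_score


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
print(percent_of_full_data(0.5, 0.8))    # 62.5
```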

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could extend to other dense prediction tasks like detection if the embedding captures relevant features.
  • It suggests that self-supervised pretraining on one domain can proxy for similarity in another without target labels.
  • Further work might test if the approach scales to larger domain gaps like CT to MRI.
  • The auto-calibration removes a hyperparameter, potentially making deployment easier in practice.

Load-bearing premise

The embedding space learned by the Masked Autoencoder from source data alone provides a reliable way to measure similarity and select representative samples from the target domain despite differences in imaging equipment or populations.

What would settle it

An experiment showing that samples selected by XSSR yield segmentation accuracy no higher than randomly chosen samples of the same size on a held-out cross-domain medical dataset would falsify the central claim.
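A minimal sketch of how that comparison could be scored, assuming per-seed Dice values for XSSR and for equally sized random subsets are available (the numbers in the usage line are made up). It also reports mean and standard deviation across seeds. The exact paired sign-flip test is a standard permutation test swapped in for illustration; the paper does not say which test, if any, it uses.

```python
import itertools
import statistics
from typing import Sequence, Tuple


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across repeated runs (seeds)."""
    return statistics.mean(values), statistics.stdev(values)


def paired_sign_flip_pvalue(xssr: Sequence[float], random_sel: Sequence[float]) -> float:
    """Exact one-sided paired sign-flip test of H0: XSSR is no better than random selection.

    Enumerates all 2**n sign patterns of the per-run differences, so keep n small.
    """
    diffs = [a - b for a, b in zip(xssr, random_sel)]
    observed = statistics.mean(diffs)
    patterns = list(itertools.product((1, -1), repeat=len(diffs)))
    hits = 0
    for signs in patterns:
        flipped = statistics.mean([s * d for s, d in zip(signs, diffs)])
        if flipped >= observed - 1e-12:  # tolerance guards against float round-off
            hits += 1
    return hits / len(patterns)


print(mean_std([1.0, 2.0, 3.0]))  # (2.0, 1.0)
print(paired_sign_flip_pvalue([0.80, 0.82, 0.81, 0.83], [0.78, 0.79, 0.80, 0.80]))  # 0.0625
```

The second call returns 1/16: with n paired runs the smallest attainable one-sided p-value is 2^-n, so four seeds can never reach p < 0.05 no matter how consistent the gains are.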

Figures

Figures reproduced from arXiv: 2606.04301 by Aleksei Anisimov, Byunghyun Ko, Jeongkyu Lee, Kobe Ke, Suhas Bharthepude.

Figure 1: Pipeline of the proposed XSSR approach.
Figure 2: Chest X-ray lung segmentation. Top row: ground truth (blue). Middle row: XSSR (22 samples, 5%). Bottom row: baseline (100%). Predictions use TP/FP/FN color coding: green = correct, red = over-segmentation, yellow = under-segmentation. Samples are sorted by XSSR Dice from worst (left) to best (right).
Figure 3: Fundus optic disc segmentation. Rows and color coding as in Figure 2.
Figure 4: Prostate MRI segmentation. Rows and color coding as in Figure 2.
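The color key in Figure 2 can be reproduced from a predicted and a ground-truth mask. A minimal sketch, assuming flattened binary masks; the function name is hypothetical, and painting the labels onto the image is left out.

```python
from typing import List, Optional, Sequence

# Color key taken from the Figure 2 caption.
TP_COLOR, FP_COLOR, FN_COLOR = "green", "red", "yellow"


def error_overlay(pred: Sequence[int], truth: Sequence[int]) -> List[Optional[str]]:
    """Assign each pixel of a flattened binary mask pair to TP, FP, FN, or None (true negative)."""
    labels: List[Optional[str]] = []
    for p, t in zip(pred, truth):
        if p and t:
            labels.append(TP_COLOR)  # correct foreground
        elif p:
            labels.append(FP_COLOR)  # over-segmentation
        elif t:
            labels.append(FN_COLOR)  # under-segmentation
        else:
            labels.append(None)      # agreed background: left unpainted
    return labels


print(error_overlay([1, 1, 0, 0], [1, 0, 1, 0]))  # ['green', 'red', 'yellow', None]
```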
Original abstract

Acquiring labeled medical image data is resource-intensive, a challenge further exacerbated in cross-domain scenarios where source and target datasets differ in imaging equipment, population, or clinical site. This study introduces XSSR (Cross-Domain Self-Supervised Representative Selection), a framework designed to minimize annotation effort in the target domain while maintaining robust segmentation performance. XSSR comprises three stages: first, a Masked Autoencoder (MAE) is trained on unlabeled source data to establish a shared embedding space without requiring target labels; second, a greedy selection algorithm scores unlabeled target samples based on a composite density, novelty, and diversity criterion; and third, a U-Net segmentation model is trained exclusively on the selected subset. The novelty-diversity trade-off parameter, alpha, is automatically calibrated by minimizing embedding-space coverage, eliminating manual tuning. We evaluate XSSR on three public benchmarks: Chest X-ray, RIGA+ retinal fundus imaging, and multi-site Prostate MRI, each under a fixed 5% annotation budget. XSSR achieves 99.3% of full-data performance on Chest X-ray using only 22 labeled samples, surpasses random selection by up to 2.5 Dice points on Prostate MRI, and consistently outperforms the CoreSet baseline by 0.4 to 1.2 Dice points across all datasets. Ablation studies indicate that diversity is the most influential scoring component, and per-site analysis shows that performance correlates with scanner similarity to the source domain.
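Stage one relies on MAE pretraining [11], whose core step is hiding a large random fraction of image patches (75% by default in He et al.) and training the model to reconstruct them. The sketch shows only that masking step. The 224×224 input with 16×16 patches is an assumption, since this page does not state XSSR's input size, patch size, or mask ratio; how the trained encoder's tokens are pooled into one embedding per image for selection is likewise unspecified here.

```python
import random
from typing import List, Optional, Tuple


def random_patch_mask(num_patches: int, mask_ratio: float = 0.75,
                      seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Split patch indices into visible and masked sets, MAE-style.

    The encoder sees only the visible patches; the decoder reconstructs the masked ones.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_keep = int(num_patches * (1 - mask_ratio))
    return sorted(order[:num_keep]), sorted(order[num_keep:])


# Assumption: 224x224 input split into 16x16 patches -> 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```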

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces XSSR, a three-stage framework for minimizing target-domain annotation in cross-domain medical image segmentation. An MAE is pretrained on unlabeled source data to produce embeddings; a greedy selector then scores unlabeled target samples using a composite density-novelty-diversity criterion whose novelty-diversity trade-off parameter alpha is set automatically by minimizing embedding-space coverage; finally, a U-Net is trained on the selected 5% subset. Experiments on Chest X-ray, RIGA+ fundus, and multi-site Prostate MRI benchmarks report that the selected subsets recover 99.3% of full-data Dice on Chest X-ray (22 samples), exceed random selection by up to 2.5 Dice points on Prostate MRI, and outperform CoreSet by 0.4-1.2 Dice points, with ablations attributing most gain to the diversity term and per-site results tracking scanner similarity.

Significance. If the reported margins hold under statistical scrutiny, the work supplies a practical, largely unsupervised route to reduce labeling budgets in cross-domain medical segmentation while preserving most of the performance of full supervision. The automatic alpha calibration and the ablation isolating diversity are concrete strengths; the per-site breakdown directly tests the transferability of source-trained MAE embeddings rather than leaving the assumption unexamined.

minor comments (3)
  1. [Abstract] Abstract and results tables report point estimates (99.3% recovery, 0.4-2.5 Dice margins) without error bars, standard deviations, or the number of random seeds/runs; adding these would allow readers to judge whether the observed gains exceed run-to-run variability.
  2. [Method] The greedy selection procedure is described at a high level; the manuscript would be strengthened by including explicit pseudocode or the precise mathematical definitions of the density, novelty, and diversity scores (including how alpha is optimized) so that the algorithm can be reproduced exactly.
  3. [Experiments] The per-site Prostate MRI breakdown is useful, yet the manuscript does not state whether the source MAE was trained on a single scanner or pooled multi-site data; clarifying this detail would make the embedding-transfer claim more precise.
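Comment 2 asks for explicit pseudocode for the alpha calibration. One possible reading of "minimizing embedding-space coverage" is a grid search for the alpha whose selected subset minimizes the coverage radius (the largest distance from any target embedding to its nearest selected sample). The sketch below assumes that reading, an 11-point grid over [0, 1], and a pluggable selector; none of these choices are confirmed by the paper, and `toy_select` is a hypothetical stand-in.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def coverage_radius(target: List[Vector], selected: List[int]) -> float:
    """Largest distance from any target embedding to its nearest selected embedding."""
    return max(min(math.dist(x, target[j]) for j in selected) for x in target)


def calibrate_alpha(target: List[Vector],
                    select: Callable[[float], List[int]],
                    grid: Optional[List[float]] = None) -> Tuple[float, float]:
    """Return (alpha, radius) for the grid value whose selection covers the target best.

    `select(alpha)` is any selector returning target indices. Ties keep the earliest alpha.
    """
    grid = grid if grid is not None else [i / 10 for i in range(11)]
    if not grid:
        raise ValueError("grid must not be empty")
    best_alpha, best_radius = grid[0], coverage_radius(target, select(grid[0]))
    for alpha in grid[1:]:
        radius = coverage_radius(target, select(alpha))
        if radius < best_radius:
            best_alpha, best_radius = alpha, radius
    return best_alpha, best_radius


def toy_select(alpha: float) -> List[int]:
    """Hypothetical selector: low alpha picks a clustered pair, high alpha reaches the outlier."""
    return [0, 1] if alpha < 0.5 else [0, 2]


target = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
print(calibrate_alpha(target, toy_select))  # (0.5, 1.0)
```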

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of XSSR and the recommendation for minor revision. The summary correctly captures the framework, automatic alpha calibration, ablation results, and benchmark outcomes. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity


The manuscript describes an empirical pipeline: MAE pretraining on unlabeled source data to produce embeddings, followed by a greedy algorithmic selection of target samples using composite density/novelty/diversity scores (with alpha auto-calibrated via embedding coverage), then U-Net training on the selected subset. No equations, derivations, or fitted parameters are presented that reduce the reported Dice scores or selection performance to inputs by construction. Ablations and per-site breakdowns are external to the method itself. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing. The framework is self-contained against the three public benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on transfer of MAE embeddings across domains and the validity of the composite scoring for selection; no new physical entities or free parameters beyond the auto-calibrated alpha.

free parameters (1)
  • alpha
    Novelty-diversity trade-off parameter automatically calibrated by minimizing embedding-space coverage; no explicit fitted value reported.
axioms (1)
  • domain assumption: an MAE trained on the source domain produces a shared embedding space that is useful for target sample selection despite domain shift
    Invoked as the first stage to establish the representation used for all subsequent scoring.
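One way to probe this assumption, in the spirit of the per-site Prostate MRI analysis, is to correlate a per-site domain-shift proxy with per-site Dice. The sketch assumes the proxy is the Euclidean distance between a site's mean embedding and the source mean embedding; the paper's actual scanner-similarity measure is not specified on this page, and the toy values are illustrative.

```python
import math
import statistics
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a list of embeddings."""
    return [statistics.fmean(col) for col in zip(*vectors)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def shift_vs_dice(source: List[Vector], sites: List[List[Vector]], site_dice: List[float]) -> float:
    """Correlate a hypothetical per-site shift proxy (centroid distance) with per-site Dice."""
    src = centroid(source)
    shifts = [math.dist(centroid(site), src) for site in sites]
    return pearson(shifts, site_dice)


r = shift_vs_dice([(0.0, 0.0), (0.0, 0.0)],
                  [[(1.0, 0.0)], [(2.0, 0.0)], [(3.0, 0.0)]],
                  [0.9, 0.8, 0.7])
print(round(r, 6))  # -1.0
```

A strongly negative coefficient (larger shift, lower Dice) would be consistent with the reported link between performance and scanner similarity; with only a handful of sites it is a descriptive check, not a significance test.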

pith-pipeline@v0.9.1-grok · 5811 in / 1327 out tokens · 24497 ms · 2026-06-28T07:26:05.336522+00:00 · methodology


Reference graph

Works this paper leans on

19 extracted references · 1 linked inside Pith

  [1] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015, LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015)

  [2] Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016, LNCS, vol. 9901, pp. 424–432. Springer, Cham (2016)

  [3] Oktay, O., Schlemper, J., Le Folgoc, L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N., Kainz, B., et al.: Attention U-Net: Learning Where to Look for the Pancreas. arXiv preprint arXiv:1804.03999 (2018)

  [4] Rayed, M.E., Islam, S.M.S., Niha, S.I., Jim, J.R., Kabir, M.M., Mridha, M.F.: Deep learning for medical image segmentation: State-of-the-art advancements and challenges. Informatics in Medicine Unlocked 47, 101504 (2024)

  [5] Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., Lempitsky, V.: Domain-Adversarial Training of Neural Networks. Journal of Machine Learning Research 17(59), 1–35 (2016)

  [6] Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial Discriminative Domain Adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7167–7176 (2017)

  [7] Hoffman, J., Tzeng, E., Park, T., Zhu, J., Isola, P., Saenko, K., Efros, A.A., Darrell, T.: CyCADA: Cycle-Consistent Adversarial Domain Adaptation. In: Proceedings of the 35th International Conference on Machine Learning (ICML), vol. 80, pp. 1989–1998. PMLR (2018)

  [8] Yoo, D., Kweon, I.S.: Learning Loss for Active Learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 93–102 (2019)

  [9] Sener, O., Savarese, S.: Active Learning for Convolutional Neural Networks: A Core-Set Approach. In: International Conference on Learning Representations (ICLR) (2018)

  [10] Ash, J.T., Zhang, C., Krishnamurthy, A., Langford, J., Agarwal, A.: Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds. In: International Conference on Learning Representations (ICLR) (2020)

  [11] He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked Autoencoders Are Scalable Vision Learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16000–16009 (2022)

  [12] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A Simple Framework for Contrastive Learning of Visual Representations. In: Proceedings of the 37th International Conference on Machine Learning (ICML), vol. 119, pp. 1597–1607 (2020)

  [13] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum Contrast for Unsupervised Visual Representation Learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9729–9738 (2020)

  [14] Jaeger, S., Candemir, S., Antani, S., Wang, Y., Lu, P.-X., Thoma, G.: Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quantitative Imaging in Medicine and Surgery 4(6), 475–477 (2014)

  [15] Almazroa, A., Burman, R., Raahemifar, K., Lakshminarayanan, V.: RIGA: An open retinal image dataset for optic nerve head analysis. In: International Conference on Digital Image Computing: Techniques and Applications (2015)

  [16] Liu, Q., Yuan, Y., Dou, Q., Heng, P.-A.: MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data. arXiv preprint arXiv:2002.03366 (2020)

  [17] Liu, Q., et al.: Multi-site Dataset for Prostate MRI Segmentation. https://liuquande.github.io/SAML/ (2020)

  [18] Litjens, G., Toth, R., van de Ven, W., Hoeks, C., Kerkstra, S., van Ginneken, B., Vincent, G., Guillard, G., Birbeck, N., Zhang, J., et al.: Evaluation of prostate segmentation algorithms for MRI: The PROMISE12 challenge. Medical Image Analysis 18(2), 359–373 (2014)

  [19] Bloch, B.N., Madabhushi, A., Huisman, H., Freymann, J., Kirby, J., Grauer, M., Enzmann, D., Shen, D., et al.: NCI-ISBI 2013 Challenge: Automated Segmentation of Prostate Structures. The Cancer Imaging Archive (2015)