arxiv: 2604.02564 · v2 · submitted 2026-04-02 · 📡 eess.IV · cs.CV

Recognition: no theorem link

Why Invariance is Not Enough for Biomedical Domain Generalization and How to Fix It

Sebo Diaz , Polina Golland , Elfar Adalsteinsson , Neel Dey

Authors on Pith no claims yet

Pith reviewed 2026-05-13 19:59 UTC · model grok-4.3

classification 📡 eess.IV cs.CV

keywords domain generalizationbiomedical image segmentation3D segmentationfoundation modelsdomain shiftsMaskGenfew-shot learning

0 comments

The pith

MaskGen achieves robust 3D biomedical segmentation by combining source intensities with foundation model representations rather than relying solely on invariance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that invariance alone fails to produce reliable segmentation models when biomedical images shift across modalities, disease severity, or clinical sites. MaskGen instead trains models on both the original source-domain image intensities and auxiliary features drawn from foundation models whose representations stay relatively stable across domains. This dual-input strategy delivers measurable gains in fully supervised settings and in few-shot adaptation while adding only marginal overhead. The approach remains compatible with any network architecture, loss function, or standard augmentation pipeline and applies to arbitrary anatomical regions.

Core claim

MaskGen presents a simple learning strategy that utilizes both source-domain image intensities and domain-stable foundation model representations to train robust segmentation models for 3D biomedical images, achieving strong gains in both fully supervised and few-shot segmentation across broad clinical shifts.

What carries the argument

MaskGen training strategy, which augments standard segmentation loss with domain-stable foundation model representations supplied as auxiliary inputs alongside source image intensities.

Load-bearing premise

Representations from existing foundation models remain sufficiently stable across biomedical domains to serve as reliable auxiliary signals without further adaptation.

What would settle it

An experiment on a new clinical shift dataset where MaskGen produces no accuracy improvement over a standard baseline while the foundation model features vary substantially across domains would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.02564 by Elfar Adalsteinsson, Neel Dey, Polina Golland, Sebo Diaz.

**Figure 1.** Figure 1: Training on Stable Representations. When trained on in-domain CT (A, top) and tested on out-of-domain MRI (A, bottom), standard ERM models produce representations that are unstable under domain shifts (B). Although performant on unseen in-domain data (E, top), this instability leads to degraded performance on new out-of-distribution images (E, bottom). DropGen instead jointly trains on both in-domain image… view at source ↗

**Figure 2.** Figure 2: Method overview. Left: Given a standard PyTorch training loop, the green lines are the only additions required, demonstrating DropGen’s simplicity. Right: The probabilistic graphical model we use for domain generalization. Label Y generates both stable Xs and unstable Xu variables and the environment E influences only Xu. 3.1 Methods DropGen is outlined in [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Qualitative all-data segmentation results. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Ablating Feature Combination Regularization [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗

**Figure 5.** Figure 5: Comparing cross-modality representations extracted by foundation [PITH_FULL_IMAGE:figures/full_fig_p023_5.png] view at source ↗

**Figure 6.** Figure 6: CKA Representation Similarity of Foundation Models Under Sequence [PITH_FULL_IMAGE:figures/full_fig_p024_6.png] view at source ↗

**Figure 7.** Figure 7: Choice of layer from which to use representations. [PITH_FULL_IMAGE:figures/full_fig_p025_7.png] view at source ↗

**Figure 8.** Figure 8: Histogram of Cosine Similarities. Cosine similarities for the two masked regimes of (1,0) and (1,1) for our method sampled every 100 steps during an "All-data" training experiment on BraTS [1, 46] to empirically support Condition 1. A KDE curve is fitted to the histogram for clarity. stable inputs. Specifically, we consider the regime where we mask the original input versus the regime where we mask nothing… view at source ↗

**Figure 9.** Figure 9: Corruption Analysis. We apply simulated domain shifts with increasing strengths, via bias (top) and contrast/gamma (middle) corruptions and observe that DropGen maintains high robustness. In the bottom row, we perturb the trained model weights with additive Gaussian noise, and find that DropGen is stable under these corruptions, indicating a flatter and more generalizable solution. Dashed vertical line in … view at source ↗

read the original abstract

We present MaskGen, a theoretically grounded and deliberately simple approach for domain generalization in 3D biomedical image segmentation. Modern segmentation models degrade sharply under shifts in modality, disease severity, clinical sites, and more, limiting their reliable adoption. Existing generalization methods address this using extreme augmentations, hand-engineered domain statistics mixing, or architectural redesigns that add significant implementation overhead while yielding inconsistent performance across biomedical settings. MaskGen instead presents a principled learning strategy with marginal overhead that utilizes both source-domain image intensities and domain-stable foundation model representations to train robust segmentation models. As a result, MaskGen achieves strong gains in both fully supervised and few-shot segmentation across broad clinical shifts in biomedical studies. Unlike prior approaches, MaskGen is architecture- and loss-agnostic, compatible with standard augmentation pipelines, easy to implement, and tackles arbitrary anatomical regions. Its implementation is freely available at https://github.com/sebodiaz/MaskGen.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MaskGen is a lightweight wrapper that adds frozen foundation-model features to raw intensities for biomedical segmentation, but the stability of those features is not directly measured.

read the letter

The core of this paper is MaskGen, which trains a segmenter on both the original source intensities and features pulled from an existing foundation model. The claim is that pure invariance training falls short for clinical shifts in modality, severity, and site, and that this extra stable signal fixes it with almost no extra cost. The method stays architecture- and loss-agnostic and works with ordinary augmentations, which is the main practical selling point. They also release the code, so anyone can try it on their own data without re-implementing anything complicated. That combination of simplicity and public implementation is the part worth paying attention to if you work on medical segmentation under distribution shift. The experiments are said to show gains in both fully supervised and few-shot regimes across several clinical datasets. What is missing is any direct check on the central assumption. There are no reported feature-space distances, invariance metrics, or ablations that remove the foundation-model branch to show the gains actually come from domain-stable representations rather than from extra regularization or input fusion. Without those controls it is hard to know whether the reported improvements are reproducible for the reason the authors give. The abstract itself contains no numbers, so the size of the effect also remains unclear until the tables are examined. This is the kind of paper that belongs in the medical imaging domain-generalization literature. A reader who needs a drop-in trick that does not require new architectures or heavy data mixing could find it useful, provided the stability premise holds up under scrutiny. It is coherent enough on its own terms to deserve a serious referee who can ask for the missing ablations and quantitative checks on the foundation features.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces MaskGen, a simple learning strategy for domain generalization in 3D biomedical image segmentation. It argues that invariance-based methods are insufficient and instead trains segmenters on both source-domain image intensities and representations from foundation models that are posited to be domain-stable. The approach is presented as architecture- and loss-agnostic with low overhead, and the abstract claims strong performance gains in fully supervised and few-shot segmentation across clinical shifts in modality, severity, and sites.

Significance. If the performance claims are substantiated by rigorous, quantitative experiments with appropriate ablations, MaskGen could offer a practical, low-overhead alternative to existing generalization techniques in biomedical imaging. Its compatibility with standard pipelines and public code release would be strengths for reproducibility and adoption.

major comments (2)

[Abstract] Abstract: the central claim of 'strong gains' in fully supervised and few-shot segmentation is asserted without any quantitative metrics, baseline comparisons, dataset details, or ablation results, making it impossible to assess whether the improvements are real, statistically significant, or attributable to the proposed mechanism.
[Methods] Methods (foundation-model branch): the manuscript supplies no direct quantification of domain-stability for the foundation-model representations (e.g., feature-space distances, invariance metrics, or cross-shift correlation scores) under the tested clinical shifts; without such evidence or an ablation that removes the foundation-model input, it remains possible that any observed gains derive from other components rather than the claimed stability.

minor comments (1)

[Methods] The manuscript would benefit from an explicit statement of the exact foundation models used and the precise manner in which their representations are fused with source intensities.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract requires quantitative support and that additional evidence is needed for the foundation-model branch. We will revise the manuscript to address both points directly.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim of 'strong gains' in fully supervised and few-shot segmentation is asserted without any quantitative metrics, baseline comparisons, dataset details, or ablation results, making it impossible to assess whether the improvements are real, statistically significant, or attributable to the proposed mechanism.

Authors: We agree that the abstract should include concrete quantitative support. In the revised manuscript we will update the abstract to report specific performance metrics (e.g., mean Dice-score gains over the strongest baselines on the primary datasets) together with brief statements of the evaluation settings. Full tables with statistical significance, baseline comparisons, and ablation results already appear in the Experiments section and will be referenced more explicitly from the abstract. revision: yes
Referee: [Methods] Methods (foundation-model branch): the manuscript supplies no direct quantification of domain-stability for the foundation-model representations (e.g., feature-space distances, invariance metrics, or cross-shift correlation scores) under the tested clinical shifts; without such evidence or an ablation that removes the foundation-model input, it remains possible that any observed gains derive from other components rather than the claimed stability.

Authors: We acknowledge the absence of direct stability metrics. We will add a dedicated ablation that trains the identical segmentation architecture with and without the foundation-model branch, quantifying the generalization drop under each clinical shift. This isolates the contribution of the foundation-model input. If space allows we will also include a short supplementary analysis of feature-space distances across domains to support the stability claim. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The manuscript introduces MaskGen as an empirical training strategy that combines source intensities with representations from external foundation models. No equations, self-citations, or fitted parameters are presented that reduce the central performance claim to a tautology or to the inputs by construction. The domain-stability premise is treated as an external assumption rather than derived internally, and results are reported via standard supervised and few-shot experiments. This is consistent with a self-contained empirical contribution without load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the unproven premise that foundation-model features remain stable under biomedical domain shifts; no free parameters or new entities are introduced in the abstract description.

axioms (1)

domain assumption Representations extracted from existing foundation models are domain-stable across modality, site, and severity shifts in biomedical imaging.
This stability is invoked to justify using the representations as auxiliary training signal for generalization.

pith-pipeline@v0.9.0 · 5469 in / 1151 out tokens · 34462 ms · 2026-05-13T19:59:14.835375+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages · 8 internal anchors

[1]

Nature communications13(1), 4128 (2022) 9, 12, 13, 14, 26, 27, 28

Antonelli, M., Reinke, A., Bakas, S., Farahani, K., Kopp-Schneider, A., Landman, B.A., Litjens, G., Menze, B., Ronneberger, O., Summers, R.M., et al.: The medical segmentation decathlon. Nature communications13(1), 4128 (2022) 9, 12, 13, 14, 26, 27, 28

work page 2022
[2]

Invariant Risk Minimization

Arjovsky, M., Bottou, L., Gulrajani, I., Lopez-Paz, D.: Invariant risk minimization. arXiv preprint arXiv:1907.02893 (2019) 4

work page internal anchor Pith review Pith/arXiv arXiv 1907
[3]

Medical image analysis86, 102789 (2023) 4

Billot, B., Greve, D.N., Puonti, O., Thielscher, A., Van Leemput, K., Fischl, B., Dalca, A.V., Iglesias, J.E., et al.: Synthseg: Segmentation of brain mri scans of any contrast and resolution without retraining. Medical image analysis86, 102789 (2023) 4

work page 2023
[4]

The Cancer Imaging Archive (2015)

Bloch, N., Madabhushi, A., Huisman, H., Freymann, J., Kirby, J., Grauer, M., Enquobahrie, A., Jaffe, C., Clarke, L., Farahani, K.: Nci-isbi 2013 challenge: Auto- mated segmentation of prostate structures. The Cancer Imaging Archive (2015). https://doi.org/10.7937/K9/TCIA.2015.zF0vlOPv , http://doi.org/10.7937/K9/ TCIA.2015.zF0vlOPv9

work page doi:10.7937/k9/tcia.2015.zf0vlopv 2013
[5]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Butoi, V.I., Ortiz, J.J.G., Ma, T., Sabuncu, M.R., Guttag, J., Dalca, A.V.: Uni- verseg: Universal medical image segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 21438–21451 (2023) 2, 4

work page 2023
[6]

MONAI: An open-source framework for deep learning in healthcare

Cardoso, M.J., Li, W., Brown, R., Ma, N., Kerfoot, E., Wang, Y., Murrey, B., Myronenko, A., Zhao, C., Yang, D., et al.: Monai: An open-source framework for deep learning in healthcare. arXiv preprint arXiv:2211.02701 (2022) 32

work page internal anchor Pith review arXiv 2022
[7]

In: Proceedings of the AAAI conference on artificial intelligence

Chen, C., Dou, Q., Chen, H., Qin, J., Heng, P.A.: Synergistic image and feature adaptation: Towards cross-modality domain adaptation for medical image segmen- tation. In: Proceedings of the AAAI conference on artificial intelligence. vol. 33, pp. 865–872 (2019) 2, 3 16 Diaz et al

work page 2019
[8]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Chen, D., Wang, D., Darrell, T., Ebrahimi, S.: Contrastive test-time adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 295–305 (2022) 2, 3

work page 2022
[9]

arXiv preprint arXiv:2407.01419 (2024) 4

Chollet, E., Balbastre, Y., Mauri, C., Magnain, C., Fischl, B., Wang, H.: Neurovas- cular segmentation in soct with deep learning and synthetic training data. arXiv preprint arXiv:2407.01419 (2024) 4

work page arXiv 2024
[10]

Improved Regularization of Convolutional Neural Networks with Cutout

DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552 (2017) 3, 9, 35

work page internal anchor Pith review arXiv 2017
[11]

In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision

Dey, N., Abulnaga, M., Billot, B., Turk, E.A., Grant, E., Dalca, A.V., Golland, P.: Anystar: Domain randomized universal star-convex 3d instance segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 7593–7603 (2024) 4

work page 2024
[12]

In: International Conference on Learning Representations (2025) 2, 3, 4, 5, 8, 10, 13, 23, 24, 26, 33

Dey, N., Billot, B., Wong, H.E., Wang, C.J., Ren, M., Grant, P.E., Dalca, A.V., Golland, P.: Learning general-purpose biomedical volume representations using randomized synthesis. In: International Conference on Learning Representations (2025) 2, 3, 4, 5, 8, 10, 13, 23, 24, 26, 33

work page 2025
[13]

In: International Conference on Medical Image Computing and Computer-Assisted Intervention

Diaz, S., Billot, B., Dey, N., Zhang, M., Abaci Turk, E., Grant, P.E., Golland, P., Adalsteinsson, E.: Robust fetal pose estimation across gestational ages via cross-population augmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 549–559. Springer (2025) 4

work page 2025
[14]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Dong, H., Konz, N., Gu, H., Mazurowski, M.A.: Medical image segmentation with intent: Integrated entropy weighting for single image test-time adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5046–5055 (2024) 3

work page 2024
[15]

In: International conference on learning representations (2019) 14

Dosovitskiy, A., Djolonga, J.: You only train once: Loss-conditional training of deep networks. In: International conference on learning representations (2019) 14

work page 2019
[16]

Advances in Neural Information Processing Systems36, 18291–18324 (2023) 2, 4, 5, 6, 12, 13, 36

Eastwood, C., Singh, S., Nicolicioiu, A.L., Vlastelica Pogančić, M., von Kügelgen, J., Schölkopf, B.: Spuriosity didn’t kill the classifier: Using invariant predictions to harness spurious features. Advances in Neural Information Processing Systems36, 18291–18324 (2023) 2, 4, 5, 6, 12, 13, 36

work page 2023
[17]

In: 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI)

Fu, J., Dalca, A.V., Fischl, B., Moreno, R., Hoffmann, M.: Learning accurate rigid registration for longitudinal brain mri from synthetic data. In: 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI). pp. 1–5. IEEE (2025) 4

work page 2025
[18]

Imaging Neuroscience2, 1–22 (2024) 4

Gopinath, K., Hoopes, A., Alexander, D.C., Arnold, S.E., Balbastre, Y., Billot, B., Casamitjana, A., Cheng, Y., Chua, R.Y.Z., Edlow, B.L., et al.: Synthetic data in generalizable, learning-based neuroimaging. Imaging Neuroscience2, 1–22 (2024) 4

work page 2024
[19]

HyperNetworks

Ha, D., Dai, A., Le, Q.V.: Hypernetworks. arXiv preprint arXiv:1609.09106 (2016) 14

work page internal anchor Pith review arXiv 2016
[20]

arXiv e-prints pp

He, Y., Guo, P., Tang, Y., Myronenko, A., Nath, V., Xu, Z., Yang, D., Zhao, C., Simon, B., Belue, M., et al.: Vista3d: Versatile imaging segmentation and annotation model for 3d computed tomography. arXiv e-prints pp. arXiv–2406 (2024) 2

work page 2024
[21]

Neural computation9(1), 1–42 (1997) 30

Hochreiter, S., Schmidhuber, J.: Flat minima. Neural computation9(1), 1–42 (1997) 30

work page 1997
[22]

arXiv preprint arXiv:2507.13458 (2025) 4

Hoffmann, M.: Domain-randomized deep learning for neuroimage analysis. arXiv preprint arXiv:2507.13458 (2025) 4

work page arXiv 2025
[23]

IEEE transactions on medical imaging41(3), 543–558 (2021) 4

Hoffmann, M., Billot, B., Greve, D.N., Iglesias, J.E., Fischl, B., Dalca, A.V.: Synthmorph: learning contrast-invariant registration without acquired images. IEEE transactions on medical imaging41(3), 543–558 (2021) 4

work page 2021
[24]

Imaging Neuroscience 2, 1–33 (2024) 4 DropGen 17

Hoffmann, M., Hoopes, A., Greve, D.N., Fischl, B., Dalca, A.V.: Anatomy-aware and acquisition-agnostic joint registration with synthmorph. Imaging Neuroscience 2, 1–33 (2024) 4 DropGen 17

work page 2024
[25]

Hoopes, A.: Voxelprompt: A vision-language agent for grounded medical image analysis. Ph.D. thesis, Massachusetts Institute of Technology (2025) 4

work page 2025
[26]

Hu, J., Shen, L., Albanie, S., Sun, G., Wu, E.: Squeeze-and-excitation networks (2017) 27

work page 2017
[27]

IEEE Transactions on Medical Imaging42(1), 233–244 (2022) 3

Hu, S., Liao, Z., Zhang, J., Xia, Y.: Domain and content adaptive convolution based multi-source domain generalization for medical image segmentation. IEEE Transactions on Medical Imaging42(1), 233–244 (2022) 3

work page 2022
[28]

In: European conference on computer vision

Huang, Z., Wang, H., Xing, E.P., Huang, D.: Self-challenging improves cross-domain generalization. In: European conference on computer vision. pp. 124–140. Springer (2020) 3, 9, 10, 36

work page 2020
[29]

Nature methods18(2), 203–211 (2021) 1, 9, 10, 32, 34

Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods18(2), 203–211 (2021) 1, 9, 10, 32, 34

work page 2021
[30]

arXiv preprint arXiv:2503.08373 (2025) 2, 4, 5, 13, 23, 24, 33

Isensee, F., Rokuss, M., Krämer, L., Dinkelacker, S., Ravindran, A., Stritzke, F., Hamm, B., Wald, T., Langenberg, M., Ulrich, C., et al.: nninteractive: Redefining 3d promptable segmentation. arXiv preprint arXiv:2503.08373 (2025) 2, 4, 5, 13, 23, 24, 33

work page arXiv 2025
[31]

In: BVM workshop

Isensee, F., Ulrich, C., Wald, T., Maier-Hein, K.H.: Extending nnu-net is all you need. In: BVM workshop. pp. 12–17. Springer (2023) 34

work page 2023
[32]

In: International Conference on Medical Image Computing and Computer- Assisted Intervention

Isensee, F., Wald, T., Ulrich, C., Baumgartner, M., Roy, S., Maier-Hein, K., Jaeger, P.F.: nnu-net revisited: A call for rigorous validation in 3d medical image segmen- tation. In: International Conference on Medical Image Computing and Computer- Assisted Intervention. pp. 488–498. Springer (2024) 1, 8

work page 2024
[33]

Advances in neural information processing systems 35, 36722–36732 (2022) 9, 12, 13, 14, 27, 28, 29

Ji, Y., Bai, H., Ge, C., Yang, J., Zhu, Y., Zhang, R., Li, Z., Zhanng, L., Ma, W., Wan, X., et al.: Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation. Advances in neural information processing systems 35, 36722–36732 (2022) 9, 12, 13, 14, 27, 28, 29

work page 2022
[34]

Jiang, Y., Veitch, V.: Invariant and transportable representations for anti-causal domain shifts (2022) 2, 4, 5, 12, 13, 36

work page 2022
[35]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Kang, G., Jiang, L., Yang, Y., Hauptmann, A.G.: Contrastive adaptation network for unsupervised domain adaptation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4893–4902 (2019) 3

work page 2019
[36]

Medical Image Analysis68, 101907 (2021) 2, 3

Karani, N., Erdil, E., Chaitanya, K., Konukoglu, E.: Test-time adaptable neural networks for robust medical image segmentation. Medical Image Analysis68, 101907 (2021) 2, 3

work page 2021
[37]

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

Keskar, N.S., Mudigere, D., Nocedal, J., Smelyanskiy, M., Tang, P.T.P.: On large- batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836 (2016) 30

work page internal anchor Pith review Pith/arXiv arXiv 2016
[38]

In: Proceedings of the IEEE/CVF international conference on computer vision

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., et al.: Segment anything. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 4015–4026 (2023) 4

work page 2023
[39]

In: International conference on machine learning

Kornblith, S., Norouzi, M., Lee, H., Hinton, G.: Similarity of neural network representations revisited. In: International conference on machine learning. pp. 3519–3529. PMlR (2019) 24

work page 2019
[40]

In: 2024 IEEE International Symposium on Biomedical Imaging (ISBI)

Laso, P., Cerri, S., Sorby-Adams, A., Guo, J., Mateen, F., Goebl, P., Wu, J., Liu, P., Li, H.B., Young, S.I., et al.: Quantifying white matter hyperintensity and brain volumes in heterogeneous clinical and low-field portable mri. In: 2024 IEEE International Symposium on Biomedical Imaging (ISBI). pp. 1–5. IEEE (2024) 4 18 Diaz et al

work page 2024
[41]

Computers in biology and medicine60, 8–31 (2015) 9

Lemaître, G., Martí, R., Freixenet, J., Vilanova, J.C., Walker, P.M., Meriaudeau, F.: Computer-aided detection and diagnosis for prostate cancer based on mono and multi-parametric mri: a review. Computers in biology and medicine60, 8–31 (2015) 9

work page 2015
[42]

Advances in neural information processing systems31(2018) 30

Li, H., Xu, Z., Taylor, G., Studer, C., Goldstein, T.: Visualizing the loss landscape of neural nets. Advances in neural information processing systems31(2018) 30

work page 2018
[43]

Medical image analysis 18(2), 359–373 (2014) 9

Litjens, G., Toth, R., Van De Ven, W., Hoeks, C., Kerkstra, S., Van Ginneken, B., Vincent, G., Guillard, G., Birbeck, N., Zhang, J., et al.: Evaluation of prostate segmentation algorithms for mri: the promise12 challenge. Medical image analysis 18(2), 359–373 (2014) 9

work page 2014
[44]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Liu, Q., Chen, C., Qin, J., Dou, Q., Heng, P.A.: Feddg: Federated domain general- ization on medical image segmentation via episodic learning in continuous frequency space. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1013–1023 (2021) 3, 9, 12, 13, 14, 27, 28, 29

work page 2021
[45]

Decoupled Weight Decay Regularization

Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017) 9

work page internal anchor Pith review Pith/arXiv arXiv 2017
[46]

IEEE transactions on medical imaging 34(10), 1993–2024 (2014) 14, 26, 27, 28

Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al.: The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging 34(10), 1993–2024 (2014) 14, 26, 27, 28

work page 1993
[47]

IEEE Transactions on Medical Imaging42(4), 1095–1106 (2022) 1, 3, 9, 14, 27, 28, 29, 35

Ouyang, C., Chen, C., Li, S., Li, Z., Qin, C., Bai, W., Rueckert, D.: Causality- inspired single-source domain generalization for medical image segmentation. IEEE Transactions on Medical Imaging42(4), 1095–1106 (2022) 1, 3, 9, 14, 27, 28, 29, 35

work page 2022
[48]

Scientific Data 11(1), 721 (2024) 9, 12, 13, 14, 27, 28, 29

Pace, D.F., Contreras, H.T., Romanowicz, J., Ghelani, S., Rahaman, I., Zhang, Y., Gao, P., Jubair, M.I., Yeh, T., Golland, P., et al.: Hvsmr-2.0: A 3d cardiovascular mr dataset for whole-heart segmentation in congenital heart disease. Scientific Data 11(1), 721 (2024) 9, 12, 13, 14, 27, 28, 29

work page 2024
[49]

SAM 2: Segment Anything in Images and Videos

Ravi, N., Gabeur, V., Hu, Y.T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., et al.: Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714 (2024) 4

work page internal anchor Pith review Pith/arXiv arXiv 2024
[50]

arXiv preprint arXiv:2010.05761 , year =

Rosenfeld, E., Ravikumar, P., Risteski, A.: The risks of invariant risk minimization. arXiv preprint arXiv:2010.05761 (2020) 4, 8

work page arXiv 2010
[51]

Cambridge university press (2014) 4

Shalev-Shwartz, S., Ben-David, S.: Understanding machine learning: From theory to algorithms. Cambridge university press (2014) 4

work page 2014
[52]

Machine Learning111(3), 895–915 (2022) 4

Shui, C., Wang, B., Gagné, C.: On the benefits of representation regularization in invariance based domain generalization. Machine Learning111(3), 895–915 (2022) 4

work page 2022
[53]

In: European conference on computer vision

Sun, B., Saenko, K.: Deep coral: Correlation alignment for deep domain adaptation. In: European conference on computer vision. pp. 443–450. Springer (2016) 3

work page 2016
[54]

In: International conference on machine learning

Sun, Y., Wang, X., Liu, Z., Miller, J., Efros, A., Hardt, M.: Test-time training with self-supervision for generalization under distribution shifts. In: International conference on machine learning. pp. 9229–9248. PMLR (2020) 3

work page 2020
[55]

arXiv preprint arXiv:2505.19659 (2025) 3

Tiwary, P., Bhattacharyya, K., et al.: Langdaug: Langevin data augmentation for multi-source domain generalization in medical image segmentation. arXiv preprint arXiv:2505.19659 (2025) 3

work page arXiv 2025
[56]

In: 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS)

Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS). pp. 23–30. IEEE (2017) 4 DropGen 19

work page 2017
[57]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial discriminative domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 7167–7176 (2017) 3

work page 2017
[58]

In: Medical Imaging with Deep Learning

Valanarasu, J.M.J., Guo, P., Patel, V.M., et al.: On-the-fly test-time adaptation for medical image segmentation. In: Medical Imaging with Deep Learning. pp. 586–598. PMLR (2024) 3

work page 2024
[59]

IEEE transactions on neural networks10(5), 988–999 (1999) 4

Vapnik, V.N.: An overview of statistical learning theory. IEEE transactions on neural networks10(5), 988–999 (1999) 4

work page 1999
[60]

Tent: Fully test-time adaptation by entropy minimization.arXiv preprint arXiv:2006.10726, 2020

Wang, D., Shelhamer, E., Liu, S., Olshausen, B., Darrell, T.: Tent: Fully test-time adaptation by entropy minimization. arXiv preprint arXiv:2006.10726 (2020) 2, 3, 12, 13, 36

work page arXiv 2006
[61]

Sensors25(17), 5603 (2025) 3

Weihsbach, C., Kruse, C.N., Bigalke, A., Heinrich, M.P.: Dg-tta: Out-of-domain medical image segmentation through augmentation, descriptor-driven domain gen- eralization, and test-time adaptation. Sensors25(17), 5603 (2025) 3

work page 2025
[62]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Wong, H.E., Ortiz, J.J.G., Guttag, J., Dalca, A.V.: Multiverseg: Scalable inter- active segmentation of biomedical imaging datasets with in-context guidance. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 20966–20980 (2025) 2

work page 2025
[63]

In: European Conference on Computer Vision

Wong, H.E., Rakic, M., Guttag, J., Dalca, A.V.: Scribbleprompt: fast and flexible interactive segmentation for any biomedical image. In: European Conference on Computer Vision. pp. 207–229. Springer (2024) 2, 4

work page 2024
[64]

IEEE Transactions on Medical Imaging43(9), 3098–3109 (2024) 3

Wu, J., Guo, D., Wang, G., Yue, Q., Yu, H., Li, K., Zhang, S.: Fpl+: Filtered pseudo label-based unsupervised cross-modality adaptation for 3d medical image segmentation. IEEE Transactions on Medical Imaging43(9), 3098–3109 (2024) 3

work page 2024
[65]

arXiv preprint arXiv:2007.13003 (2020) 3

Xu, Z., Liu, D., Yang, J., Raffel, C., Niethammer, M.: Robust and general- izable visual representation learning via random convolutions. arXiv preprint arXiv:2007.13003 (2020) 3

work page arXiv 2007
[66]

Yang, K., Musio, F., Ma, Y., Juchler, N., Paetzold, J.C., Al-Maskari, R., Höher, L., Li, H.B., Hamamci, I.E., Sekuboyina, A., Shit, S., Huang, H., Prabhakar, C., de la Rosa, E., Waldmannstetter, D., Kofler, F., Navarro, F., Menten, M., Ezhov, I., Rueckert, D., Vos, I., Ruigrok, Y., Velthuis, B., Kuijf, H., Hämmerli, J., Wurster, C., Bijlenga, P., Westphal...

work page 2024
[67]

In: International Conference on Medical Image Computing and Computer-Assisted Intervention

Zalevskyi, V., Sanchez, T., Roulet, M., Aviles Verdera, J., Hutter, J., Kebiri, H., Bach Cuadra, M.: Improving cross-domain brain tissue segmentation in fetal mri with synthetic data. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 437–447. Springer (2024) 4

work page 2024
[68]

In: International Conference on Medical Image Computing and Computer-Assisted Intervention

Zhang, G., Qi, X., Yan, B., Wang, G.: Iplc: iterative pseudo label correction guided by sam for source-free domain adaptation in medical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 351–360. Springer (2024) 3 20 Diaz et al

work page 2024
[69]

mixup: Beyond Empirical Risk Minimization

Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 (2017) 3, 9, 10, 35

work page internal anchor Pith review Pith/arXiv arXiv 2017
[70]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Zhang, X., Wu, Y., Angelini, E., Li, A., Guo, J., Rasmussen, J.M., O’Connor, T.G., Wadhwa, P.D., Jackowski, A.P., Li, H., Posner, J., Laine, A.F., Wang, Y.: Mapseg: Unified unsupervised domain adaptation for heterogeneous medical image segmen- tation based on 3d masked autoencoding and pseudo-labeling. In: Proceedings of the IEEE/CVF Conference on Compute...

work page 2024
[71]

arXiv preprint arXiv:2507.23110 (2025) 9, 12, 13, 14, 27, 28

Zhang, Z., Peng, L., Dou, W., Sun, C., Aktas, H.E., Bejar, A.M., Keles, E., Du- rak, G., Bagci, U.: Rethink domain generalization in heterogeneous sequence mri segmentation. arXiv preprint arXiv:2507.23110 (2025) 9, 12, 13, 14, 27, 28

work page arXiv 2025
[72]

Journal of machine learning research23(340), 1–49 (2022) 4

Zhao, H., Dan, C., Aragam, B., Jaakkola, T.S., Gordon, G.J., Ravikumar, P.: Fundamental limits and tradeoffs in invariant representation learning. Journal of machine learning research23(340), 1–49 (2022) 4

work page 2022
[73]

In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Zhao, X., Mithun, N.C., Rajvanshi, A., Chiu, H.P., Samarasekera, S.: Unsupervised domain adaptation for semantic segmentation with pseudo label self-refinement. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 2399–2409 (January 2024) 3

work page 2024
[74]

Medical Image Analysis97, 103275 (2024) 3

Zheng, B., Zhang, R., Diao, S., Zhu, J., Yuan, Y., Cai, J., Shao, L., Li, S., Qin, W.: Dual domain distribution disruption with semantics preservation: Unsupervised domain adaptation for medical image segmentation. Medical Image Analysis97, 103275 (2024) 3

work page 2024
[75]

arXiv preprint arXiv:2104.02008 , year=

Zhou, K., Yang, Y., Qiao, Y., Xiang, T.: Domain generalization with mixstyle. arXiv preprint arXiv:2104.02008 (2021) 1, 2, 3, 9, 35

work page arXiv 2021
[76]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Zhou, Z., Qi, L., Yang, X., Ni, D., Shi, Y.: Generalizable cross-modality medical image segmentation via style augmentation and dual normalization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 20856–20865 (2022) 2, 3, 9, 35

work page 2022
[77]

In: Proceedings of the IEEE/CVF international conference on computer vision

Zou, Y., Yu, Z., Liu, X., Kumar, B., Wang, J.: Confidence regularized self-training. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 5982–5991 (2019) 3 DropGen 21 A Proofs We first restate the propositions for completeness and then proceed with the proofs. Proposition 1 (Stationarity forces use of stable inputs).Given Assu...

work page 2019
[78]

All-data

from AMOS, representing inter-subject modality shifts. Representation channels (columns 2-7) were arbitrarily selected. Qualitatively, the penultimate output better preserves high-frequency details while retaining inter-domain stability. respectively. The illustrations indicate that, across either FLAIR vs. T2 or CT vs. MR, representations remain relative...

work page