One Sequence to Segment Them All: Efficient Data Augmentation for CT and MRI Cross-Domain 3D Spine Segmentation
Pith reviewed 2026-05-08 18:36 UTC · model grok-4.3
The pith
Targeted data augmentations let a spine segmentation model trained on one CT or MRI sequence generalize to seven unseen domains including the opposite modality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A set of GPU-optimized data augmentations applied during training on a single acquisition sequence enables 3D spine segmentation models to achieve an average relative Dice gain of 155 percent across seven out-of-distribution datasets spanning CT and MRI sequences and contrasts, while incurring an average Dice decrease of only 0.008 percent on in-domain test sets and training roughly 10 percent faster.
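For readers outside medical imaging, the Dice score behind all of these figures is the standard voxel-overlap coefficient between a predicted and a ground-truth mask. A minimal NumPy illustration (a generic definition, not code from the paper):

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(pred, gt).sum() / denom

# Two toy 3D masks: 8 voxels each, 4 shared.
a = np.zeros((4, 4, 4), dtype=np.uint8)
b = np.zeros((4, 4, 4), dtype=np.uint8)
a[1:3, 1:3, 1:3] = 1
b[2:4, 1:3, 1:3] = 1
print(dice(a, b))  # 2*4 / (8+8) = 0.5
```

A Dice of 1.0 means perfect overlap; scores near zero, common when a model meets an unseen modality, are exactly the baselines the 155 percent relative figure is measured against.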
What carries the argument
The targeted set of data augmentation techniques that simulate cross-sequence and cross-modality variations, implemented with GPU optimization to avoid extra training cost.
If this is right
- Models exhibit an average relative Dice gain of 155 percent on unseen domains.
- In-domain accuracy is preserved with an average Dice drop of only 0.008 percent.
- Transfer works in both directions between CT and MRI.
- Training runs approximately 10 percent faster despite the stronger augmentations.
- The released toolbox integrates directly into nnUNet and MONAI pipelines.
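To make the mechanism concrete, the kinds of intensity perturbations that plausibly drive such cross-modality transfer (random gamma, contrast inversion, smooth multiplicative bias fields) can be sketched in plain NumPy. This is an illustrative stand-in, not the released toolbox's GPU implementation, and the parameter ranges are guesses:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_gamma(vol, low=0.7, high=1.5):
    """Random gamma transform on an intensity-normalized volume."""
    return np.clip(vol, 0, 1) ** rng.uniform(low, high)

def rand_invert(vol, p=0.5):
    """Randomly invert contrast, mimicking e.g. T1 vs. T2 appearance."""
    return 1.0 - vol if rng.random() < p else vol

def rand_bias_field(vol, strength=0.3):
    """Smooth multiplicative bias along one axis, a crude MRI inhomogeneity."""
    z = np.linspace(-1, 1, vol.shape[0]).reshape(-1, 1, 1)
    return vol * (1.0 + strength * rng.uniform(-1, 1) * z)

def augment(vol):
    for transform in (rand_gamma, rand_invert, rand_bias_field):
        vol = transform(vol)
    return np.clip(vol, 0, 1)

vol = rng.random((8, 16, 16)).astype(np.float32)
aug = augment(vol)
assert aug.shape == vol.shape
```

In a real pipeline these would be expressed as MONAI or nnUNet transforms running on the GPU; the point of the sketch is only that the label map is untouched while the intensity statistics are pushed toward other sequences and modalities.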
Where Pith is reading between the lines
- Clinics could train segmentation models on smaller single-site datasets and still expect usable performance across varied scanners without collecting new annotations.
- The efficiency improvement may make aggressive augmentation the default choice rather than an optional extra in medical imaging workflows.
- The same augmentation strategy could be tested on other anatomical targets or on 2D slice-based models to check whether the cross-modality benefit generalizes.
Load-bearing premise
The selected augmentations and the seven test datasets capture the range of real clinical scanner and protocol differences that models encounter in practice.
What would settle it
Retraining with the same augmentations on a fresh collection of CT and MRI spine volumes from additional institutions would settle it: comparable Dice gains on the new out-of-domain cases would support the claim's generality, while no improvement would show the reported gains are specific to the seven benchmark datasets.
Original abstract
Deep learning-based medical image segmentation is increasingly used to support clinical diagnosis and develop new treatment strategies. However, model performance remains limited by the scarcity of high-quality annotated data and insufficient generalization across imaging protocols. This limitation is particularly evident in MRI and CT, where models are typically trained on a single acquisition sequence and exhibit reduced robustness when applied to unseen sequences or contrasts. Although data augmentation is widely used to improve general robustness on medical images, its impact on cross-modality generalization has not been quantitatively explored. In this work, we study a targeted set of data augmentation techniques designed to improve cross-modality transfer. We train three spine segmentation models, each on a single-modality/sequence dataset, and evaluate them across seven out-of-distribution datasets (spanning CT and MRI), reflecting a realistic single-sequence training and multi-sequence/contrast/modality deployment scenario. Our results demonstrate substantial performance gains on unseen domains (average Dice gain of 155 %) while preserving in-domain accuracy (average Dice decrease of 0.008 %), including effective transfer between CT and MRI. To mitigate the computational cost typically associated with strong data augmentation, we implement GPU-optimized augmentations that maintain, and even improve, training efficiency by approximately 10 %. We release our approach as an open-source toolbox, enabling seamless integration into commonly used frameworks such as nnUNet and MONAI. These augmentations significantly enhance robustness to heterogeneous clinical imaging scenarios without compromising training speed.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a targeted set of data augmentation techniques to improve cross-modality generalization in 3D spine segmentation models trained on single CT or MRI sequences. Three models are trained on individual datasets and evaluated on seven out-of-distribution datasets spanning CT and MRI contrasts. The work reports an average relative Dice gain of 155% on OOD data with negligible in-domain degradation (0.008%) and an approximately 10% training-efficiency improvement from GPU-optimized augmentations, and it releases an open-source toolbox for integration with nnUNet and MONAI.
Significance. If the reported gains hold after providing absolute per-dataset metrics and confirming they are attributable to the augmentations, the work would offer a practical, lightweight approach to addressing domain shift in medical segmentation without requiring multi-domain training data or complex adaptation methods. The emphasis on computational efficiency and the open-source release are clear strengths that could aid reproducibility and adoption in clinical pipelines.
major comments (1)
- Abstract: The central claim of substantial OOD gains rests on an average relative Dice improvement of 155%. Relative percentages are sensitive to low baseline values (common in cross-modality spine segmentation); the manuscript must report the per-dataset baseline Dice scores, absolute improvements, the exact aggregation method for the average, and whether any OOD dataset was excluded. Without these, it is impossible to determine whether the gains are uniformly meaningful or driven by near-zero baselines, directly affecting the strength of the 'one sequence to segment them all' conclusion.
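The sensitivity the referee describes is easy to illustrate with toy numbers (hypothetical values, not the paper's data): the same relative gain can correspond to very different absolute meanings depending on the baseline.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative Dice gain in percent, as the abstract's 155 % figure appears to be."""
    return 100.0 * (improved - baseline) / baseline

# The same ~160 % relative gain, very different absolute meaning:
low = relative_gain(0.10, 0.26)   # near-failure baseline: only +0.16 Dice absolute
high = relative_gain(0.30, 0.78)  # usable baseline: +0.48 Dice absolute
print(round(low, 1), round(high, 1))  # both round to 160.0
```

This is why per-dataset baseline and absolute Dice values are needed before the average relative figure can be interpreted.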
minor comments (2)
- Abstract: The phrase 'average Dice decrease of 0.008 %' should clarify whether this is an absolute or relative change and provide the corresponding standard deviation or range across the in-domain evaluations.
- The manuscript should include a table or figure with per-dataset Dice scores (baseline vs. augmented) for both in-domain and OOD evaluations to allow direct assessment of practical impact.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The major comment raises an important point about the interpretability of relative Dice gains, and we address it directly below by committing to a clear revision.
Point-by-point responses
Referee: Abstract: The central claim of substantial OOD gains rests on an average relative Dice improvement of 155%. Relative percentages are sensitive to low baseline values (common in cross-modality spine segmentation); the manuscript must report the per-dataset baseline Dice scores, absolute improvements, the exact aggregation method for the average, and whether any OOD dataset was excluded. Without these, it is impossible to determine whether the gains are uniformly meaningful or driven by near-zero baselines, directly affecting the strength of the 'one sequence to segment them all' conclusion.
Authors: We agree that relative improvements must be accompanied by absolute values and per-dataset breakdowns to avoid misinterpretation, especially when baselines may be low in cross-modality settings. In the revised manuscript we will add a new table (e.g., Table 2) that reports, for each of the seven OOD datasets: (i) baseline Dice without the proposed augmentations, (ii) Dice with the augmentations, (iii) absolute improvement, and (iv) relative improvement. We will explicitly state that the 155 % figure is the arithmetic mean of the seven relative improvements and confirm that no OOD dataset was excluded. We will also include the corresponding per-training-dataset in-domain results to document the 0.008 % average degradation. These additions will be placed in the results section and referenced from the abstract, providing full transparency while preserving the original conclusions.
revision: yes
Circularity Check
No circularity: empirical results on held-out OOD datasets
Full rationale
The paper is a purely empirical study that trains segmentation models on single-modality datasets and measures Dice performance on seven independent out-of-distribution test sets (including cross-modality CT/MRI transfer). No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described methods. All reported gains (155 % relative Dice, 0.008 % in-domain drop) are direct measurements on held-out data rather than quantities forced by construction from the inputs. The work therefore contains no self-definitional, fitted-input, or self-citation circularity.