Tumor-aware augmentation with task-guided attention analysis improves rectal cancer segmentation from magnetic resonance images
Pith reviewed 2026-05-08 15:17 UTC · model grok-4.3
The pith
Tumor-aware augmentation and anisotropic cropping restore token efficiency in CT-pretrained transformers for rectal MRI segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mechanistic analysis of attention dilution and feature reuse shows that zero-padding and ineffective adaptation cause accuracy loss in CT-to-MRI transfer; tumor-aware augmentation plus anisotropic cropping directly mitigate both issues, improving detection rates on the same rectal MRI dataset to 90.7 percent for SMIT and 88.7 percent for Swin UNETR.
What carries the argument
Attention dilution index (ADI), an entropy-based metric that measures how much attention is diverted to zero-padded tokens, used together with centered kernel alignment (CKA) to assess feature reuse across modalities.
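The review describes ADI only qualitatively. A minimal numpy sketch of one plausible entropy-based formulation — the function name, tensor shapes, and the "share of entropy mass on padded keys" normalization are assumptions for illustration, not the paper's definition:

```python
import numpy as np

def attention_dilution_index(attn, pad_mask):
    """Hypothetical sketch of an entropy-based attention dilution index.

    attn:     (heads, queries, keys) softmax attention weights.
    pad_mask: (keys,) boolean, True where the key token is zero-padding.

    Returns the fraction of per-key entropy contributions carried by
    padding tokens, averaged over heads and queries. A higher value
    means more attention is "diluted" onto uninformative padding.
    """
    eps = 1e-12
    ent = -attn * np.log(attn + eps)                 # per-key entropy terms
    pad_share = ent[..., pad_mask].sum(-1) / (ent.sum(-1) + eps)
    return float(pad_share.mean())
```

Under uniform attention with half the keys padded, this sketch returns 0.5, and it returns 0 when no keys are padded — consistent with the claim that ADI increases with zero-padding.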
Load-bearing premise
The failure modes of token inefficiency and poor feature adaptation are the main reasons for degraded transfer performance and can be fixed by the proposed augmentation and cropping steps without creating selection bias.
What would settle it
Applying the same tumor-aware augmentation and anisotropic cropping to a model trained from scratch on MRI data alone produces no detection-rate improvement over standard fine-tuning.
Original abstract
Pretraining on large-scale datasets has been shown to improve transformer generalizability, even for out-of-domain (OOD) modalities and tasks. However, two common assumptions often fail under OOD transfer: that downstream datasets can be adapted to the fixed input geometry of pretrained models and that pretrained representations transfer effectively across imaging modalities. We show that these assumptions break down through two interacting failure modes in CT-to-MRI transfer: inefficient token usage caused by zero-padding to match pretrained input dimensions and ineffective feature adaptation. These failures led to accuracy degradation despite extensive fine-tuning. We investigated these failure modes using two CT-pretrained hierarchical shifted-window transformer backbones, SMIT and Swin UNETR, pretrained with different objectives and datasets. Mechanistic analysis introduced an attention dilution index (ADI), an entropy-based metric quantifying attention diverted toward uninformative padding tokens, and centered kernel alignment (CKA) to measure feature reuse in MRI tasks. ADI increased with zero-padding, while high feature reuse did not necessarily correspond to improved accuracy. To mitigate these issues, we introduced two interventions: a tumor-aware augmentation strategy to improve tumor appearance heterogeneity coverage and an anisotropic cropping strategy to restore token efficiency. Fine-tuning on identical rectal MRI datasets improved detection rates to 224/247 (90.7%) for SMIT and 219/247 (88.7%) for Swin UNETR, demonstrating improved robustness under CT-to-MRI transfer. This study is among the first to examine when pretrained transformers fail to transfer effectively across imaging modalities and how simple mitigation strategies, motivated by mechanistic analysis of datasets, can reduce transfer limitations while improving robustness and MRI detection.
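Unlike ADI, CKA has a standard closed form (Kornblith et al., 2019). A linear-CKA sketch for comparing layer activations — say, a CT-pretrained versus an MRI-fine-tuned encoder on the same inputs; the variable names and pairing are illustrative, not taken from the paper:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two feature matrices.

    X: (n_samples, d1) and Y: (n_samples, d2) activations on the same
    inputs, e.g. from corresponding transformer blocks before and after
    fine-tuning. Returns a similarity in [0, 1]; 1 means the two
    representations are identical up to isotropic scaling and rotation.
    """
    X = X - X.mean(axis=0)                         # center features
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

Because CKA is invariant to isotropic scaling, a high value indicates feature reuse even when activation magnitudes change during fine-tuning — which is why, as the abstract notes, high reuse need not translate into higher accuracy.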
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines failure modes in transferring CT-pretrained hierarchical shifted-window transformers (SMIT and Swin UNETR) to rectal MRI segmentation tasks. It identifies two interacting issues—inefficient token usage from zero-padding to match pretrained input sizes and ineffective feature adaptation across modalities—using a new Attention Dilution Index (ADI) based on entropy and centered kernel alignment (CKA) for mechanistic analysis. The authors propose tumor-aware augmentation to increase tumor heterogeneity coverage and anisotropic cropping to restore token efficiency, reporting post-intervention detection rates of 224/247 (90.7%) for SMIT and 219/247 (88.7%) for Swin UNETR after fine-tuning on rectal MRI data, claiming improved robustness under CT-to-MRI transfer.
Significance. If the central improvements can be causally attributed to the proposed interventions via proper controls, the work would offer useful insights into cross-modality transfer limitations for medical vision transformers and practical, low-cost mitigation strategies. This could have moderate significance for improving segmentation robustness in rectal cancer MRI, where pretrained models are increasingly used but often degrade on OOD data.
major comments (3)
- [Abstract and Results] Abstract and Results: The final detection rates of 224/247 (90.7%) for SMIT and 219/247 (88.7%) for Swin UNETR are presented as evidence of improved robustness, but the corresponding rates from the prior 'extensive fine-tuning' (which the text states exhibited degradation) are not reported, preventing quantification of the actual improvement magnitude attributable to the interventions.
- [Methods/Results] Experimental design (implied in Methods/Results): No ablation studies are provided that apply tumor-aware augmentation alone, anisotropic cropping alone, or neither intervention (beyond the baseline extensive fine-tuning), which is load-bearing because the central claim attributes the accuracy gains specifically to these two strategies mitigating token inefficiency and feature adaptation failures.
- [Analysis] Analysis section: ADI and CKA are introduced and described as increasing with zero-padding and measuring feature reuse, respectively, but the manuscript does not include quantitative correlation (e.g., regression or per-case analysis) between ADI/CKA values and the final accuracy deltas, weakening the mechanistic justification for the mitigations.
minor comments (2)
- [Abstract] The abstract uses 'detection rates' for what is described as a segmentation task; clarify the exact metric (e.g., whether it is tumor presence detection within segmented volumes or a proxy for segmentation performance) to avoid ambiguity.
- [Methods] The definition and computation of the Attention Dilution Index (ADI) should be formalized with an equation in the main text rather than described only qualitatively, to allow reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us identify areas where the manuscript can be strengthened. We address each major comment below and have revised the manuscript accordingly to provide clearer quantification of improvements, additional controls, and stronger mechanistic evidence.
Point-by-point responses
-
Referee: [Abstract and Results] Abstract and Results: The final detection rates of 224/247 (90.7%) for SMIT and 219/247 (88.7%) for Swin UNETR are presented as evidence of improved robustness, but the corresponding rates from the prior 'extensive fine-tuning' (which the text states exhibited degradation) are not reported, preventing quantification of the actual improvement magnitude attributable to the interventions.
Authors: We agree that explicit baseline rates are needed for direct comparison. The manuscript text notes degradation under extensive fine-tuning alone, and the underlying per-model detection rates from that condition are available from our experiments. In the revised version, we will report these baseline rates alongside the post-intervention figures in both the abstract and results sections to quantify the improvement magnitude. revision: yes
-
Referee: [Methods/Results] Experimental design (implied in Methods/Results): No ablation studies are provided that apply tumor-aware augmentation alone, anisotropic cropping alone, or neither intervention (beyond the baseline extensive fine-tuning), which is load-bearing because the central claim attributes the accuracy gains specifically to these two strategies mitigating token inefficiency and feature adaptation failures.
Authors: We acknowledge that separate ablations would strengthen attribution of gains to each intervention. Our original design emphasized the combined application motivated by the interacting failure modes identified via ADI and CKA. In the revision, we will add ablation experiments applying tumor-aware augmentation alone and anisotropic cropping alone, reporting their individual effects on detection rates, ADI, and CKA to isolate contributions. revision: yes
-
Referee: [Analysis] Analysis section: ADI and CKA are introduced and described as increasing with zero-padding and measuring feature reuse, respectively, but the manuscript does not include quantitative correlation (e.g., regression or per-case analysis) between ADI/CKA values and the final accuracy deltas, weakening the mechanistic justification for the mitigations.
Authors: We appreciate this point on strengthening the mechanistic link. The current analysis shows ADI rising with padding and CKA patterns for feature reuse, but lacks explicit correlation to accuracy. In the revised manuscript, we will add per-case scatter plots, Pearson/Spearman correlations, and regression analysis between ADI/CKA values and segmentation accuracy deltas across cases to provide quantitative support for the proposed mitigations. revision: yes
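The promised per-case analysis can be sketched in plain numpy. The pairing of ADI values with per-case accuracy deltas, and the function name, are assumptions about how the revised analysis might be organized (the rank-based Spearman estimate below assumes no ties):

```python
import numpy as np

def per_case_correlation(adi, accuracy_delta):
    """Pearson and Spearman correlations between per-case ADI values
    and per-case accuracy deltas (e.g. Dice change after mitigation).

    Both inputs are 1-D sequences of equal length, one entry per case.
    Spearman is computed as Pearson on ranks, valid when values are
    distinct (no tie correction).
    """
    adi = np.asarray(adi, dtype=float)
    delta = np.asarray(accuracy_delta, dtype=float)
    pearson = np.corrcoef(adi, delta)[0, 1]
    rank = lambda v: np.argsort(np.argsort(v))     # 0..n-1 ranks, no ties
    spearman = np.corrcoef(rank(adi), rank(delta))[0, 1]
    return pearson, spearman
```

A strongly negative correlation — higher ADI predicting larger accuracy loss — would be the quantitative link between the dilution mechanism and the observed degradation that the referee asks for.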
Circularity Check
No circularity: empirical study with independent metrics and reported rates
Full rationale
The paper is an empirical transfer-learning study. It defines ADI (entropy-based attention metric) and CKA externally, applies tumor-aware augmentation and anisotropic cropping as interventions, and reports concrete detection rates (224/247, 219/247) on rectal MRI data. No equations, fitted parameters, or predictions are shown to reduce by construction to the inputs; the central claim rests on experimental outcomes and mechanistic analysis rather than self-referential definitions or self-citation chains. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Pretrained hierarchical shifted-window transformers on CT data can be meaningfully fine-tuned for MRI segmentation tasks.
invented entities (1)
- Attention Dilution Index (ADI) — no independent evidence
Reference graph
Works this paper leans on
- [1] Battersby, N.J., How, P., Moran, B., Stelzner, S., West, N.P., Branagan, G., Strassburg, J., Quirke, P., Tekkis, P., Pedersen, B.G., et al., 2016. Prospective validation of a low rectal cancer magnetic resonance imaging staging system and development of a local recurrence risk stratification model: the MERCURY II study. Annals of Surgery 263, 751–760.
- [2] Beets-Tan, R.G., Lambregts, D.M., Maas, M., Bipat, S., Barbaro, B., Curvo-Semedo, L., Fenlon, H.M., Gollub, M.J., Gourtsoyianni, S., Halligan, S., et al., 2018. Magnetic resonance imaging for clinical management of rectal cancer: updated recommendations from the 2016 European Society of Gastrointestinal and Abdominal Radiology (ESGAR) consensus meeting. E...
- [3] Cardoso, M.J., Li, W., Brown, R., Ma, N., Kerfoot, E., Wang, Y., Murrey, B., Myronenko, A., Zhao, C., Yang, D., et al., 2022. MONAI: An open-source framework for deep learning in healthcare. arXiv preprint arXiv:2211.02701.
- [4] Charbel, C., Kwok, H.C., Miranda, J., Zheng, J., El Homsi, M., El Amine, M.A., Chhabra, S., Danilova, S., Gangai, N., Petkovska, I., Capanu, M., Vanguri, R.S., Chakraborty, J., Horvat, N., 2025. Reliability of rectal MRI radiomic features: Comparing rectal MRI radiomic features across reader expertise levels, image segmentation technique, and timing of ...
- [5] Chen, J., Mei, J., Li, X., Lu, Y., Yu, Q., Wei, Q., Luo, X., Xie, Y., Adeli, E., Wang, Y., et al., 2024. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Medical Image Analysis.
- [6] Chen, S., Ma, K., Zheng, Y., 2019. Med3D: Transfer learning for 3D medical image analysis. arXiv preprint arXiv:1904.00625.
- [7] Cortes, C., Mohri, M., Rostamizadeh, A., 2012. Algorithms for learning kernels based on centered alignment. The Journal of Machine Learning Research.
- [8] Darcet, T., Oquab, M., Mairal, J., Bojanowski, P., 2024. Vision transformers need registers, in: International Conference on Learning Representations.
- [9] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N., 2021. An image is worth 16x16 words: Transformers for image recognition at scale, in: International Conference on Learning Representations (ICLR).
- [10] Hamabe, A., Ishii, M., Kamoda, R., Sasuga, S., Okuya, K., Okita, K., Akizuki, E., Sato, Y., Miura, R., Onodera, K., et al., 2022. Artificial intelligence–based technology for semi-automated segmentation of rectal cancer using high-resolution MRI. PLoS One 17, e0269931.
- [11–12] Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H.R., Xu, D. Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images, in: International MICCAI Brainlesion Workshop, Springer, pp. 272–284.
- [13–14] Horvat, N., Veeraraghavan, H., Khan, M., Blazic, I., Zheng, J., Capanu, M., Sala, E., Garcia-Aguilar, J., Gollub, M.J., Petkovska, I. MR imaging of rectal cancer: radiomics analysis to assess treatment response after neoadjuvant therapy. Radiology 287, 833–843.
- [15–16] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods.
- [17] Jiang, J., Rangnekar, A., Veeraraghavan, H., 2025. Self-supervised learning improves robustness of deep learning lung tumor segmentation models to CT imaging differences. Medical Physics 52, 1573–1588.
- [18] Jiang, J., Tyagi, N., Tringale, K., Crane, C., Veeraraghavan, H., 2022. Self-supervised 3D anatomy segmentation using self-distilled masked image transformer (SMIT), in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer.
- [19] Knuth, F., Groendahl, A.R., Winter, R.M., Torheim, T., Negård, A., Holmedal, S.H., Bakke, K.M., Meltzer, S., Futsæther, C.M., Redalen, K.R., 2022. Semi-automatic tumor segmentation of rectal cancer based on functional magnetic resonance imaging. Physics and Imaging in Radiation Oncology 22, 77–84.
- [20] Kornblith, S., Norouzi, M., Lee, H., Hinton, G., 2019. Similarity of neural network representations revisited, in: International Conference on Machine Learning, PMLR, pp. 3519–3529.
- [21] Lin, Y.C., Lin, G., Pandey, S., Yeh, C.H., Wang, J.J., Lin, C.Y., Ho, T.Y., Ko, S.F., Ng, S.H., 2023. Fully automated segmentation and radiomics feature extraction of hypopharyngeal cancer on MRI using deep learning. European Radiology 33, 6548–6556.
- [22] Matsoukas, C., Haslum, J.F., Söderberg, M., Smith, K., 2021. Is it time to replace CNNs with transformers for medical images? arXiv preprint arXiv:2108.09038.
A. Rangnekar et al.: Preprint submitted to Elsevier — CT-to-MRI pretraining transfer for rectal tumor segmentation
- [23–24] Matsoukas, C., Haslum, J.F., Sorkhei, M., Söderberg, M., Smith, K. What makes transfer learning work for medical images: Feature reuse & other factors, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9225–9234.
- [25] Milletari, F., Navab, N., Ahmadi, S.A., 2016. V-Net: Fully convolutional neural networks for volumetric medical image segmentation, in: 2016 Fourth International Conference on 3D Vision (3DV), IEEE, pp. 565–571.
- [26] Miranda, J., Horvat, N., Assuncao Jr, A.N., de M. Machado, F.A., Chakraborty, J., Pandini, R.V., Saraiva, S., Nahas, C.S.R., Nahas, S.C., Nomura, C.H., 2023. MRI-based radiomic score increased mrTRG accuracy in predicting rectal cancer response to neoadjuvant therapy. Abdominal Radiology 48, 1911–1920.
- [27] Nguyen, T., Raghu, M., Kornblith, S., 2020. Do wide and deep networks learn the same things? Uncovering how neural network representations vary with width and depth. arXiv preprint arXiv:2010.15327.
- [28] Nikolov, S., Blackwell, S., Zverovitch, A., Mendes, R., Livne, M., De Fauw, J., Patel, Y., Meyer, C., Askham, H., Romera-Paredes, B., et al., 2021. Clinically applicable segmentation of head and neck anatomy for radiotherapy: deep learning algorithm development and validation study. Journal of Medical Internet Research 23, e26151.
- [29] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al., 2019. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems.
- [30] Petkovska, I., Tixier, F., Ortiz, E.J., Golia Pernicka, J.S., Paroder, V., Bates, D.D., Horvat, N., Fuqua, J., Schilsky, J., Gollub, M.J., et al., 2020. Clinical utility of radiomics at baseline rectal MRI to predict complete response of rectal cancer after chemoradiation therapy. Abdominal Radiology 45, 3608–3617.
- [31] Raghu, M., Zhang, C., Kleinberg, J., Bengio, S., 2019. Transfusion: Understanding transfer learning for medical imaging. Advances in Neural Information Processing Systems 32.
- [32] Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer.
- [33] Siegel, R.L., Miller, K.D., Fuchs, H.E., Jemal, A., 2022. Cancer statistics, 2022. CA: A Cancer Journal for Clinicians 72, 7–33.
- [34] Siegel, R.L., Wagle, N.S., Jemal, A., 2026. Leading cancer deaths in people younger than 50 years. JAMA.
- [35] Song, L., Smola, A., Gretton, A., Bedo, J., Borgwardt, K., 2012. Feature selection via dependence maximization. Journal of Machine Learning Research.
- [36] Tang, Y., Yang, D., Li, W., Roth, H.R., Landman, B., Xu, D., Nath, V., Hatamizadeh, A., 2022. Self-supervised pre-training of Swin transformers for 3D medical image analysis, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- [37] Tixier, F., Um, H., Young, R.J., Veeraraghavan, H., 2019. Reliability of tumor segmentation in glioblastoma: Impact on the robustness of MRI-radiomic features. Medical Physics 46, 3582–3591. URL: https://pubmed.ncbi.nlm.nih.gov/31131906/, doi:10.1002/mp.13624.
- [38–39] Trebeschi, S., van Griethuysen, J.J., Lambregts, D.M., Lahaye, M.J., Parmar, C., Bakers, F.C., Peters, N.H., Beets-Tan, R.G., Aerts, H.J. Deep learning for fully-automated localization and segmentation of rectal cancer on multiparametric MR. Scientific Reports 7, 5301.
- [40] Vaassen, F., Hazelaar, C., Vaniqui, A., Gooding, M., Van der Heyden, B., Canters, R., Van Elmpt, W., 2020. Evaluation of measures for assessing time-saving of automatic organ-at-risk segmentation in radiotherapy. Physics and Imaging in Radiation Oncology 13, 1–6.
- [41] Vliegen, R., Dresen, R., Beets, G., Daniels-Gooszen, A., Kessels, A., van Engelshoven, J., Beets-Tan, R., 2008. The accuracy of multi-detector row CT for the assessment of tumor invasion of the mesorectal fascia in primary rectal cancer. Abdominal Imaging 33, 604–610.
- [42] Wang, S., Safari, M., Li, Q., Chang, C.W., Qiu, R.L., Roper, J., Yu, D.S., Yang, X., 2025. TRIAD: Vision foundation model for 3D magnetic resonance imaging. Research Square.
- [43] Xiao, G., Tian, Y., Chen, B., Han, S., Lewis, M., 2024. Efficient streaming language models with attention sinks, in: International Conference on Learning Representations.
- [44] Yang, S.X., Yu, J., Wang, M., 2024. 21-gene recurrence score and survival outcomes in the phase III multicenter TAILORx clinical trial. Journal of the National Comprehensive Cancer Network 22, 376–381.
discussion (0)