pith. machine review for the scientific record.

arxiv: 2605.12753 · v1 · submitted 2026-05-12 · 📡 eess.IV · cs.CV · cs.LG

Recognition: no theorem link

Optimization in Sparse 2D to Dense 3D Weakly Supervised Learning: Application to Multi-Label Segmentation of Large ex vivo MRI Data

Brandon Bujak, Charidimos Tsagkas, Daniel Reich, Govind Nair, Irene Cortese, Julien Cohen-Adad, Kuan Yi Wang, Paul Hoareau, Roy Sun


Pith reviewed 2026-05-14 19:32 UTC · model grok-4.3

classification 📡 eess.IV · cs.CV · cs.LG
keywords weakly supervised segmentation · sparse to dense learning · 3D MRI segmentation · ex vivo spinal cord · multi-label lesion segmentation · pseudo-labeling · regularization strategies · data scarcity

The pith

2D and 3D segmentation models require distinct regularization when trained from sparse 2D MRI annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper investigates the challenges of training 3D models for segmenting lesions in high-resolution ex vivo spinal cord MRI using only a small number of annotated 2D slices. A 2D teacher model generates pseudo-labels to supervise a 3D student model. The study reveals that methods like aggressive spatial augmentation and soft labeling, which improve the 2D model's accuracy on white matter lesions by more than 11 Dice points, actually lower performance when used with the 3D model. Human-oriented preprocessing steps such as CLAHE also cause large accuracy losses, reducing gray matter lesion Dice scores by around 25 points.

Core claim

The central finding is a divergence in optimal training strategies: while 2D teachers benefit from strong spatial augmentation and soft-label regularization to handle data scarcity, propagating these to 3D students trained on dense pseudo-labels degrades results. Human-centric preprocessing disrupts global statistical cues essential for machine learning, and 3D models need more conservative regularization due to their different optimization landscapes.
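
To make the "global statistical cues" point concrete, here is a minimal sketch, assuming a drastically simplified stand-in for CLAHE: plain per-tile rank equalization with no clip limit and no inter-tile interpolation. The tiles and intensity values are hypothetical. It shows how local equalization can erase an absolute intensity difference between two regions, which is one plausible reading of the mechanism and not a description of the paper's preprocessing code.

```python
def equalize_tile(tile):
    """Rank-based equalization of one tile (flat list of intensities).

    Each pixel is mapped to the rank of its value among the tile's
    distinct values, rescaled to [0, 1]. Simplified stand-in for CLAHE.
    """
    ordered = sorted(set(tile))
    if len(ordered) == 1:
        return [0.0 for _ in tile]
    rank = {v: i / (len(ordered) - 1) for i, v in enumerate(ordered)}
    return [rank[v] for v in tile]


def mean(values):
    return sum(values) / len(values)


# Hypothetical tiles: same local texture, different absolute signal level.
dark_tile = [10, 20, 30, 40]
bright_tile = [110, 120, 130, 140]

# Difference in mean intensity between the two regions, before and after.
gap_before = mean(bright_tile) - mean(dark_tile)
gap_after = mean(equalize_tile(bright_tile)) - mean(equalize_tile(dark_tile))

print("mean gap before equalization:", gap_before)
print("mean gap after equalization:", gap_after)
```

After equalization the two tiles become indistinguishable, so any cue carried by the absolute intensity level is no longer available to a model.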

What carries the argument

The sparse-to-dense weakly supervised pipeline using a 2D teacher model to generate pseudo-labels for training a 3D student model on multi-label segmentation of MS lesions in spinal cord MRI.
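
The sparse-to-dense step can be sketched as follows. This is a minimal illustration under stated assumptions: `teacher_2d` is a hypothetical thresholding function standing in for a trained 2D network, and the volume is a tiny nested list instead of real MRI data. Only the data flow is shown (slice-wise prediction stacked into a dense label volume), not the authors' implementation.

```python
def teacher_2d(slice_2d, threshold=0.5):
    """Hypothetical stand-in for a trained 2D teacher: thresholds intensities."""
    return [[1 if px > threshold else 0 for px in row] for row in slice_2d]


def dense_pseudo_labels(volume, teacher):
    """Run the 2D teacher on every slice and stack the predictions.

    `volume` is a list of 2D slices (depth, height, width). The result has
    the same shape and could serve as the dense target for a 3D student.
    """
    return [teacher(slice_2d) for slice_2d in volume]


# Toy 3 x 2 x 2 volume with made-up intensities.
volume = [
    [[0.1, 0.9], [0.2, 0.8]],
    [[0.7, 0.3], [0.6, 0.4]],
    [[0.0, 1.0], [1.0, 0.0]],
]

pseudo = dense_pseudo_labels(volume, teacher_2d)
print(pseudo)
```

Because each slice is predicted independently, nothing in this step enforces consistency between neighbouring slices, which is why the quality of the stacked pseudo-labels matters for the 3D student.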

If this is right

  • 3D student models have different optimization needs from their 2D counterparts and require more conservative regularization.
  • Human-centric preprocessing like CLAHE should be avoided as it harms model performance by disrupting statistical cues.
  • Soft-labeling and strong augmentation improve 2D performance on sparse data but must not be directly transferred to 3D.
  • Multi-label segmentation of white and gray matter lesions in large ex vivo MRI datasets is achievable with sparse 2D annotations under appropriate conditions.
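
For readers unfamiliar with soft labeling, here is a minimal sketch of one common variant, uniform label smoothing. This page does not specify the paper's exact soft-label scheme, so this is a generic technique swapped in for illustration. The class count, the label index, and `epsilon` are hypothetical.

```python
def one_hot(label, num_classes):
    """Hard target: 1.0 for the labelled class, 0.0 elsewhere."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def smooth_label(label, num_classes, epsilon=0.1):
    """Uniform label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes."""
    hard = one_hot(label, num_classes)
    return [(1.0 - epsilon) * p + epsilon / num_classes for p in hard]


# Hypothetical 4-class voxel whose hard label is class 2.
hard = one_hot(2, 4)
soft = smooth_label(label=2, num_classes=4, epsilon=0.1)

print("hard target:", hard)
print("soft target:", soft)
```

The soft target still sums to one but no longer asks the model for full confidence. That is a regularizer, and how much of it helps can plausibly differ between a 2D model trained on few slices and a 3D model trained on dense pseudo-labels.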

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The performance differences may stem from 3D models being more vulnerable to errors in the pseudo-labels generated by 2D teachers.
  • These results suggest that dimensionality-specific tuning is necessary in other volumetric imaging tasks beyond spinal cord MRI.
  • Future experiments could test whether using ground truth dense labels eliminates the need for conservative regularization in 3D.

Load-bearing premise

The pseudo-labels produced by the 2D teacher are accurate enough to train the 3D student without systematic errors that account for the performance differences.
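
One way to probe this premise is a per-class Dice score between the teacher's pseudo-labels and held-out manual labels on the same slice. The sketch below is a minimal illustration: the label values and the two flattened slices are made up, and the convention for a class absent from both inputs is an assumption.

```python
def dice(pred, target, label):
    """Dice for one class between two flat label lists of equal length."""
    p = [v == label for v in pred]
    t = [v == label for v in target]
    intersection = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    if denom == 0:
        # Assumed convention: class absent in both counts as perfect agreement.
        return 1.0
    return 2.0 * intersection / denom


# Hypothetical flattened slices: 0 = background, 1 and 2 = two lesion classes.
pseudo_slice = [0, 1, 1, 2, 2, 0]
manual_slice = [0, 1, 2, 2, 2, 0]

scores = {c: dice(pseudo_slice, manual_slice, c) for c in (1, 2)}
for c, s in scores.items():
    print(f"class {c}: Dice = {s:.3f}")
```

Averaging such scores over a set of held-out annotated slices would give a direct estimate of pseudo-label fidelity, separate from the 3D student's own performance.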

What would settle it

An experiment that trains the 3D model directly on dense ground-truth labels and tests whether applying strong augmentation and soft-labeling still degrades performance compared to conservative settings.

Figures

Figures reproduced from arXiv: 2605.12753 by Brandon Bujak, Charidimos Tsagkas, Daniel Reich, Govind Nair, Irene Cortese, Julien Cohen-Adad, Kuan Yi Wang, Paul Hoareau, Roy Sun.

Figure 1: Illustration of the magnitude, the phase and the associated segmentation mask. Cyan: Healthy WM; Green:
Figure 2: Illustration of the effect of our preprocessing.
Figure 3: Illustration of the soft edges on the magnitude.
Figure 4: Illustration of the jitter noise on a sagittal plane
Figure 5: Segmentation results on the test set. Each panel
Figure 6: Sagittal view of a rough Pseudo Ground Truth and
Figure 7: Illustration of the Otsu masking. The original

Original abstract

INTRODUCTION | Fully supervised 3D segmentation of high-resolution ex vivo MRI is limited by the prohibitive cost of volumetric annotation, forcing reliance on sparse 2D slices. Weakly supervised Sparse-to-Dense frameworks bridge this gap, but guidelines remain ambiguous regarding human-centric visual enhancements and transferring optimization strategies across dimensions. We analyze divergent regularization needs for multi-class segmentation of high-resolution ex vivo spinal cord MRI.

METHODS | We used 9.4T MRI of multiple sclerosis spinal cords (>104,000 slices) with sparse annotations (428 slices). A 2D Teacher trained on sparse slices generated dense pseudo-labels to train a 3D Student. We systematically evaluated the impact of human-centric preprocessing, spatial augmentation, and soft-label regularization on both architectures.

RESULTS | We identified a critical divergence in training dynamics. The 2D Teacher required strong spatial augmentation and soft-labeling to overcome data scarcity, improving White Matter Lesion Dice scores by >11 points. However, propagating these techniques to the 3D Student degraded its performance. Furthermore, human-centric preprocessing (e.g., CLAHE) disrupted global statistical cues, dropping Gray Matter Lesion Dice scores by ~25 points.

DISCUSSION | Our study highlights a perception divergence (human-centric contrast enhancement harms machine models) and a regularization conflict across dimensions. 3D architectures trained on dense pseudo-labels exhibit fundamentally different optimization landscapes than 2D counterparts and require distinct, conservative regularization.

Code and models: https://github.com/ivadomed/model_seg_sc-gm-lesion_human_ms_exvivo_t2star.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a sparse 2D-to-dense 3D weakly supervised framework for multi-label segmentation of high-resolution ex vivo spinal cord MRI (>104k slices, 428 sparse annotations). A 2D teacher trained on sparse slices generates dense pseudo-labels for a 3D student; systematic ablations show that strong spatial augmentation plus soft labeling raises 2D White Matter Lesion Dice by >11 points but degrades the 3D student, while human-centric preprocessing (CLAHE) drops Gray Matter Lesion Dice by ~25 points. The authors conclude that 2D and 3D models occupy distinct optimization landscapes and require dimension-specific regularization.

Significance. If the reported divergence is robust, the work supplies concrete, actionable guidelines for transferring regularization and preprocessing choices across dimensions in weakly supervised medical segmentation, where full 3D annotation is prohibitive. It also supplies a large-scale ex vivo MS dataset and open code, which are valuable for reproducibility.

major comments (2)
  1. [Results] Results section: the headline claim that the 3D student exhibits a fundamentally different optimization landscape rests on the untested assumption that the 2D teacher's dense pseudo-labels are sufficiently accurate. No slice-wise or volume-wise fidelity metrics (Dice against held-out manual labels, label consistency across reconstructed 3D volumes) are reported; without them the observed 3D degradation could be explained by label noise rather than dimensionality.
  2. [Methods] Methods and Results: the reported Dice gains (>11 pt WM lesion, ~25 pt GM lesion drop) are presented without error bars, statistical tests, or complete ablation tables that isolate each factor (augmentation strength, soft-label temperature, CLAHE). This prevents assessment of whether the dimensional divergence is statistically reliable or sensitive to hyper-parameter choices.
minor comments (2)
  1. [Abstract] Abstract and Results: numerical claims should be accompanied by the corresponding baseline Dice values and the exact number of test volumes or slices used.
  2. [Discussion] Discussion: the term 'perception divergence' is introduced without a precise definition or supporting quantitative comparison between human and model sensitivity to contrast changes.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their valuable comments on our manuscript. We address each major comment point-by-point below and indicate the revisions we will make to strengthen the paper.

Point-by-point responses
  1. Referee: [Results] Results section: the headline claim that the 3D student exhibits a fundamentally different optimization landscape rests on the untested assumption that the 2D teacher's dense pseudo-labels are sufficiently accurate. No slice-wise or volume-wise fidelity metrics (Dice against held-out manual labels, label consistency across reconstructed 3D volumes) are reported; without them the observed 3D degradation could be explained by label noise rather than dimensionality.

    Authors: We appreciate this observation. The 2D teacher was trained and evaluated on the sparse annotations with a held-out validation set, providing indirect support for pseudo-label quality. Importantly, our ablation studies fix the pseudo-label generation process and vary only the 3D training regularization, demonstrating that the performance differences arise from the interaction between the 3D model and the regularization choices. To directly address the concern, we will report slice-wise Dice scores of the pseudo-labels against additional held-out manual segmentations and assess label consistency in the reconstructed volumes in the revised manuscript. revision: yes

  2. Referee: [Methods] Methods and Results: the reported Dice gains (>11 pt WM lesion, ~25 pt GM lesion drop) are presented without error bars, statistical tests, or complete ablation tables that isolate each factor (augmentation strength, soft-label temperature, CLAHE). This prevents assessment of whether the dimensional divergence is statistically reliable or sensitive to hyper-parameter choices.

    Authors: We agree that the presentation of results can be improved for statistical robustness. In the revised manuscript, we will include error bars representing standard deviation across multiple independent training runs, conduct appropriate statistical tests to confirm the significance of the reported differences, and provide expanded ablation tables that systematically isolate the effects of each factor (augmentation strength, soft-label temperature, and CLAHE) separately for the 2D teacher and 3D student models. revision: yes
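
The statistical reporting promised in the second response could be sketched as below. This is a minimal illustration under stated assumptions: the Dice values are placeholders and not results from the paper, runs are assumed to be paired by random seed, and an exact sign-flip permutation test is used as one reasonable choice of paired test among several.

```python
import itertools
import statistics


def paired_sign_flip_p(diffs):
    """Exact two-sided sign-flip permutation test on paired differences."""
    observed = abs(sum(diffs))
    count = 0
    total = 0
    # Enumerate every assignment of +/- signs to the paired differences.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            count += 1
    return count / total


# Placeholder Dice values per seed, NOT results from the paper.
conservative = [0.70, 0.72, 0.71, 0.69, 0.73]
aggressive = [0.66, 0.69, 0.68, 0.65, 0.70]

diffs = [c - a for c, a in zip(conservative, aggressive)]
mean_diff = statistics.mean(diffs)
std_diff = statistics.stdev(diffs)
p_value = paired_sign_flip_p(diffs)

print(f"mean difference = {mean_diff:.3f} +/- {std_diff:.3f} (sample std)")
print(f"sign-flip p-value = {p_value}")
```

With only five paired runs the smallest attainable two-sided p-value is 2/32, so more seeds would be needed before a conventional 0.05 threshold can even be reached.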

Circularity Check

0 steps flagged

No circularity; empirical results from held-out evaluation

Full rationale

The manuscript reports measured Dice-score differences from training a 2D teacher on sparse slices and a 3D student on its dense pseudo-labels. No equations, first-principles derivations, or fitted parameters are presented whose outputs reduce to the inputs by construction. Performance gaps (e.g., +11 pt WM-lesion Dice for 2D with augmentation/soft labels, degradation for 3D, -25 pt GM-lesion Dice with CLAHE) are direct experimental observations on held-out data, not tautological re-statements. Any self-citations are incidental and not load-bearing for the central empirical claims. The analysis is therefore self-contained and externally falsifiable via the reported metrics.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that 2D-generated pseudo-labels provide a valid training signal for 3D models and that observed differences stem from dimensionality rather than dataset-specific artifacts.

free parameters (2)
  • spatial augmentation strength
    Chosen to optimize 2D teacher but shown to harm 3D student; value is tuned rather than derived.
  • soft-label temperature
    Hyperparameter controlling label softness whose optimal setting differs by architecture.
axioms (1)
  • domain assumption: Pseudo-labels from the 2D teacher are sufficiently accurate for 3D training
    Core premise of the sparse-to-dense pipeline invoked in the methods description.

pith-pipeline@v0.9.0 · 5639 in / 1299 out tokens · 52581 ms · 2026-05-14T19:32:59.085289+00:00 · methodology

