Disentangling Sampling from Training Budget in Class-Imbalanced CT Body Composition Segmentation
Pith reviewed 2026-05-21 07:17 UTC · model grok-4.3
The pith
Episodic sampling for class-balanced batches in low-data CT segmentation delays overfitting compared to random or weighted sampling, revealing training iteration budget as a key evaluation confound.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under matched training budgets, random and weighted sampling overfit earlier while episodic sampling improved for approximately three times more iterations before plateauing, identifying the training iteration budget as an under-recognized confound.
Load-bearing premise
That differences in performance are driven primarily by the class-balancing property of episodic sampling rather than by unstated differences in hyperparameter schedules, batch construction details, or optimization dynamics.
Figures
read the original abstract
Class imbalance is a fundamental challenge in medical image segmentation, where frequent classes typically dominate training at the expense of rare classes. Loss-based approaches mitigate imbalance by reweighting the per-pixel loss within the batch, while sampling strategies control which images enter the batch. Yet neither explicitly controls which classes appear within the batch, leaving rare-class exposure only partially rebalanced. In this work, we adopt episodic sampling from few-shot learning to promote class-balanced batch construction in a fully supervised setting. We decouple episodic sampling from its conventional metric-learning context and evaluate it in body composition segmentation in CT. We compare episodic sampling against random and weighted sampling on nine muscle and adipose tissues, derived from 210 scans of the public SAROS dataset. Training is performed under full- and low-data regimes, with additional comparisons under matched training iteration budgets. Under full-data training, all three strategies performed comparably (mean Dice 0.882 for episodic, 0.878 for random and weighted). Under low-data training, episodic sampling outperformed random and weighted (0.787 vs. 0.758 and 0.762), driven by a 12-fold difference in training iterations. Under matched training budgets, random and weighted overfit earlier, while episodic improved for approximately three times more iterations before plateauing. Our findings identify the training iteration budget as under-recognized confound in sampling strategies, motivating iteration-aware evaluation protocols for small datasets. Furthermore, the residual advantage of episodic sampling is consistent with an implicit regularization effect of class-balanced batches, offering a low-cost, model-agnostic strategy for class-imbalanced medical image segmentation. Code is available at https://github.com/iasonsky/episodic-sampling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that episodic sampling (decoupled from metric learning) outperforms random and weighted sampling for class-imbalanced CT body composition segmentation on the public SAROS dataset (210 scans, 9 tissues). Under full-data training all three strategies yield comparable mean Dice scores (~0.88). Under low-data regimes episodic sampling reaches higher Dice (0.787 vs. 0.758/0.762) because it trains for ~12× more iterations before plateauing. When iteration budgets are explicitly matched, random and weighted sampling overfit earlier while episodic sampling continues improving for ~3× more iterations; the authors attribute the residual advantage to an implicit regularization effect of class-balanced batches and identify training budget as an under-recognized confound, motivating iteration-aware evaluation protocols. Code is released.
Significance. If the matched-budget results hold under fully controlled conditions, the work is significant for medical image segmentation: it provides concrete evidence that sampling-strategy comparisons are confounded by iteration count, supplies a simple model-agnostic remedy via episodic batch construction, and demonstrates the effect on a public dataset with explicit full-data / low-data / matched-budget contrasts. The released code supports reproducibility.
major comments (1)
- [Matched-budget experiments] The central attribution—that the ~3× longer plateau under matched budgets arises from the class-balancing property of episodic sampling rather than from unstated differences in optimizer, learning-rate schedule, weight decay, batch size, or episode-construction details—is load-bearing. The manuscript must explicitly state (and ideally tabulate) that every hyper-parameter and optimization setting was identical across the three sampling strategies; without this, the longer training trajectory could be explained by optimization dynamics instead of class balance. (See the matched-budget paragraph in the abstract and the corresponding experimental subsection.)
minor comments (2)
- [Abstract / Results] The abstract and results sections report point estimates for Dice scores but omit per-fold or per-run standard deviations and any statistical significance tests for the reported differences (e.g., 0.787 vs. 0.758/0.762).
- [Methods] Implementation details on how episodic sampling is realized in the fully-supervised regime (exact episode size, how rare-class co-occurrence is enforced, batch-size equivalence with random/weighted baselines) should be expanded for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The major comment raises a valid point about ensuring full transparency in the matched-budget experiments. We have revised the manuscript to explicitly confirm and tabulate identical hyperparameters across sampling strategies, which directly addresses the concern and strengthens the attribution to class balancing.
read point-by-point responses
-
Referee: The central attribution—that the ~3× longer plateau under matched budgets arises from the class-balancing property of episodic sampling rather than from unstated differences in optimizer, learning-rate schedule, weight decay, batch size, or episode-construction details—is load-bearing. The manuscript must explicitly state (and ideally tabulate) that every hyper-parameter and optimization setting was identical across the three sampling strategies; without this, the longer training trajectory could be explained by optimization dynamics instead of class balance. (See the matched-budget paragraph in the abstract and the corresponding experimental subsection.)
Authors: We agree that explicit confirmation is necessary to rule out optimization confounds. In the revised manuscript we have added a new paragraph in Section 3.2 (Experimental Setup) stating that all three sampling strategies used identical settings: Adam optimizer with the same initial learning rate and cosine decay schedule, identical weight decay, fixed batch size of 4, and the same episode-construction hyperparameters for episodic sampling. We have also inserted Table 2, which tabulates every hyperparameter value side-by-side for random, weighted, and episodic sampling. These additions make clear that the observed differences in plateau length arise from the sampling strategy itself rather than from any unstated optimization differences. revision: yes
Circularity Check
No circularity: purely empirical comparison with no derivations or self-referential predictions
full rationale
The paper reports direct empirical results from training segmentation models under different sampling strategies (random, weighted, episodic) on the SAROS dataset, measuring Dice scores in full-data and low-data regimes with matched iteration budgets. No equations, fitted parameters renamed as predictions, or derivation chains appear; performance differences are presented as observed outcomes rather than constructed from internal definitions. The adoption of episodic sampling is described as a methodological choice decoupled from metric learning, with no load-bearing self-citations or uniqueness theorems invoked to force the conclusions. The central claim that training iteration budget is a confound rests on the reported iteration counts and plateau behaviors, which are external to any internal fitting or redefinition.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Dice coefficient is a suitable primary metric for comparing segmentation performance across sampling strategies.
Reference graph
Works this paper leans on
-
[1]
Scientific Data , author =. 2024 , keywords =. doi:10.1038/s41597-024-03337-6 , abstract =
-
[2]
International Conference on Learning Representations (ICLR) , year=
Decoupling Representation and Classifier for Long-Tailed Recognition , author=. International Conference on Learning Representations (ICLR) , year=
- [3]
-
[4]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=
BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=. 2020 , url=
work page 2020
-
[5]
Proceedings of the 31st International Conference on Machine Learning (ICML) , pages=
Accelerating Minibatch Stochastic Gradient Descent using Stratified Sampling , author=. Proceedings of the 31st International Conference on Machine Learning (ICML) , pages=. 2014 , url=
work page 2014
-
[6]
Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations
Carole H. Sudre and Wenqi Li and Tom Vercauteren and S. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations , journal =. 2017 , url =. 1707.03237 , timestamp =
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[7]
Simplifying Neural Network Training Under Class Imbalance , author=. 2023 , eprint=
work page 2023
-
[8]
What is the Effect of Importance Weighting in Deep Learning? , author=. 2019 , eprint=
work page 2019
- [9]
-
[10]
Isensee, Fabian and Jaeger, Paul F. and Kohl, Simon A. A. and Petersen, Jens and Maier-Hein, Klaus H. , year =. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation , volume =. Nature Methods , publisher =. doi:10.1038/s41592-020-01008-z , number =
-
[11]
How Important is Importance Sampling for Deep Budgeted Training? , author=. 2021 , eprint=
work page 2021
-
[12]
Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints , author=. 2020 , eprint=
work page 2020
-
[13]
Agrim Gupta and Piotr Doll. CoRR , volume =. 2019 , url =. 1908.03195 , timestamp =
-
[14]
Not All Samples Are Created Equal: Deep Learning with Importance Sampling , journal =
Angelos Katharopoulos and Fran. Not All Samples Are Created Equal: Deep Learning with Importance Sampling , journal =. 2018 , url =. 1803.00942 , timestamp =
-
[15]
Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective , author=. 2023 , eprint=
work page 2023
-
[16]
Chawla, N. V. and Bowyer, K. W. and Hall, L. O. and Kegelmeyer, W. P. , year=. SMOTE: Synthetic Minority Over-sampling Technique , volume=. doi:10.1613/jair.953 , journal=
-
[17]
Instance-Aware Repeat Factor Sampling for Long-Tailed Object Detection , author=. 2023 , eprint=
work page 2023
-
[18]
Kamnitsas, Konstantinos and Ledig, Christian and Newcombe, Virginia F.J. and Simpson, Joanna P. and Kane, Andrew D. and Menon, David K. and Rueckert, Daniel and Glocker, Ben , year =. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation , volume =. doi:10.1016/j.media.2016.10.004 , journal =
-
[20]
Haibo He and Garcia, E.A. , year =. Learning from Imbalanced Data , volume =. IEEE Transactions on Knowledge and Data Engineering , publisher =. doi:10.1109/tkde.2008.239 , number =
-
[21]
PANet: Few-Shot Image Semantic Segmentation with Prototype Alignment , author=. 2020 , eprint=
work page 2020
-
[22]
Ouyang, Cheng and Biffi, Carlo and Chen, Chen and Kart, Turkay and Qiu, Huaqi and Rueckert, Daniel , year =. Self-supervision with Superpixels: Training Few-Shot Medical Image Segmentation Without Annotation , ISBN =. doi:10.1007/978-3-030-58526-6_45 , booktitle =
-
[23]
Ma, Jun and Chen, Jianan and Ng, Matthew and Huang, Rui and Li, Yu and Li, Chen and Yang, Xiaoping and Martel, Anne L. , year =. Loss odyssey in medical image segmentation , volume =. doi:10.1016/j.media.2021.102035 , journal =
- [24]
-
[25]
Measures of the amount of ecologic association between species , author=. Ecology , volume=. 1945 , publisher=
work page 1945
-
[26]
Taha, Abdel Aziz and Hanbury, Allan , year =. Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool , volume =. BMC Medical Imaging , publisher =. doi:10.1186/s12880-015-0068-x , number =
-
[27]
Investigative Radiology , volume=
BOA: a CT-based body and organ analysis for radiologists at the point of care , author=. Investigative Radiology , volume=. 2024 , publisher=
work page 2024
-
[28]
Blankemeier, Louis and Desai, Arjun and Chaves, Juan Manuel Zambrano and Wentland, Andrew and Yao, Sally and Reis, Eduardo and others , title =. 2023 , doi =
work page 2023
-
[29]
and Heiliger, Christian and Tschaidse, Tengis and Jarmusch, Stefanie and Auhage, Liv A
Hofmann, Felix O. and Heiliger, Christian and Tschaidse, Tengis and Jarmusch, Stefanie and Auhage, Liv A. and Aghamaliyev, Ughur and others , title =. Sci. Rep. , volume =. 2025 , doi =
work page 2025
-
[30]
European Journal of Radiology , author =
Artificial intelligence for body composition and sarcopenia evaluation on computed tomography:. European Journal of Radiology , author =. 2022 , note =. doi:10.1016/j.ejrad.2022.110218 , abstract =
-
[31]
Opportunistic. RadioGraphics , author =. 2021 , keywords =. doi:10.1148/rg.2021200056 , language =
-
[32]
Accounting for Variance in Machine Learning Benchmarks , author=. 2021 , eprint=
work page 2021
-
[33]
IEEE Transactions on Medical Imaging , author =
Imbalanced. IEEE Transactions on Medical Imaging , author =. 2025 , keywords =. doi:10.1109/TMI.2024.3524253 , abstract =
-
[34]
Computers in Biology and Medicine , author =
A deep ensemble medical image segmentation with novel sampling method and loss function , volume =. Computers in Biology and Medicine , author =. 2024 , pages =. doi:10.1016/j.compbiomed.2024.108305 , abstract =
-
[35]
IEEE Journal of Biomedical and Health Informatics , author =
An. IEEE Journal of Biomedical and Health Informatics , author =. 2024 , keywords =. doi:10.1109/JBHI.2023.3330667 , abstract =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.