pith. machine review for the scientific record.

arxiv: 2604.26437 · v1 · submitted 2026-04-29 · 💻 cs.CV

Recognition: unknown

Are Data Augmentation and Segmentation Always Necessary? Insights from COVID-19 X-Rays and a Methodology Thereof


Pith reviewed 2026-05-07 10:58 UTC · model grok-4.3

classification 💻 cs.CV
keywords COVID-19 detection · chest X-ray · lung segmentation · data augmentation · class activation mapping · deep learning · overfitting · CNN

The pith

Lung segmentation is required for reliable COVID-19 detection in chest X-rays, while excessive data augmentation causes overfitting and accuracy loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether lung segmentation and data augmentation are always needed for deep learning classification of COVID-19 from chest X-rays. Expert review of class activation maps shows that models attend to lung regions, establishing segmentation as necessary to prevent unreliable predictions based on irrelevant image areas. Controlled experiments demonstrate that test accuracy declines once augmentation exceeds a certain threshold, which the authors attribute to overfitting. They present the SDL-COVID method that reaches 95.21 percent precision and a lower false negative rate as a more dependable alternative.

Core claim

Careful analysis of X-ray images and their corresponding heat maps under expert medical supervision reveals that lung segmentation is necessary for accurate COVID-19 prediction. Test accuracy significantly drops beyond a certain threshold with additional augmented images, indicating model overfitting. The proposed SDL-COVID methodology achieves a precision of 95.21% and a lower false negative rate, ensuring its reliability for COVID-19 detection using chest X-rays.
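The two headline metrics can be computed directly from a binary confusion matrix. A minimal sketch follows; the counts in the example are illustrative placeholders, not figures from the paper (its actual confusion matrix is Figure 11):

```python
def precision_and_fnr(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Precision and false-negative rate from confusion-matrix counts.

    precision = TP / (TP + FP)   -- how often a COVID-positive call is right
    FNR       = FN / (FN + TP)   -- fraction of actual COVID cases the model misses
    """
    precision = tp / (tp + fp)
    fnr = fn / (fn + tp)
    return precision, fnr

# Illustrative counts only:
p, f = precision_and_fnr(tp=90, fp=10, fn=5, tn=95)
# p = 0.90, f = 5/95 ≈ 0.053
```

A lower FNR matters clinically because a missed COVID-positive case (a false negative) is typically costlier than a false alarm.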

What carries the argument

Class activation mapping (CAM) to generate heatmaps that visualize CNN attention regions on lung areas, paired with side-by-side training on augmented and non-augmented datasets to locate the overfitting threshold in the SDL-COVID pipeline.
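For a CNN that ends in global average pooling plus a linear classifier, CAM reduces to a weighted sum of the final convolutional feature maps. A minimal numpy sketch, with array shapes assumed rather than taken from the paper:

```python
import numpy as np

def class_activation_map(feature_maps: np.ndarray,
                         fc_weights: np.ndarray,
                         class_idx: int) -> np.ndarray:
    """CAM for a CNN ending in global average pooling + linear layer.

    feature_maps: (K, H, W) activations of the last conv layer
    fc_weights:   (num_classes, K) weights of the final linear layer
    Returns an (H, W) map, normalised to [0, 1], showing which spatial
    regions drove the score for `class_idx`.
    """
    w = fc_weights[class_idx]                    # (K,)
    cam = np.tensordot(w, feature_maps, axes=1)  # sum_k w_k * f_k -> (H, W)
    cam = np.maximum(cam, 0.0)                   # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

Upsampled to input resolution and overlaid on the X-ray, a map like this is what the experts reviewed: attention inside the lung fields supports the prediction, while attention on artifacts such as pacemakers or ECG wires (Figure 1) does not.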

If this is right

  • Skipping lung segmentation leaves CNNs free to base COVID-19 predictions on non-lung image features visible in heatmaps.
  • Augmenting the dataset past an optimal point lowers test accuracy, showing overfitting in medical X-ray classification.
  • SDL-COVID reaches 95.21 percent precision with fewer false negatives than unoptimized approaches.
  • Expert validation of activation maps is required to confirm that models attend to the correct anatomical structures.
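The threshold-finding experiment behind the second bullet can be sketched as a simple sweep over augmentation levels; `train_and_eval` here is a stand-in for the full train/test cycle, not the authors' code:

```python
def find_augmentation_threshold(aug_fractions, train_and_eval):
    """Evaluate test accuracy at each augmentation level and report the
    level past which accuracy starts to fall (the suspected overfitting
    threshold). `train_and_eval(frac)` should train on a dataset whose
    augmented share is `frac` and return test accuracy.
    """
    accuracies = [train_and_eval(f) for f in aug_fractions]
    best_idx = max(range(len(accuracies)), key=accuracies.__getitem__)
    return aug_fractions[best_idx], accuracies

# Toy stand-in: accuracy peaks at 50% augmentation, then degrades.
fracs = [0.0, 0.25, 0.5, 0.75, 1.0]
best, accs = find_augmentation_threshold(fracs, lambda f: 0.95 - (f - 0.5) ** 2)
# best == 0.5
```

In practice each level would be repeated over several seeds or splits so that the decline past the peak can be distinguished from run-to-run noise.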

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • CAM-based checks could be used for other chest X-ray tasks such as pneumonia detection to decide when segmentation is required.
  • Augmentation levels should be tested empirically for each medical imaging dataset rather than assumed to have a universal safe limit.
  • Embedding expert heatmap review into model pipelines may increase clinical acceptance of AI tools for respiratory diagnostics.

Load-bearing premise

Expert-reviewed class activation maps definitively show that all accurate models must focus on lung regions, and that accuracy drops with more augmentation are caused only by overfitting rather than dataset size or model choices.

What would settle it

A CNN trained on unsegmented COVID-19 X-rays that reaches high test accuracy while its class activation maps show attention mainly outside the lungs, or a model where unlimited augmentation continues to raise or hold test accuracy without overfitting indicators.

Figures

Figures reproduced from arXiv: 2604.26437 by Aman Swaraj, Arnav Agarwal, Hitendra Singh Bhadouria, Karan Verma, Sandeep Kumar.

Figure 1
Figure 1. Figure 1: Presence of pacemakers and ECG wires in X view at source ↗
Figure 2
Figure 2. Figure 2: Steps involving RQ1 for validation of usage of lungs segmentation. view at source ↗
Figure 3
Figure 3. Figure 3: Steps involved in RQ2 concerning data augmentation. view at source ↗
Figure 4
Figure 4. Figure 4: Our proposed approach, SDL-COVID view at source ↗
Figure 6
Figure 6. Figure 6: From left – Original, Gaussian Unsharp Mask, Laplacian Unsharp Mask, Histogram Equalization and CLAHE. view at source ↗
Figure 7
Figure 7. Figure 7: From left- Original X-ray, Enhanced X-ray, Lung segmentation Mask, Segmented Lungs view at source ↗
Figure 8
Figure 8. Figure 8: Performance of various CNN models on original dataset without segmentation. view at source ↗
Figure 9
Figure 9. Figure 9: Heatmap visualization of unsegmented chest X view at source ↗
Figure 10
Figure 10. Figure 10: Performance of models in regards to percentage augmentation of total dataset. view at source ↗
Figure 11
Figure 11. Figure 11: Confusion matrix obtained while evaluation of the proposed model on the test dataset. view at source ↗
Figure 12
Figure 12. Figure 12: Comparison of accuracy of Unsegmented and Segmented results over various models. view at source ↗
Figure 13
Figure 13. Figure 13: From left – Original CXR, HE-segmented lungs and respective heat map. view at source ↗
Figure 14
Figure 14. Figure 14: From left – Original CXR (Covid +ve), HE-segmented lungs, heatmap showing particular region that triggered the positive prediction and its counterfactual explanation view at source ↗
Figure 15
Figure 15. Figure 15: From left – Original CXR (Covid -ve), HE-segmented lungs, heatmap showing particular region that triggered the negative prediction and its counterfactual explanation. view at source ↗
Figure 16
Figure 16. Figure 16: From left – Confusion matrix generated over test dataset of unsegmented and segmented lungs images. view at source ↗
read the original abstract

Purpose: Rapid and reliable diagnostic tools are crucial for managing respiratory diseases like COVID-19, where chest X-ray analysis coupled with artificial intelligence techniques has proven invaluable. However, most existing works on X-ray images have not considered lung segmentation, raising concerns about their reliability. Additionally, some have employed disproportionate and impractical augmentation techniques, making models less generalized and prone to overfitting. This study presents a critical analysis of both issues and proposes a methodology (SDL-COVID) for more reliable classification of chest X-rays for COVID-19 detection. Methods: We use class activation mapping to obtain a visual understanding of the predictions made by Convolutional Neural Networks (CNNs), validating the necessity of lung segmentation. To analyze the effect of data augmentation, deep learning models are implemented on two levels: one for an augmented dataset and another for a non-augmented dataset. Results: Careful analysis of X-ray images and their corresponding heat maps under expert medical supervision reveals that lung segmentation is necessary for accurate COVID-19 prediction. Regarding data augmentation, test accuracy significantly drops beyond a certain threshold with additional augmented images, indicating model overfitting. Conclusion: Our proposed methodology, SDL-COVID, achieves a precision of 95.21% and a lower false negative rate, ensuring its reliability for COVID-19 detection using chest X-rays.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that expert-reviewed class activation maps demonstrate the necessity of lung segmentation for accurate COVID-19 X-ray classification, that test accuracy declines with excessive data augmentation due to overfitting, and that the proposed SDL-COVID methodology achieves 95.21% precision with a lower false-negative rate.

Significance. If the necessity and overfitting claims were supported by controlled ablations, the work would usefully caution against unexamined preprocessing in medical imaging pipelines and highlight the value of expert oversight on model attention maps. The reported precision figure, if reproducible, would indicate a competitive baseline for COVID-19 detection.

major comments (3)
  1. [Abstract/Results] Abstract and Results: the conclusion that 'lung segmentation is necessary' rests on expert inspection of CAM heatmaps, yet no ablation is described that trains identical CNNs on raw versus explicitly segmented images while holding data splits, hyperparameters, and augmentation fixed; without this counterfactual, the necessity claim remains interpretive rather than demonstrated.
  2. [Results] Results: the statement that 'test accuracy significantly drops beyond a certain threshold with additional augmented images' is attributed to overfitting, but no training/validation curves, learning-rate schedules, or statistical tests (e.g., paired t-test on accuracy differences) are referenced to rule out dataset-specific effects or augmentation realism issues.
  3. [Methods] Methods: quantitative details on dataset sizes, sources, train/test splits, exact CNN architectures, augmentation parameters (e.g., rotation range, intensity thresholds), and the concrete components of SDL-COVID are absent, preventing verification of the reported 95.21% precision and lower false-negative rate.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'two levels' for the augmentation experiments is undefined; clarifying what these levels consist of (e.g., specific augmentation counts or policies) would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and commit to revisions where the manuscript was incomplete.

read point-by-point responses
  1. Referee: [Abstract/Results] Abstract and Results: the conclusion that 'lung segmentation is necessary' rests on expert inspection of CAM heatmaps, yet no ablation is described that trains identical CNNs on raw versus explicitly segmented images while holding data splits, hyperparameters, and augmentation fixed; without this counterfactual, the necessity claim remains interpretive rather than demonstrated.

    Authors: We agree that the current evidence is interpretive, relying on expert-reviewed CAMs that show non-segmented models attending to extraneous regions. A direct ablation with identical CNNs, fixed splits, hyperparameters, and augmentation would strengthen the necessity claim. We will add this controlled comparison in the revised manuscript. revision: yes

  2. Referee: [Results] Results: the statement that 'test accuracy significantly drops beyond a certain threshold with additional augmented images' is attributed to overfitting, but no training/validation curves, learning-rate schedules, or statistical tests (e.g., paired t-test on accuracy differences) are referenced to rule out dataset-specific effects or augmentation realism issues.

    Authors: The observed accuracy decline with excessive augmentation was noted across our experiments. To better support the overfitting interpretation and exclude confounds, we will include training/validation curves, learning-rate details, and statistical tests such as paired t-tests on accuracy differences in the revised Results. revision: yes

  3. Referee: [Methods] Methods: quantitative details on dataset sizes, sources, train/test splits, exact CNN architectures, augmentation parameters (e.g., rotation range, intensity thresholds), and the concrete components of SDL-COVID are absent, preventing verification of the reported 95.21% precision and lower false-negative rate.

    Authors: We acknowledge these details were omitted. The revised Methods section will provide full quantitative information on dataset sizes, sources, splits, CNN architectures, augmentation parameters, and the specific components of SDL-COVID to enable reproducibility and verification of the 95.21% precision result. revision: yes
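The paired t-test the rebuttal commits to compares per-run accuracies of two configurations on matched splits. A self-contained sketch (the accuracy lists in any real use would come from repeated experiments; these names are illustrative):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(acc_a: list[float], acc_b: list[float]) -> float:
    """t = mean(d) / (stdev(d) / sqrt(n)) for paired differences d = a_i - b_i.

    Compare against a t distribution with n - 1 degrees of freedom
    (or use scipy.stats.ttest_rel to get the p-value directly).
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))
```

A consistently large |t| across repeated splits would support the claim that the accuracy drop past the augmentation threshold is systematic rather than noise.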

Circularity Check

0 steps flagged

No circularity: empirical claims rest on experiments, not self-referential definitions or fitted predictions

full rationale

The paper's core claims (that lung segmentation is necessary, based on expert-reviewed CAM heatmaps, and that excessive augmentation causes overfitting) are presented as outcomes of direct experimental comparisons between augmented and non-augmented datasets and of visual analysis, rather than of any derivation, equation, or parameter fit that reduces to its own inputs. No mathematical modeling, uniqueness theorems, or self-citations are invoked as load-bearing steps in the provided abstract or methodology description; the SDL-COVID performance numbers are reported as measured results from implementation. The evidential chain therefore rests on external experiments and measurements rather than folding back on its own construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities are described. The necessity of segmentation and overfitting threshold are presented as empirical findings without stated mathematical assumptions.

pith-pipeline@v0.9.0 · 5549 in / 1109 out tokens · 56954 ms · 2026-05-07T10:58:02.281444+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 6 canonical work pages

  1. [1] Swaraj, A., Verma, K., Kaur, A., Singh, G., Kumar, A., & de Sales, L. M. (2021). Implementation of stacking based ARIMA model for prediction of Covid-19 cases in India. Journal of Biomedical Informatics, 121, 103887.
  2. [2] Lee, E. Y., Ng, M. Y., & Khong, P. L. (2020). COVID-19 pneumonia: what has CT taught us? The Lancet Infectious Diseases, 20(4), 384-385.
  3. [3] Ker, J., Wang, L., Rao, J., & Lim, T. (2017). Deep learning applications in medical image analysis. IEEE Access, 6, 9375-9389.
  4. [4] Moujahid, H., Cherradi, B., Al-Sarem, M., Bahatti, L., Eljialy, B. A., Alsaeedi, A., & Saeed, F. (2021). Combining CNN and Grad-CAM for COVID-19 Disease Prediction and Visual Explanation. Intelligent Automation & Soft Computing, 32(2), 723-745.
  5. [5] Islam, M. Z., Islam, M. M., & Asraf, A. (2020). A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images. Informatics in Medicine Unlocked, 20, 100412.
  6. [6] Khalifa, N. E. M., Taha, M. H. N., Hassanien, A. E., & Elghamrawy, S. (2020). Detection of coronavirus (COVID-19) associated pneumonia based on generative adversarial networks and a fine-tuned deep transfer learning model using chest X-ray dataset. arXiv preprint arXiv:2004.01184.
  7. [7] Ahmed, S., Hossain, T., Hoque, O. B., Sarker, S., Rahman, S., & Shah, F. M. (2021). Automated COVID-19 detection from chest X-ray images: a high-resolution network (HRNet) approach. SN Computer Science, 2(4), 1-17.
  8. [8] Rahman, T., Khandakar, A., Qiblawey, Y., Tahir, A., Kiranyaz, S., Kashem, S. B. A., ... & Chowdhury, M. E. (2021). Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Computers in Biology and Medicine, 132, 104319.
  9. [9] Morís, D. I., de Moura Ramos, J. J., Buján, J. N., & Hortas, M. O. (2021). Data augmentation approaches using cycle-consistent adversarial networks for improving COVID-19 screening in portable chest X-ray images. Expert Systems with Applications, 185, 115681.
  10. [10] Wang, L., Lin, Z. Q., & Wong, A. (2020). COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Scientific Reports, 10(1), 1-12.
  11. [11] Narin, A., Kaya, C., & Pamuk, Z. (2021). Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. Pattern Analysis and Applications, 24(3), 1207-1220.
  12. [12] Abed, M., Mohammed, K. H., Abdulkareem, G. Z., Begonya, M., Salama, A., Maashi, M. S., ... & Mutlag, L. (2021). A comprehensive investigation of machine learning feature extraction and classification methods for automated diagnosis of COVID-19 based on X-ray images. Computers, Materials & Continua, 3289-3310.
  13. [13] Teixeira, L. O., Pereira, R. M., Bertolini, D., Oliveira, L. S., Nanni, L., Cavalcanti, G. D., & Costa, Y. M. (2021). Impact of lung segmentation on the diagnosis and explanation of COVID-19 in chest X-ray images. Sensors, 21(21), 7116.
  14. [14] Waheed, A., Goyal, M., Gupta, D., Khanna, A., Al-Turjman, F., & Pinheiro, P. R. (2020). CovidGAN: data augmentation using auxiliary classifier GAN for improved COVID-19 detection. IEEE Access, 8, 91916-91923.
  15. [15] Masadeh, M., Masadeh, A., Alshorman, O., Khasawneh, F. H., & Masadeh, M. A. (2022). An efficient machine learning-based COVID-19 identification utilizing chest X-ray images. IAES International Journal of Artificial Intelligence, 11(1), 356.
  16. [16] Maguolo, G., & Nanni, L. (2021). A critic evaluation of methods for COVID-19 automatic detection from X-ray images. Information Fusion, 76, 1-7.
  17. [17] Bhadouria, H. S., Kumar, K., Swaraj, A., Verma, K., Kaur, A., Sharma, S., ... & de Sales, L. M. (2021). Classification of COVID-19 on chest X-ray images using Deep Learning model with Histogram Equalization and Lungs Segmentation. arXiv preprint arXiv:2112.02478.
  18. [18] DeGrave, A. J., Janizek, J. D., & Lee, S.-I. (2021). AI for radiographic COVID-19 detection selects shortcuts over signal. Nature Machine Intelligence, 3, 610-619. https://doi.org/10.1038/s42256-021-00338-7
  19. [19] Roberts, M., Driggs, D., Thorpe, M., et al. (2021). Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nature Machine Intelligence, 3, 199-217. https://doi.org/10.1038/s42256-021-00307-0
  20. [20] Tabik, S., Gómez-Ríos, A., Martín-Rodríguez, J. L., Sevillano-García, I., Rey-Area, M., Charte, D., ... & Herrera, F. (2020). COVIDGR dataset and COVID-SDNet methodology for predicting COVID-19 based on chest X-ray images. IEEE Journal of Biomedical and Health Informatics, 24(12), 3595-3605.
  21. [21] Sadre, R., Sundaram, B., Majumdar, S., & Ushizima, D. (2021). Validating deep learning inference during chest X-ray classification for COVID-19 screening. Scientific Reports, 11(1), 1-10.
  22. [22] Fang, Z., Zhao, H., Ren, J., Maclellan, C., Xia, Y., Li, S., ... & Ren, K. (2022). SC2Net: A Novel Segmentation-based Classification Network for Detection of COVID-19 in Chest X-ray Images. IEEE Journal of Biomedical and Health Informatics.
  23. [23] Yang, D., Martinez, C., Visuña, L., Khandhar, H., Bhatt, C., & Carretero, J. (2021). Detection and analysis of COVID-19 in medical images using deep learning techniques. Scientific Reports, 11(1), 1-13.
  24. [24] Tang, S., et al. (2021). EDL-COVID: Ensemble Deep Learning for COVID-19 Case Detection From Chest X-Ray Images. IEEE Transactions on Industrial Informatics, 17(9), 6539-6549. doi: 10.1109/TII.2021.3057683
  25. [25] Ucar, F., & Korkmaz, D. (2020). COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images. Medical Hypotheses, 140, 109761.
  26. [26] Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 234-241). Springer, Cham.
  27. [27] Zhao, W., Zhong, Z., Xie, X., Yu, Q., & Liu, J. (2020). Relation between chest CT findings and clinical conditions of coronavirus disease (COVID-19) pneumonia: a multicenter study. AJR Am J Roentgenol, 214(5), 1072-1077.
  28. [28] Yasin, R., & Gouda, W. (2020). Chest X-ray findings monitoring COVID-19 disease course and severity. Egyptian Journal of Radiology and Nuclear Medicine, 51(1), 1-18.
  29. [29] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
  30. [30] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-9).
  31. [31] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
  32. [32] Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360.
  33. [33] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248-255). IEEE.
  34. [34] Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., ... & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82-115.