pith. machine review for the scientific record.

arxiv: 2604.26437 · v1 · submitted 2026-04-29 · 💻 cs.CV

Recognition: unknown

Are Data Augmentation and Segmentation Always Necessary? Insights from COVID-19 X-Rays and a Methodology Thereof


Pith reviewed 2026-05-07 10:58 UTC · model grok-4.3

classification 💻 cs.CV
keywords COVID-19 detection · chest X-ray · lung segmentation · data augmentation · class activation mapping · deep learning · overfitting · CNN

The pith

Lung segmentation is required for reliable COVID-19 detection in chest X-rays, while excessive data augmentation causes overfitting and accuracy loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether lung segmentation and data augmentation are always needed for deep learning classification of COVID-19 from chest X-rays. Expert review of class activation maps shows that models attend to lung regions, establishing segmentation as necessary to prevent unreliable predictions based on irrelevant image areas. Controlled experiments demonstrate that test accuracy declines once augmentation exceeds a certain threshold, which the authors attribute to overfitting. They present the SDL-COVID method that reaches 95.21 percent precision and a lower false negative rate as a more dependable alternative.

Core claim

Careful analysis of X-ray images and their corresponding heat maps under expert medical supervision reveals that lung segmentation is necessary for accurate COVID-19 prediction. Test accuracy significantly drops beyond a certain threshold with additional augmented images, indicating model overfitting. The proposed SDL-COVID methodology achieves a precision of 95.21% and a lower false negative rate, ensuring its reliability for COVID-19 detection using chest X-rays.
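The two headline metrics can be computed directly from a binary confusion matrix. A minimal sketch follows; the counts in the example are illustrative placeholders, not figures from the paper (its actual confusion matrix is Figure 11):

```python
def precision_and_fnr(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Precision and false-negative rate from confusion-matrix counts.

    precision = TP / (TP + FP)   -- how often a COVID-positive call is right
    FNR       = FN / (FN + TP)   -- fraction of actual COVID cases the model misses
    """
    precision = tp / (tp + fp)
    fnr = fn / (fn + tp)
    return precision, fnr

# Illustrative counts only:
p, f = precision_and_fnr(tp=90, fp=10, fn=5, tn=95)
# p = 0.90, f = 5/95 ≈ 0.053
```

A lower FNR matters clinically because a missed COVID-positive case (a false negative) is typically costlier than a false alarm.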

What carries the argument

Class activation mapping (CAM) to generate heatmaps that visualize CNN attention regions on lung areas, paired with side-by-side training on augmented and non-augmented datasets to locate the overfitting threshold in the SDL-COVID pipeline.
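For a CNN that ends in global average pooling plus a linear classifier, CAM reduces to a weighted sum of the final convolutional feature maps. A minimal numpy sketch, with array shapes assumed rather than taken from the paper:

```python
import numpy as np

def class_activation_map(feature_maps: np.ndarray,
                         fc_weights: np.ndarray,
                         class_idx: int) -> np.ndarray:
    """CAM for a CNN ending in global average pooling + linear layer.

    feature_maps: (K, H, W) activations of the last conv layer
    fc_weights:   (num_classes, K) weights of the final linear layer
    Returns an (H, W) map, normalised to [0, 1], showing which spatial
    regions drove the score for `class_idx`.
    """
    w = fc_weights[class_idx]                    # (K,)
    cam = np.tensordot(w, feature_maps, axes=1)  # sum_k w_k * f_k -> (H, W)
    cam = np.maximum(cam, 0.0)                   # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

Upsampled to input resolution and overlaid on the X-ray, a map like this is what the experts reviewed: attention inside the lung fields supports the prediction, while attention on artifacts such as pacemakers or ECG wires (Figure 1) does not.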

If this is right

  • Skipping lung segmentation leaves CNNs free to base COVID-19 predictions on non-lung image features visible in heatmaps.
  • Augmenting the dataset past an optimal point lowers test accuracy, showing overfitting in medical X-ray classification.
  • SDL-COVID reaches 95.21 percent precision with fewer false negatives than unoptimized approaches.
  • Expert validation of activation maps is required to confirm that models attend to the correct anatomical structures.
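The threshold-finding experiment behind the second bullet can be sketched as a simple sweep over augmentation levels; `train_and_eval` here is a stand-in for the full train/test cycle, not the authors' code:

```python
def find_augmentation_threshold(aug_fractions, train_and_eval):
    """Evaluate test accuracy at each augmentation level and report the
    level past which accuracy starts to fall (the suspected overfitting
    threshold). `train_and_eval(frac)` should train on a dataset whose
    augmented share is `frac` and return test accuracy.
    """
    accuracies = [train_and_eval(f) for f in aug_fractions]
    best_idx = max(range(len(accuracies)), key=accuracies.__getitem__)
    return aug_fractions[best_idx], accuracies

# Toy stand-in: accuracy peaks at 50% augmentation, then degrades.
fracs = [0.0, 0.25, 0.5, 0.75, 1.0]
best, accs = find_augmentation_threshold(fracs, lambda f: 0.95 - (f - 0.5) ** 2)
# best == 0.5
```

In practice each level would be repeated over several seeds or splits so that the decline past the peak can be distinguished from run-to-run noise.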

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • CAM-based checks could be used for other chest X-ray tasks such as pneumonia detection to decide when segmentation is required.
  • Augmentation levels should be tested empirically for each medical imaging dataset rather than assumed to have a universal safe limit.
  • Embedding expert heatmap review into model pipelines may increase clinical acceptance of AI tools for respiratory diagnostics.

Load-bearing premise

Expert-reviewed class activation maps definitively show that all accurate models must focus on lung regions, and that accuracy drops with more augmentation are caused only by overfitting rather than dataset size or model choices.

What would settle it

A CNN trained on unsegmented COVID-19 X-rays that reaches high test accuracy while its class activation maps show attention mainly outside the lungs, or a model where unlimited augmentation continues to raise or hold test accuracy without overfitting indicators.

Figures

Figures reproduced from arXiv: 2604.26437 by Aman Swaraj, Arnav Agarwal, Hitendra Singh Bhadouria, Karan Verma, Sandeep Kumar.

Figure 1
Figure 1. Figure 1: Presence of pacemakers and ECG wires in X view at source ↗
Figure 2
Figure 2. Figure 2: Steps involving RQ1 for validation of usage of lungs segmentation. view at source ↗
Figure 3
Figure 3. Figure 3: Steps involved in RQ2 concerning data augmentation. view at source ↗
Figure 4
Figure 4. Figure 4: Our proposed approach, SDL-COVID view at source ↗
Figure 6
Figure 6. Figure 6: From left – Original, Gaussian Unsharp Mask, Laplacian Unsharp Mask, Histogram Equalization and CLAHE. view at source ↗
Figure 7
Figure 7. Figure 7: From left- Original X-ray, Enhanced X-ray, Lung segmentation Mask, Segmented Lungs view at source ↗
Figure 8
Figure 8. Figure 8: Performance of various CNN models on original dataset without segmentation. view at source ↗
Figure 9
Figure 9. Figure 9: Heatmap visualization of unsegmented chest X view at source ↗
Figure 10
Figure 10. Figure 10: Performance of models in regards to percentage augmentation of total dataset. view at source ↗
Figure 11
Figure 11. Figure 11: Confusion matrix obtained while evaluation of the proposed model on the test dataset. view at source ↗
Figure 12
Figure 12. Figure 12: Comparison of accuracy of Unsegmented and Segmented results over various models. view at source ↗
Figure 13
Figure 13. Figure 13: From left – Original CXR, HE-segmented lungs and respective heat map. view at source ↗
Figure 14
Figure 14. Figure 14: From left – Original CXR (Covid +ve), HE-segmented lungs, heatmap showing particular region that triggered the positive prediction and its counterfactual explanation view at source ↗
Figure 15
Figure 15. Figure 15: From left – Original CXR (Covid -ve), HE-segmented lungs, heatmap showing particular region that triggered the negative prediction and its counterfactual explanation. view at source ↗
Figure 16
Figure 16. Figure 16: From left – Confusion matrix generated over test dataset of unsegmented and segmented lungs images. view at source ↗
read the original abstract

Purpose: Rapid and reliable diagnostic tools are crucial for managing respiratory diseases like COVID-19, where chest X-ray analysis coupled with artificial intelligence techniques has proven invaluable. However, most existing works on X-ray images have not considered lung segmentation, raising concerns about their reliability. Additionally, some have employed disproportionate and impractical augmentation techniques, making models less generalized and prone to overfitting. This study presents a critical analysis of both issues and proposes a methodology (SDL-COVID) for more reliable classification of chest X-rays for COVID-19 detection. Methods: We use class activation mapping to obtain a visual understanding of the predictions made by Convolutional Neural Networks (CNNs), validating the necessity of lung segmentation. To analyze the effect of data augmentation, deep learning models are implemented on two levels: one for an augmented dataset and another for a non-augmented dataset. Results: Careful analysis of X-ray images and their corresponding heat maps under expert medical supervision reveals that lung segmentation is necessary for accurate COVID-19 prediction. Regarding data augmentation, test accuracy significantly drops beyond a certain threshold with additional augmented images, indicating model overfitting. Conclusion: Our proposed methodology, SDL-COVID, achieves a precision of 95.21% and a lower false negative rate, ensuring its reliability for COVID-19 detection using chest X-rays.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that expert-reviewed class activation maps demonstrate the necessity of lung segmentation for accurate COVID-19 X-ray classification, that test accuracy declines with excessive data augmentation due to overfitting, and that the proposed SDL-COVID methodology achieves 95.21% precision with a lower false-negative rate.

Significance. If the necessity and overfitting claims were supported by controlled ablations, the work would usefully caution against unexamined preprocessing in medical imaging pipelines and highlight the value of expert oversight on model attention maps. The reported precision figure, if reproducible, would indicate a competitive baseline for COVID-19 detection.

major comments (3)
  1. [Abstract/Results] Abstract and Results: the conclusion that 'lung segmentation is necessary' rests on expert inspection of CAM heatmaps, yet no ablation is described that trains identical CNNs on raw versus explicitly segmented images while holding data splits, hyperparameters, and augmentation fixed; without this counterfactual, the necessity claim remains interpretive rather than demonstrated.
  2. [Results] Results: the statement that 'test accuracy significantly drops beyond a certain threshold with additional augmented images' is attributed to overfitting, but no training/validation curves, learning-rate schedules, or statistical tests (e.g., paired t-test on accuracy differences) are referenced to rule out dataset-specific effects or augmentation realism issues.
  3. [Methods] Methods: quantitative details on dataset sizes, sources, train/test splits, exact CNN architectures, augmentation parameters (e.g., rotation range, intensity thresholds), and the concrete components of SDL-COVID are absent, preventing verification of the reported 95.21% precision and lower false-negative rate.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'two levels' for the augmentation experiments is undefined; clarifying what these levels consist of (e.g., specific augmentation counts or policies) would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and commit to revisions where the manuscript was incomplete.

read point-by-point responses
  1. Referee: [Abstract/Results] Abstract and Results: the conclusion that 'lung segmentation is necessary' rests on expert inspection of CAM heatmaps, yet no ablation is described that trains identical CNNs on raw versus explicitly segmented images while holding data splits, hyperparameters, and augmentation fixed; without this counterfactual, the necessity claim remains interpretive rather than demonstrated.

    Authors: We agree that the current evidence is interpretive, relying on expert-reviewed CAMs that show non-segmented models attending to extraneous regions. A direct ablation with identical CNNs, fixed splits, hyperparameters, and augmentation would strengthen the necessity claim. We will add this controlled comparison in the revised manuscript. revision: yes

  2. Referee: [Results] Results: the statement that 'test accuracy significantly drops beyond a certain threshold with additional augmented images' is attributed to overfitting, but no training/validation curves, learning-rate schedules, or statistical tests (e.g., paired t-test on accuracy differences) are referenced to rule out dataset-specific effects or augmentation realism issues.

    Authors: The observed accuracy decline with excessive augmentation was noted across our experiments. To better support the overfitting interpretation and exclude confounds, we will include training/validation curves, learning-rate details, and statistical tests such as paired t-tests on accuracy differences in the revised Results. revision: yes

  3. Referee: [Methods] Methods: quantitative details on dataset sizes, sources, train/test splits, exact CNN architectures, augmentation parameters (e.g., rotation range, intensity thresholds), and the concrete components of SDL-COVID are absent, preventing verification of the reported 95.21% precision and lower false-negative rate.

    Authors: We acknowledge these details were omitted. The revised Methods section will provide full quantitative information on dataset sizes, sources, splits, CNN architectures, augmentation parameters, and the specific components of SDL-COVID to enable reproducibility and verification of the 95.21% precision result. revision: yes
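The paired t-test the rebuttal commits to compares per-run accuracies of two configurations on matched splits. A self-contained sketch (the accuracy lists in any real use would come from repeated experiments; these names are illustrative):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(acc_a: list[float], acc_b: list[float]) -> float:
    """t = mean(d) / (stdev(d) / sqrt(n)) for paired differences d = a_i - b_i.

    Compare against a t distribution with n - 1 degrees of freedom
    (or use scipy.stats.ttest_rel to get the p-value directly).
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))
```

A consistently large |t| across repeated splits would support the claim that the accuracy drop past the augmentation threshold is systematic rather than noise.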

Circularity Check

0 steps flagged

No circularity: empirical claims rest on experiments, not self-referential definitions or fitted predictions

full rationale

The paper's core claims (that lung segmentation is necessary, based on expert-reviewed CAM heatmaps, and that excessive augmentation causes overfitting) are presented as outcomes of direct experimental comparisons between augmented and non-augmented datasets and of visual analysis, rather than of any derivation, equation, or parameter fit that reduces to its own inputs. No mathematical modeling, uniqueness theorems, or self-citations are invoked as load-bearing steps in the provided abstract or methodology description; the SDL-COVID performance numbers are reported as measured results from implementation. The evidential chain therefore rests on external experiments and measurements rather than folding back on its own construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities are described. The necessity of segmentation and overfitting threshold are presented as empirical findings without stated mathematical assumptions.

pith-pipeline@v0.9.0 · 5549 in / 1109 out tokens · 56954 ms · 2026-05-07T10:58:02.281444+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 6 canonical work pages

  1. [1] Swaraj, A., Verma, K., Kaur, A., Singh, G., Kumar, A., & de Sales, L. M. (2021). Implementation of stacking based ARIMA model for prediction of Covid-19 cases in India. Journal of Biomedical Informatics, 121, 103887.
  2. [2] Lee, E. Y., Ng, M. Y., & Khong, P. L. (2020). COVID-19 pneumonia: what has CT taught us? The Lancet Infectious Diseases, 20(4), 384-385.
  3. [3] Ker, J., Wang, L., Rao, J., & Lim, T. (2017). Deep learning applications in medical image analysis. IEEE Access, 6, 9375-9389.
  4. [4] Moujahid, H., Cherradi, B., Al-Sarem, M., Bahatti, L., Eljialy, B. A., Alsaeedi, A., & Saeed, F. (2021). Combining CNN and Grad-CAM for COVID-19 Disease Prediction and Visual Explanation. Intelligent Automation & Soft Computing, 32(2), 723-745.
  5. [5] Islam, M. Z., Islam, M. M., & Asraf, A. (2020). A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images. Informatics in Medicine Unlocked, 20, 100412.
  6. [6] Khalifa, N. E. M., Taha, M. H. N., Hassanien, A. E., & Elghamrawy, S. (2020). Detection of coronavirus (COVID-19) associated pneumonia based on generative adversarial networks and a fine-tuned deep transfer learning model using chest X-ray dataset. arXiv preprint arXiv:2004.01184.
  7. [7] Ahmed, S., Hossain, T., Hoque, O. B., Sarker, S., Rahman, S., & Shah, F. M. (2021). Automated COVID-19 detection from chest X-ray images: a high-resolution network (HRNet) approach. SN Computer Science, 2(4), 1-17.
  8. [8] Rahman, T., Khandakar, A., Qiblawey, Y., Tahir, A., Kiranyaz, S., Kashem, S. B. A., ... & Chowdhury, M. E. (2021). Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Computers in Biology and Medicine, 132, 104319.
  9. [9] Morís, D. I., de Moura Ramos, J. J., Buján, J. N., & Hortas, M. O. (2021). Data augmentation approaches using cycle-consistent adversarial networks for improving COVID-19 screening in portable chest X-ray images. Expert Systems with Applications, 185, 115681.
  10. [10] Wang, L., Lin, Z. Q., & Wong, A. (2020). COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Scientific Reports, 10(1), 1-12.
  11. [11] Narin, A., Kaya, C., & Pamuk, Z. (2021). Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. Pattern Analysis and Applications, 24(3), 1207-1220.
  12. [12] Abed, M., Mohammed, K. H., Abdulkareem, G. Z., Begonya, M., Salama, A., Maashi, M. S., ... & Mutlag, L. (2021). A comprehensive investigation of machine learning feature extraction and classification methods for automated diagnosis of COVID-19 based on X-ray images. Computers, Materials & Continua, 3289-3310.
  13. [13] Teixeira, L. O., Pereira, R. M., Bertolini, D., Oliveira, L. S., Nanni, L., Cavalcanti, G. D., & Costa, Y. M. (2021). Impact of lung segmentation on the diagnosis and explanation of COVID-19 in chest X-ray images. Sensors, 21(21), 7116.
  14. [14] Waheed, A., Goyal, M., Gupta, D., Khanna, A., Al-Turjman, F., & Pinheiro, P. R. (2020). CovidGAN: data augmentation using auxiliary classifier GAN for improved COVID-19 detection. IEEE Access, 8, 91916-91923.
  15. [15] Masadeh, M., Masadeh, A., Alshorman, O., Khasawneh, F. H., & Masadeh, M. A. (2022). An efficient machine learning-based COVID-19 identification utilizing chest X-ray images. IAES International Journal of Artificial Intelligence, 11(1), 356.
  16. [16] Maguolo, G., & Nanni, L. (2021). A critic evaluation of methods for COVID-19 automatic detection from X-ray images. Information Fusion, 76, 1-7.
  17. [17] Bhadouria, H. S., Kumar, K., Swaraj, A., Verma, K., Kaur, A., Sharma, S., ... & de Sales, L. M. (2021). Classification of COVID-19 on chest X-ray images using Deep Learning model with Histogram Equalization and Lungs Segmentation. arXiv preprint arXiv:2112.02478.
  18. [18] DeGrave, A. J., Janizek, J. D., & Lee, S.-I. (2021). AI for radiographic COVID-19 detection selects shortcuts over signal. Nature Machine Intelligence, 3, 610-619. https://doi.org/10.1038/s42256-021-00338-7
  19. [19] Roberts, M., Driggs, D., Thorpe, M., et al. (2021). Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nature Machine Intelligence, 3, 199-217. https://doi.org/10.1038/s42256-021-00307-0
  20. [20] Tabik, S., Gómez-Ríos, A., Martín-Rodríguez, J. L., Sevillano-García, I., Rey-Area, M., Charte, D., ... & Herrera, F. (2020). COVIDGR dataset and COVID-SDNet methodology for predicting COVID-19 based on chest X-ray images. IEEE Journal of Biomedical and Health Informatics, 24(12), 3595-3605.
  21. [21] Sadre, R., Sundaram, B., Majumdar, S., & Ushizima, D. (2021). Validating deep learning inference during chest X-ray classification for COVID-19 screening. Scientific Reports, 11(1), 1-10.
  22. [22] Fang, Z., Zhao, H., Ren, J., Maclellan, C., Xia, Y., Li, S., ... & Ren, K. (2022). SC2Net: A Novel Segmentation-based Classification Network for Detection of COVID-19 in Chest X-ray Images. IEEE Journal of Biomedical and Health Informatics.
  23. [23] Yang, D., Martinez, C., Visuña, L., Khandhar, H., Bhatt, C., & Carretero, J. (2021). Detection and analysis of COVID-19 in medical images using deep learning techniques. Scientific Reports, 11(1), 1-13.
  24. [24] Tang, S., et al. (2021). EDL-COVID: Ensemble Deep Learning for COVID-19 Case Detection From Chest X-Ray Images. IEEE Transactions on Industrial Informatics, 17(9), 6539-6549. doi: 10.1109/TII.2021.3057683
  25. [25] Ucar, F., & Korkmaz, D. (2020). COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images. Medical Hypotheses, 140, 109761.
  26. [26] Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 234-241). Springer, Cham.
  27. [27] Zhao, W., Zhong, Z., Xie, X., Yu, Q., & Liu, J. (2020). Relation between chest CT findings and clinical conditions of coronavirus disease (COVID-19) pneumonia: a multicenter study. AJR Am J Roentgenol, 214(5), 1072-1077.
  28. [28] Yasin, R., & Gouda, W. (2020). Chest X-ray findings monitoring COVID-19 disease course and severity. Egyptian Journal of Radiology and Nuclear Medicine, 51(1), 1-18.
  29. [29] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
  30. [30] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-9).
  31. [31] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
  32. [32] Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360.
  33. [33] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248-255). IEEE.
  34. [34] Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., ... & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82-115.