MalariAI: A Label-Resilient Decoupled Framework for Universal Cell Segmentation and Explainable Stage Classification in Dense Malaria Blood Smears
Pith reviewed 2026-07-02 04:48 UTC · model grok-4.3
The pith
A decoupled framework isolates cells in malaria smears without annotations and classifies stages at 98.36 percent accuracy with per-cell explanations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MalariAI is a two-stage decoupled framework for universal cell segmentation and explainable stage classification in dense malaria blood smears. Stage 1 uses an annotation-agnostic distance-transform guided watershed algorithm to isolate every cell in a 1600x1200 image, recovering 75.95 percent of ground-truth cells by centroid localisation across the 120-image NIH BBBC041 test set without any ground-truth input. Stage 2 fine-tunes EfficientNet-B0 with focal loss (gamma 2.0, per-class inverse-frequency weights) on 64x64 crops, achieving 98.36 percent overall classification accuracy with 87.5 percent and 75.0 percent per-class accuracy on the rare schizont and gametocyte stages, compared to only 24.57 percent and 25.95 percent AP for a Faster R-CNN baseline on the same classes.
What carries the argument
The annotation-agnostic distance-transform guided watershed algorithm for cell isolation, paired with per-cell Grad-CAM++ explainability on the EfficientNet-B0 classifier.
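As a rough illustration of what a distance-transform guided watershed does, the sketch below splits two touching blobs in a synthetic image using SciPy and scikit-image. It is not the authors' pipeline: the Otsu foreground mask, the dark-cells-on-bright-background convention, the function name `segment_cells`, and the `min_distance` value are all assumptions made for this example, since the abstract does not list the fixed parameters.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.filters import threshold_otsu
from skimage.segmentation import watershed


def segment_cells(gray, min_distance=10):
    """Label touching blobs with a distance-transform guided watershed.

    gray: 2-D float array in which cells are darker than the background
    (an assumption of this sketch). min_distance is a hypothetical fixed
    parameter, not a value taken from the paper.
    """
    # Foreground mask: pixels darker than the Otsu threshold.
    mask = gray < threshold_otsu(gray)
    # Distance from each foreground pixel to the nearest background pixel.
    distance = ndi.distance_transform_edt(mask)
    # One seed per local maximum of the distance map (roughly one per cell).
    peaks = peak_local_max(distance, min_distance=min_distance,
                           labels=mask.astype(int))
    markers = np.zeros(gray.shape, dtype=np.int32)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    # Flood the inverted distance map from the seeds, restricted to the mask.
    return watershed(-distance, markers, mask=mask)


# Synthetic check: two overlapping dark discs on a bright background.
yy, xx = np.mgrid[0:80, 0:120]
image = np.ones((80, 120))
for cy, cx in [(40, 45), (40, 75)]:
    image[(yy - cy) ** 2 + (xx - cx) ** 2 <= 20 ** 2] = 0.2
labels = segment_cells(image)
n_cells = int(labels.max())
```

No annotation enters this procedure at any point, which is the property the "annotation-agnostic" framing relies on. Whether the seed spacing transfers across datasets is exactly the open question raised under the load-bearing premise.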
If this is right
- Recovers 75.95 percent of ground-truth cells by centroid localisation without any ground-truth input during segmentation.
- Classifies infection stages at 98.36 percent overall accuracy on the NIH BBBC041 test set.
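- Achieves 87.5 percent and 75.0 percent per-class accuracy on the schizont and gametocyte stages, against 24.57 percent and 25.95 percent average precision for a Faster R-CNN baseline on the same classes (per-class accuracy and AP are different metrics, so the gap is indicative rather than like-for-like).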
- Supplies instance-level spatial evidence via Grad-CAM++ heatmaps for clinical audit without sacrificing classification performance.
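A small sketch may help show why overall accuracy and per-class accuracy are reported separately: on an imbalanced test set, a high overall figure can coexist with a much lower figure on a rare class. The class names and counts below are hypothetical and are not taken from the paper.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples within each true class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


# Hypothetical imbalanced set: 96 common cells, 4 rare-stage cells.
y_true = ["rbc"] * 96 + ["gametocyte"] * 4
y_pred = ["rbc"] * 96 + ["gametocyte"] * 3 + ["rbc"]

overall = overall_accuracy(y_true, y_pred)
by_class = per_class_accuracy(y_true, y_pred)
```

In this toy case a single misclassified rare cell barely moves the overall figure but costs a quarter of the rare-class accuracy.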
Where Pith is reading between the lines
- The decoupled design may reduce the need for fully annotated training data in other dense cell microscopy tasks.
- Instance-level explanations could support clinical adoption by letting microscopists verify individual predictions.
- The focal loss weighting appears to help with class imbalance, which could be tested on additional imbalanced medical imaging datasets.
- Performance on the specific NIH BBBC041 split would need confirmation on smears from varied sources and staining protocols.
Load-bearing premise
The distance-transform guided watershed algorithm can reliably isolate every individual cell in dense smear regions without any annotation input or post-processing tuned to the test set.
What would settle it
Applying the watershed algorithm to a fresh collection of dense blood smear images from a different preparation or microscope and measuring whether cell recovery by centroid localisation falls below 70 percent.
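The test described above could be scored along the following lines. This is a minimal sketch under an assumed matching rule: a ground-truth cell counts as recovered when the centroid of its box falls inside at least one predicted box. The paper's exact centroid-localisation criterion is not given here, and the function names are hypothetical.

```python
def centroid_recovery(gt_boxes, pred_boxes):
    """Fraction of ground-truth boxes whose centroid lies in a predicted box.

    Boxes are (x_min, y_min, x_max, y_max) tuples. The containment rule is
    an assumed criterion, not necessarily the one used in the paper.
    """
    if not gt_boxes:
        return 0.0
    hits = 0
    for x0, y0, x1, y1 in gt_boxes:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        if any(px0 <= cx <= px1 and py0 <= cy <= py1
               for px0, py0, px1, py1 in pred_boxes):
            hits += 1
    return hits / len(gt_boxes)


def falls_below(recovery, threshold=0.70):
    """True when recovery drops under the 70 percent line named in the text."""
    return recovery < threshold


# Hypothetical boxes: three of four annotated cells are recovered.
gt = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10), (60, 0, 70, 10)]
pred = [(1, 1, 11, 11), (19, 0, 29, 9), (38, 2, 52, 12)]
recovery = centroid_recovery(gt, pred)
```

Ground-truth boxes are used only for scoring here; they never feed the segmentation step.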
Original abstract
Automated malaria diagnosis from blood smear microscopy is a critical challenge in global health AI; in resource-limited settings, the scarcity of expert microscopists remains the primary bottleneck to timely and accurate diagnosis. Three compounding failure modes prevent reliable clinical deployment of existing deep learning systems. First, end-to-end detectors treat unannotated cells as background during training, producing recall figures that are strongly influenced by annotation completeness rather than reflecting true cell recovery. Second, Non-Maximum Suppression tends to suppress valid detections in dense smear regions where infection counts matter most. Third, existing whole-slide detection pipelines lack per-cell spatial evidence for clinical audit, despite image-level explainability methods such as Grad-CAM having been applied to malaria image classification tasks. We present MalariAI, a two-stage decoupled framework that addresses all three failure modes in a unified pipeline. Stage 1 applies an annotation-agnostic distance-transform guided watershed algorithm to isolate every cell in a full 1600x1200 blood smear image, recovering 75.95% of ground-truth cells by centroid localisation across the 120-image NIH BBBC041 test set without any ground-truth input. Stage 2 fine-tunes EfficientNet-B0 with Focal Loss (gamma = 2.0, per-class inverse-frequency weights) on 64x64 crops, achieving 98.36% overall classification accuracy with 87.5% and 75.0% per-class accuracy on the rare schizont and gametocyte stages, compared to only 24.57% and 25.95% AP for a Faster R-CNN baseline on the same classes. Grad-CAM++ heatmaps generated per detected cell provide instance-level spatial evidence for clinical audit, enabling microscopists to verify model predictions at the individual parasite level without sacrificing classification performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MalariAI, a decoupled two-stage framework for malaria parasite detection and classification in blood smears. Stage 1 employs an annotation-agnostic distance-transform guided watershed algorithm to segment individual cells from full 1600x1200 images, reporting 75.95% centroid recovery on the NIH BBBC041 test set without using ground-truth annotations. Stage 2 fine-tunes an EfficientNet-B0 model using focal loss on 64x64 cell crops for five-stage classification, achieving 98.36% overall accuracy and superior performance on rare classes compared to Faster R-CNN, with Grad-CAM++ providing per-cell explainability.
Significance. If the central claims regarding parameter-independent cell isolation and split-independent classification hold, this work would be significant for global health AI by mitigating annotation bias, improving detection in dense regions, and providing instance-level explainability for clinical validation in malaria diagnosis.
major comments (2)
- [Abstract] The watershed-based cell isolation claim (75.95% recovery without ground-truth input) is load-bearing for the 'label-resilient' and 'annotation-agnostic' framing, but the abstract supplies no details on the specific fixed parameters of the distance-transform and watershed steps (e.g., Gaussian sigma, local maxima thresholds), contrary to the requirement that they be dataset-independent and untuned on the test set.
- [Abstract] The classification results (98.36% accuracy, 75.0% on gametocytes) are evaluated on the same 120-image test set as the segmentation stage; the manuscript does not state whether the training crops for EfficientNet-B0 fine-tuning are drawn from a completely disjoint set of images, which is necessary to establish that the reported metrics reflect generalization rather than leakage.
minor comments (1)
- [Abstract] The abstract reports headline metrics without error bars or statistical tests; this presentation issue should be addressed with confidence intervals in the results.
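One way to attach the requested uncertainty to a per-class accuracy is a Wilson score interval for a binomial proportion. The sketch below is illustrative only: the per-class sample counts are not given in the abstract, so the 6-of-8 example is a hypothetical count that happens to yield 75 percent.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical rare-class count: 6 correct out of 8 test cells.
low, high = wilson_interval(6, 8)
```

With a sample this small the interval spans roughly 0.41 to 0.93, which is why headline percentages on rare classes are hard to interpret without the underlying counts.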
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on improving the transparency of our claims. We address the two major comments point-by-point below.
- Referee: [Abstract] The watershed-based cell isolation claim (75.95% recovery without ground-truth input) is load-bearing for the 'label-resilient' and 'annotation-agnostic' framing, but the abstract supplies no details on the specific fixed parameters of the distance-transform and watershed steps (e.g., Gaussian sigma, local maxima thresholds), contrary to the requirement that they be dataset-independent and untuned on the test set.
  Authors: We agree the abstract should be self-contained on this point. The full manuscript describes the fixed parameters in the Methods section; these were selected a priori based on typical cell morphology and held constant without any tuning on the NIH BBBC041 test set (or any evaluation data). We will revise the abstract to explicitly list the parameters and restate that they are dataset-independent.
  Revision: yes
- Referee: [Abstract] The classification results (98.36% accuracy, 75.0% on gametocytes) are evaluated on the same 120-image test set as the segmentation stage; the manuscript does not state whether the training crops for EfficientNet-B0 fine-tuning are drawn from a completely disjoint set of images, which is necessary to establish that the reported metrics reflect generalization rather than leakage.
  Authors: The training crops are drawn exclusively from images disjoint from the 120-image test set. We will add an explicit statement clarifying the image-level split in the revised manuscript (and abstract where space permits) to confirm the absence of leakage.
  Revision: yes
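The image-level disjointness the authors describe is straightforward to verify mechanically. The sketch below assumes each crop record carries the identifier of its source image under a hypothetical `image_id` key; the record layout and file names are invented for illustration.

```python
def leaked_images(train_crops, test_crops):
    """Return source-image IDs that contribute crops to both splits."""
    train_ids = {crop["image_id"] for crop in train_crops}
    test_ids = {crop["image_id"] for crop in test_crops}
    return train_ids & test_ids


# Hypothetical crop records; only the source-image ID matters for the check.
train = [{"image_id": "img_001", "label": "ring"},
         {"image_id": "img_002", "label": "schizont"}]
test_ok = [{"image_id": "img_101", "label": "ring"}]
test_bad = [{"image_id": "img_002", "label": "gametocyte"}]

clean = leaked_images(train, test_ok)
leaky = leaked_images(train, test_bad)
```

Checking at the image level rather than the crop level matters because crops from the same smear share staining and illumination, so a crop-level split can still leak.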
Circularity Check
No circularity: empirical measurements on named public dataset
The paper reports direct empirical results from applying a fixed distance-transform watershed pipeline (Stage 1) and fine-tuning EfficientNet-B0 with Focal Loss (Stage 2) to the NIH BBBC041 test set. No equations, derivations, or self-citations are presented that reduce any claimed output to the inputs by construction. The 75.95% recovery and 98.36% accuracy figures are framed as measurements on an external benchmark without parameter fitting to the evaluation split or renaming of known results. The derivation chain is therefore self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- gamma = 2.0
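For context on what gamma controls, here is a minimal pure-Python sketch of a class-weighted focal loss for a single sample, FL = -w_t (1 - p_t)^gamma log(p_t). The weight normalisation (total / (number of classes x class count)) is one common inverse-frequency convention and is an assumption here, as are the class counts; the paper's exact weighting formula is not stated in the abstract.

```python
import math


def inverse_frequency_weights(class_counts):
    """Per-class weights proportional to 1 / frequency (assumed normalisation)."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {label: total / (k * n) for label, n in class_counts.items()}


def focal_loss(probs, target, weights, gamma=2.0):
    """Weighted focal loss for one sample given predicted class probabilities."""
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical class counts: the rare class receives the larger weight.
weights = inverse_frequency_weights({"rbc": 900, "schizont": 100})
easy = focal_loss({"rbc": 0.1, "schizont": 0.9}, "schizont", weights)
hard = focal_loss({"rbc": 0.7, "schizont": 0.3}, "schizont", weights)
```

With gamma = 0 the expression reduces to weighted cross-entropy. Raising gamma shrinks the contribution of confidently correct samples, so training effort shifts towards hard and rare examples.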