A Logistic Regression Model to Predict Malaria Severity in Children
Pith reviewed 2026-05-20 13:11 UTC · model grok-4.3
The pith
A logistic regression model predicts malaria severity in children using environmental and biological factors with 83.3 percent accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A logistic regression model was developed to predict malaria severity from factors including sickle cell disease, stagnant water, garbage dump, wet lawns, and use of treated mosquito nets. Applied to 417 respondents in the Bosomtwe District, the model attains 83.3 percent accuracy. The study deduces that although children in the District are highly prone to malaria infection, the severity is very low.
What carries the argument
A logistic regression model that classifies malaria cases as severe or non-severe based on the presence of sickle cell disease and local environmental conditions such as stagnant water and wet lawns.
Load-bearing premise
The 417 respondents in the Bosomtwe District provide a representative sample of both severe and non-severe malaria cases that allows the model to generalize.
What would settle it
Gathering a new dataset of malaria cases and factors from the same district and verifying whether the logistic regression model maintains approximately 83 percent accuracy on the unseen data.
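To give "approximately 83 percent" a concrete tolerance, the sketch below computes a 95% Wilson score interval for a proportion. It is an illustration only: the function name `wilson_interval` is hypothetical, and the evaluation sample size of 417 is an assumption chosen to match the original study, not a figure reported for any replication.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion p_hat observed on n cases."""
    if n <= 0:
        raise ValueError("n must be positive")
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p_hat + z2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return center - half, center + half

# ASSUMPTION: a new sample of the same size as the original study (417).
low, high = wilson_interval(0.833, 417)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions, an accuracy on new data that falls well below the lower bound would count against the original figure, while a value inside the interval would be compatible with it.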
Original abstract
One of the main causes of death around the globe is malaria. Researchers have sought to develop predictive models for malaria outbreaks based on meteorological data, climate data and the breeding cycle of Plasmodium, the causative agent of malaria. This study predicts the severity of malaria based on environmental and biological factors. A logistic regression model was developed in this study to predict the severity of malaria based on such factors as sickle cell disease, stagnant water, garbage dump, wet lawns, and the use of treated mosquito nets, with an 83.3% accuracy rate. The study was carried out in the Bosomtwe District of Ghana with 417 respondents. It was deduced that although children in the District are highly prone to malaria infection, the severity is very low. The study recommends that not just having a good sample size alone is important during machine learning model development, but also having a good sample representation of the various class labels is equally important.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the development of a logistic regression model to predict malaria severity in children based on biological factors like sickle cell disease and environmental factors such as stagnant water, garbage dumps, wet lawns, and use of treated mosquito nets. Conducted in the Bosomtwe District of Ghana with a sample of 417 respondents, the model is reported to achieve 83.3% accuracy. The authors conclude that malaria infection is common but severity is low in the district and stress the importance of representative sampling for class labels in model development.
Significance. Should the accuracy claim be validated with proper out-of-sample testing, this work could contribute to identifying modifiable risk factors for severe malaria in children in similar settings, supporting public health efforts in malaria-endemic areas. The emphasis on sample representation is a useful reminder for applied ML in epidemiology.
major comments (2)
- Abstract: The central performance claim of 83.3% accuracy lacks any description of the train-test split, cross-validation, or class imbalance handling. Given that logistic regression coefficients are fitted directly to the 417 records, this accuracy likely measures in-sample fit rather than generalization, which is load-bearing for the predictive utility asserted in the title and abstract.
- Abstract: The deduction that 'the severity is very low' is stated without reference to specific model outputs, odds ratios, or statistical tests from the logistic regression, making it unclear how this conclusion follows from the analysis.
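The two major comments interact: if severe cases are rare, raw accuracy can be high even for a rule that never predicts a severe case. The sketch below illustrates this with a trivial "always non-severe" rule. The class counts are hypothetical, since the class distribution of the 417 respondents is not reported here, and the helper names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; 0.5 for any constant two-class prediction."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# HYPOTHETICAL counts: 60 severe cases out of 417 is an assumption.
n_total, n_severe = 417, 60
y_true = [1] * n_severe + [0] * (n_total - n_severe)
y_pred = [0] * n_total  # trivial rule: always predict "non-severe"

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy: {acc:.3f}  balanced accuracy: {bal:.3f}")
```

Reporting the majority-class baseline and a class-sensitive metric next to the headline accuracy would let readers judge how much the model adds beyond the class imbalance.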
minor comments (2)
- Abstract: The recommendation regarding sample representation in machine learning is presented as a deduction from the study but would benefit from more explicit linkage to the observed class distribution in the 417 respondents.
- Consider adding standard references for logistic regression assumptions and validation practices in biomedical prediction models.
Simulated Author's Rebuttal
We are grateful to the referee for their detailed review and constructive feedback on our manuscript. Below, we respond to each of the major comments raised.
Point-by-point responses
- Referee: Abstract: The central performance claim of 83.3% accuracy lacks any description of the train-test split, cross-validation, or class imbalance handling. Given that logistic regression coefficients are fitted directly to the 417 records, this accuracy likely measures in-sample fit rather than generalization, which is load-bearing for the predictive utility asserted in the title and abstract.
Authors: We agree with the referee that the abstract should provide more details on how the accuracy was computed. In our analysis, the logistic regression model was fitted to the entire sample of 417 records, and the reported accuracy of 83.3% is the in-sample classification accuracy. We did not use a separate test set or cross-validation for the primary reported metric. This is a valid concern for assessing the model's predictive performance. In the revised manuscript, we will clarify this in the abstract and methods, and we will add results from a 5-fold cross-validation to better demonstrate generalization. We will also address class imbalance if present in the data.
Revision: yes
- Referee: Abstract: The deduction that 'the severity is very low' is stated without reference to specific model outputs, odds ratios, or statistical tests from the logistic regression, making it unclear how this conclusion follows from the analysis.
Authors: The conclusion that severity is very low is based on the empirical observation in our dataset that the majority of malaria cases among the children were mild, as determined by clinical assessment. The logistic regression model was used to identify factors associated with severity (e.g., presence of stagnant water increasing odds of severe malaria), but the overall statement reflects the low proportion of severe cases in the sample. To make this clearer, we will revise the abstract to reference the descriptive statistics or specific odds ratios from the model that indicate low risk for severity in this population.
Revision: yes
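The 5-fold cross-validation promised in the first response is most informative when each fold preserves the severe/non-severe ratio. A minimal sketch of stratified fold assignment follows, in plain Python. The function name `stratified_kfold` and the class counts are assumptions for illustration; the authors' actual procedure is not described.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin within each class
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# HYPOTHETICAL labels: 60 severe (1) and 357 non-severe (0) cases.
labels = [1] * 60 + [0] * 357
splits = list(stratified_kfold(labels, k=5))
for train, test in splits:
    print(len(train), len(test), sum(labels[i] for i in test))
```

Fitting the model on each `train` set and scoring it on the matching `test` set gives five out-of-sample accuracy estimates whose mean and spread can be reported alongside the in-sample figure.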
Circularity Check
Reported 83.3% accuracy reduces to in-sample training fit by construction
specific steps
- fitted input called prediction [Abstract]
"A logistic regression model was developed in this study to predict the severity of malaria based on such factors as sickle cell disease, stagnant water, garbage dump, wet lawns, and the use of treated mosquito nets, with an 83.3% accuracy rate."
The 417 respondents constitute the sole dataset on which the logistic regression coefficients are estimated. The reported accuracy is presented as the model's predictive performance, but no partitioning or hold-out procedure is described. As reported, the figure can therefore only be read as training-set fit: it measures agreement with the data used to estimate the coefficients, not an independent test of generalization.
full rationale
The paper's central claim rests on a logistic regression model that 'predicts' malaria severity with 83.3% accuracy using the listed environmental and biological factors. The abstract states the model was developed on the 417 respondents and directly reports this accuracy figure, with no description of any train/test split, cross-validation, or external validation cohort. This makes the accuracy metric equivalent to the in-sample goodness-of-fit on the exact data used to estimate the coefficients, satisfying the fitted-input-called-prediction pattern. No self-citations, self-definitional steps, or imported uniqueness theorems appear in the text; the remainder of the derivation (factor selection and sample description) is independent of the performance claim.
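The distinction between in-sample fit and held-out accuracy can be sketched in a few lines. The example below fits a small logistic regression by batch gradient descent, once on all records and once on an 80% training portion, and then scores each. Everything here is an assumption for illustration: the data are synthetic, the five binary predictors only mimic the shape of the study's inputs, and the plain-Python fitting routine stands in for whatever software the authors used.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, xi):
    # w[0] is the intercept; w[1:] are the coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Batch gradient descent on the mean log-loss; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def accuracy(w, X, y):
    preds = [int(predict_proba(w, xi) >= 0.5) for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

# SYNTHETIC data: five binary predictors and a noisy label; not the study's data.
rng = random.Random(42)
X = [[rng.randint(0, 1) for _ in range(5)] for _ in range(417)]
y = [int(x[0] + x[1] + rng.gauss(0, 1) > 1.5) for x in X]

cut = int(0.8 * len(X))  # 80/20 hold-out split; rows are already in random order
w_all = fit_logistic(X, y)
in_sample = accuracy(w_all, X, y)              # scored on the fitting data
w_train = fit_logistic(X[:cut], y[:cut])
held_out = accuracy(w_train, X[cut:], y[cut:])  # scored on unseen rows
print(f"in-sample: {in_sample:.3f}  held-out: {held_out:.3f}")
```

Only the second number estimates performance on new cases. The first is the kind of figure the circularity check attributes to the paper.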
Axiom & Free-Parameter Ledger
free parameters (1)
- logistic regression coefficients
axioms (2)
- domain assumption: The logit of the probability of severe malaria is a linear function of the listed predictors.
- ad hoc to paper: The 417 respondents constitute an unbiased sample of malaria cases in the district.
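Written out, the domain assumption and the free parameters take the standard logistic regression form below. The notation is assumed for illustration and is not taken from the paper: $p$ is the probability of severe malaria, $x_1, \dots, x_5$ are indicators for sickle cell disease, stagnant water, garbage dump, wet lawns, and treated mosquito net use, and $\beta_0, \dots, \beta_5$ are the fitted coefficients.

```latex
\[
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{5} \beta_j x_j,
\qquad
p = \frac{1}{1 + \exp\!\bigl(-\beta_0 - \sum_{j=1}^{5} \beta_j x_j\bigr)}
\]
```

Under this form, $e^{\beta_j}$ is the odds ratio for severe malaria associated with predictor $x_j$, holding the other predictors fixed. These are the quantities the referee asks to see reported.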
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
"A logistic regression model was developed in this study to predict the severity of malaria based on such factors as sickle cell disease, stagnant water, garbage dump, wet lawns, and the use of treated mosquito nets, with an 83.3% accuracy rate."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.