Interpretable Machine Learning for Antepartum Prediction of Pregnancy-Associated Thrombotic Microangiopathy Using Routine Longitudinal Laboratory Data
Pith reviewed 2026-05-14 19:12 UTC · model grok-4.3
The pith
Gradient boosting on routine longitudinal lab tests predicts pregnancy-associated thrombotic microangiopathy risk with AUROC 0.872.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A gradient boosting classifier trained on 146 longitudinal laboratory predictors from routine prenatal care was selected by cross-validation and achieved an AUROC of 0.872 (95% CI: 0.769-0.952) and AUPRC of 0.883 (95% CI: 0.780-0.959) in a held-out test cohort, with sensitivity 0.750 and specificity 0.812; interpretability analyses highlighted clinically plausible signals including cystatin C at week 6 as an early indicator.
What carries the argument
Gradient boosting ensemble applied to 146 longitudinal laboratory variables to extract time-dependent multidimensional risk signatures for P-TMA.
Load-bearing premise
The single-center retrospective cohort of 300 pregnancies represents future patients and the held-out test performance will generalize without external validation.
What would settle it
A prospective multi-center study reporting AUROC below 0.75 on new patients would indicate the model does not reliably predict P-TMA from routine labs.
Original abstract
Background: Pregnancy-associated thrombotic microangiopathy (P-TMA) is rare but life-threatening. Early risk prediction before overt clinical presentation remains challenging, as the associated laboratory abnormalities are subtle, multidimensional, and frequently masked by common physiological changes such as gestational thrombocytopenia and pregnancy-related proteinuria, thus overlapping heavily with benign obstetric and renal conditions. This complexity is poorly captured by univariate or rule-based approaches; however, it is addressable by machine learning, which can extract latent, time-dependent risk signatures from longitudinal clinical tests.
Methods: This retrospective study included 300 pregnancies comprising 142 P-TMA cases and 158 controls. After exclusion of identifiers and non-informative variables, 146 longitudinal laboratory predictors were retained. Participants were divided into a training cohort (80%) and a held-out test cohort (20%) using stratified sampling. Five algorithms were evaluated: logistic regression, support vector machine with radial basis function kernel, random forest, extra trees, and gradient boosting. The final model was selected by mean cross-validated AUROC, refitted on the full training cohort, and evaluated once in the held-out test cohort. Interpretability analyses examined global feature importance and distributional patterns of leading predictors.
Results: Gradient boosting was prespecified by cross-validation in the training cohort. The model achieved an AUROC of 0.872 (95% CI: 0.769-0.952) and an AUPRC of 0.883 (95% CI: 0.780-0.959) in a held-out test cohort, with sensitivity of 0.750 and specificity of 0.812.
Conclusions: Longitudinal clinical laboratory tests obtained during routine care contained informative and clinically plausible signals for P-TMA risk. Notably, cystatin C at week 6 showed promise as an early monitoring indicator.
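The selection protocol described in the Methods (stratified 80/20 split, five candidate algorithms, choice by mean cross-validated AUROC, refit, single held-out evaluation) can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn, synthetic data from make_classification, five folds, default hyperparameters, and standardization for the linear and kernel models; none of these details are stated in the abstract, and the numbers it produces say nothing about the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the cohort: 300 rows, 146 predictors (NOT the study data).
X, y = make_classification(n_samples=300, n_features=146, n_informative=10,
                           weights=[158 / 300], random_state=0)

# Stratified 80/20 split; the test portion is set aside until the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

# The five algorithm families named in the abstract (hyperparameters are assumptions).
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean cross-validated AUROC, computed on the training cohort only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auroc = {
    name: cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Pick the winner, refit on the full training cohort, evaluate once on the test cohort.
best_name = max(cv_auroc, key=cv_auroc.get)
final_model = candidates[best_name].fit(X_train, y_train)


def continuous_scores(model, features):
    """Return a ranking score: class-1 probability if available, else the decision margin."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(features)[:, 1]
    return model.decision_function(features)


test_auroc = roc_auc_score(y_test, continuous_scores(final_model, X_test))
print(best_name, round(test_auroc, 3))
```

Because the test cohort is touched exactly once, after the algorithm has been fixed, the held-out AUROC is not biased by the selection step itself.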
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a retrospective single-center study of 300 pregnancies (142 P-TMA cases) that trains five classifiers on 146 longitudinal laboratory predictors, selects gradient boosting by cross-validated AUROC, refits on the full training set, and reports AUROC 0.872 (95% CI 0.769-0.952) and AUPRC 0.883 on a 20% stratified held-out test set, together with global feature-importance and distributional analyses that highlight cystatin C at week 6.
Significance. If the performance generalizes, the work supplies a concrete, interpretable route to early risk stratification for a rare, high-mortality obstetric condition using only routine labs; the held-out evaluation with confidence intervals and the emphasis on longitudinal signals are clear strengths that distinguish it from univariate or rule-based approaches.
major comments (2)
- [Methods] Methods, cohort and validation design: the single-center retrospective sample of 300 pregnancies is split 80/20 by stratified random sampling with no external, multi-center, or temporal validation; this directly undermines the claim that the AUROC of 0.872 reflects transportable risk signatures, given the small test size (~60 pregnancies) and wide confidence intervals.
- [Methods] Methods, data preprocessing: neither the handling of class imbalance (142 vs 158) nor the treatment of missing longitudinal laboratory values is described; both choices are load-bearing for the cross-validation model selection and the reported test metrics.
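The first major comment ties the wide confidence interval to the small test set. A percentile bootstrap makes that dependence visible. The sketch below assumes a percentile bootstrap (the paper's CI method is not stated in the abstract) and uses simulated scores from a hypothetical simulate_scores helper, tuned so the population AUROC is about 0.87; it compares the interval width at 60 versus 600 test subjects.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auroc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC (an assumed method, not the paper's)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)      # resample subjects with replacement
        if np.unique(y_true[idx]).size < 2:   # AUROC needs both classes present
            continue
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper


def simulate_scores(n, seed):
    """Hypothetical scores: unit-variance Gaussians whose means differ by 1.6."""
    rng = np.random.default_rng(seed)
    y = np.arange(n) % 2                      # alternating labels, balanced classes
    score = rng.normal(loc=1.6 * y, scale=1.0)
    return y, score


widths = {}
for n in (60, 600):
    y_sim, s_sim = simulate_scores(n, seed=1)
    lo, hi = bootstrap_auroc_ci(y_sim, s_sim)
    widths[n] = hi - lo
    print(f"n={n}: 95% CI width = {widths[n]:.3f}")
```

Under these assumptions the interval narrows substantially as the test set grows, which is why a larger or external cohort would sharpen the performance estimate.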
minor comments (2)
- [Results] Results: report the number of pregnancies and events in the final training and test partitions explicitly, and state whether any predictor standardization or imputation was performed before fitting.
- [Abstract] Abstract and Conclusions: the phrase 'prespecified by cross-validation' is ambiguous; clarify whether the algorithm choice was fixed before seeing test performance or selected after inspecting CV results.
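On the request for explicit partition counts, a quick arithmetic check shows which counts would be consistent with the reported figures. It assumes the 20% stratified test cohort contains 28 cases and 32 controls (142 × 0.2 = 28.4 and 158 × 0.2 = 31.6, rounded); the abstract does not report these counts, so the result is a consistency check and not a statement of the actual confusion matrix.

```python
from fractions import Fraction

n_cases, n_controls = 142, 158
test_fraction = Fraction(1, 5)

# Assumed test-cohort composition after a stratified 20% split (not reported in the source).
test_cases = round(n_cases * test_fraction)        # 28.4 -> 28
test_controls = round(n_controls * test_fraction)  # 31.6 -> 32


def counts_matching(reported, denominator, tol=0.0005):
    """Integer numerators k for which k/denominator agrees with a 3-decimal reported value."""
    return [k for k in range(denominator + 1)
            if abs(k / denominator - reported) <= tol + 1e-12]


true_positive_options = counts_matching(0.750, test_cases)
true_negative_options = counts_matching(0.812, test_controls)

print("test cohort:", test_cases, "cases,", test_controls, "controls")
print("true positives consistent with sensitivity 0.750:", true_positive_options)
print("true negatives consistent with specificity 0.812:", true_negative_options)
```

If the assumed counts hold, each reported metric pins down a single integer count, so stating the partition sizes in the Results would let readers reconstruct the confusion matrix directly.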
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below with clarifications on our methods and explicit acknowledgment of limitations. Revisions will be made to improve transparency on preprocessing and to expand discussion of generalizability.
- Referee: [Methods] Methods, cohort and validation design: the single-center retrospective sample of 300 pregnancies is split 80/20 by stratified random sampling with no external, multi-center, or temporal validation; this directly undermines the claim that the AUROC of 0.872 reflects transportable risk signatures, given the small test size (~60 pregnancies) and wide confidence intervals.
Authors: We agree that the single-center retrospective design and lack of external validation constrain claims of transportability. The rarity of P-TMA makes multi-center data collection resource-intensive and was not feasible here. The 80/20 stratified split and reported 95% CIs (0.769-0.952) are presented to convey uncertainty on the small held-out set. In revision we will add an expanded limitations paragraph in the Discussion that explicitly discusses the need for prospective multi-center validation and will moderate language on generalizability of the performance metrics. revision: partial
- Referee: [Methods] Methods, data preprocessing: neither the handling of class imbalance (142 vs 158) nor the treatment of missing longitudinal laboratory values is described; both choices are load-bearing for the cross-validation model selection and the reported test metrics.
Authors: We apologize for this omission. Class imbalance was handled with 'balanced' sample weights, which weight each sample inversely to its class frequency when fitting the GradientBoostingClassifier. Missing longitudinal values were addressed by forward-fill imputation within each pregnancy's time series (to preserve temporal ordering) followed by cohort-level median imputation for residual gaps. These steps were performed inside each cross-validation fold. We will insert a dedicated 'Data Preprocessing' subsection in Methods describing these choices, with pseudocode and a note on reproducibility. revision: yes
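The preprocessing in the second response can be sketched for a single fold as follows. Everything concrete here is an assumption for illustration: scikit-learn and pandas as the toolkit, a hypothetical wide table with one row per pregnancy and one column per analyte and week (names such as cystatin_c_w6 are invented), and random placeholder values. Note that scikit-learn's GradientBoostingClassifier has no class_weight argument, so the sketch passes balanced weights through sample_weight in fit.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical wide layout: one row per pregnancy, one column per analyte and week.
weeks = [6, 12, 20, 28]
analytes = ["cystatin_c", "platelets"]
columns_by_analyte = {a: [f"{a}_w{w}" for w in weeks] for a in analytes}
all_columns = [c for a in analytes for c in columns_by_analyte[a]]

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(40, len(all_columns))), columns=all_columns)
data = data.mask(rng.random(data.shape) < 0.2)               # inject ~20% missing values
labels = pd.Series((np.arange(40) % 3 == 0).astype(int))     # placeholder, imbalanced labels


def forward_fill_within_pregnancy(df):
    """Carry each analyte's last observed value forward across later weeks, row by row."""
    out = df.copy()
    for analyte_columns in columns_by_analyte.values():
        out[analyte_columns] = out[analyte_columns].ffill(axis=1)
    return out


# One cross-validation fold: statistics are learned from the training rows only.
train_idx, valid_idx = data.index[:30], data.index[30:]
filled = forward_fill_within_pregnancy(data)
train_medians = filled.loc[train_idx].median()               # medians from training rows
X_train = filled.loc[train_idx].fillna(train_medians)
X_valid = filled.loc[valid_idx].fillna(train_medians)        # reuse training medians
y_train = labels.loc[train_idx]

# Balanced weighting: each sample is weighted inversely to its class frequency.
weights = compute_sample_weight("balanced", y_train)
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train, sample_weight=weights)
valid_scores = model.predict_proba(X_valid)[:, 1]
```

Forward fill uses only a pregnancy's own earlier values, so it cannot leak across subjects; the median step is the part that must be refit within each fold, which is why the medians come from the training rows alone.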
- External, multi-center, or temporal validation of the model performance, as the study uses a single-center retrospective cohort and no additional independent datasets are available.
Circularity Check
No circularity: standard ML train/test split with CV model selection yields independent held-out metrics
full rationale
The paper performs an 80/20 stratified split, selects gradient boosting via cross-validated AUROC on the training portion only, then reports a single evaluation on the untouched held-out test cohort. No equations, parameters, or self-citations reduce the reported AUROC/AUPRC to a quantity that is true by construction. The performance numbers are ordinary empirical estimates on unseen data; the central claim does not collapse into its own inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- gradient boosting hyperparameters
axioms (1)
- domain assumption: Held-out test set performance estimates true generalization error