arxiv: 2605.13786 · v1 · pith:6RA7CANS · submitted 2026-05-13 · 💻 cs.LG

Interpretable Machine Learning for Antepartum Prediction of Pregnancy-Associated Thrombotic Microangiopathy Using Routine Longitudinal Laboratory Data

Chuanchuan Sun, Zhen Yu, Qin Fan, Qingchao Chen, Feng Yu

Pith reviewed 2026-05-14 19:12 UTC · model grok-4.3

classification 💻 cs.LG

keywords machine learning · pregnancy · thrombotic microangiopathy · longitudinal laboratory data · gradient boosting · risk prediction · interpretability



The pith

Gradient boosting on routine longitudinal lab tests predicts pregnancy-associated thrombotic microangiopathy risk with a held-out AUROC of 0.872.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops and validates a machine learning approach to forecast the rare, life-threatening condition of pregnancy-associated thrombotic microangiopathy using only data from standard prenatal blood tests collected over time. It demonstrates that complex, subtle patterns across many lab values can be extracted to flag risk before symptoms appear, even when those values overlap with normal pregnancy changes. A reader would care because early identification from existing records could support closer monitoring or timely intervention in affected pregnancies. The study compared five algorithms in a cohort of 300 pregnancies (142 P-TMA cases and 158 controls), selected gradient boosting by cross-validation on the training cohort, and confirmed solid performance in a held-out test set.

Core claim

A gradient boosting classifier trained on 146 longitudinal laboratory predictors from routine prenatal care was selected by cross-validation and achieved an AUROC of 0.872 (95% CI: 0.769-0.952) and AUPRC of 0.883 (95% CI: 0.780-0.959) in a held-out test cohort, with sensitivity 0.750 and specificity 0.812; interpretability analyses highlighted clinically plausible signals including cystatin C at week 6 as an early indicator.
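The reported operating point can be sanity-checked with simple arithmetic. Assuming the stratified 20% split allocates test pregnancies in proportion to the cohort (142 × 0.2 = 28.4 and 158 × 0.2 = 31.6, i.e. about 28 cases and 32 controls, 60 in total), one confusion matrix consistent with the reported sensitivity and specificity is 21 of 28 cases and 26 of 32 controls correctly classified. These counts are an illustrative assumption; the authors do not report them here.

```python
# Assumed test-cohort composition from a stratified 20% split (not reported by the authors).
n_cases = round(142 * 0.2)     # 28.4 -> 28
n_controls = round(158 * 0.2)  # 31.6 -> 32

# Assumed counts: one confusion matrix consistent with the reported operating point.
true_positives = 21   # cases flagged as high risk
true_negatives = 26   # controls flagged as low risk

sensitivity = true_positives / n_cases      # 21 / 28
specificity = true_negatives / n_controls   # 26 / 32

print(f"test n = {n_cases + n_controls}")
print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
```

With only 28 cases, each additional missed case lowers sensitivity by about 0.036 (1/28), which is one reason the intervals around these metrics are wide.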

What carries the argument

Gradient boosting ensemble applied to 146 longitudinal laboratory variables to extract time-dependent multidimensional risk signatures for P-TMA.

Load-bearing premise

That the single-center retrospective cohort of 300 pregnancies is representative of future patients, and that the held-out test performance will generalize without external validation.

What would settle it

A prospective multi-center study reporting AUROC below 0.75 on new patients would indicate the model does not reliably predict P-TMA from routine labs.

Figures

Figures reproduced from arXiv: 2605.13786 by Chuanchuan Sun, Feng Yu, Qin Fan, Qingchao Chen, Zhen Yu.

**Figure 3.** Held-out test utility and selected-model performance. (A) Held-out test AUROC across candidate models. (B) Selected model (gradient boosting). (C) Decision curve analysis. (D) Predicted probability distribution in the held-out test cohort.

Excerpt from Section 3.3, Clinical utility and risk: Decision-curve analysis demonstrated positive net benefit for the selected model across a clinically relevant range of threshold probabilit…
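Decision-curve analysis, as in panel (C), compares strategies by net benefit at a threshold probability p_t: NB(p_t) = TP/n - (FP/n) · p_t / (1 - p_t), where false positives are weighted by the odds of the threshold. A minimal sketch of this standard formula follows, with the usual treat-all reference (treat-none has NB = 0 by definition). The function names and toy data are illustrative; the excerpt does not describe the paper's own DCA implementation.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone whose predicted risk is >= threshold."""
    y_true = np.asarray(y_true)
    flagged = np.asarray(y_prob) >= threshold
    n = y_true.size
    tp = np.sum(flagged & (y_true == 1))  # cases correctly flagged
    fp = np.sum(flagged & (y_true == 0))  # controls flagged unnecessarily
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on every patient regardless of predicted risk."""
    prevalence = np.mean(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy example with made-up labels and predicted probabilities.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]
for pt in (0.1, 0.3, 0.5):
    print(f"p_t={pt:.1f}  model={net_benefit(y_true, y_prob, pt):.3f}  "
          f"treat_all={net_benefit_treat_all(y_true, pt):.3f}")
```

A model shows "positive net benefit" at a given p_t when its curve lies above both the treat-all curve and zero.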

**Figure 4.** Model interpretation and leading laboratory signatures.
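The abstract mentions global feature importance without naming the measure. A sketch, assuming scikit-learn and synthetic data, shows two common options for a gradient boosting model: the impurity-based `feature_importances_` attribute and permutation importance measured as the drop in held-out AUROC when one column is shuffled. Column names such as `lab_0` are placeholders, not the paper's predictors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lab matrix; column names are hypothetical.
X, y = make_classification(n_samples=300, n_features=10, n_informative=3, random_state=0)
feature_names = [f"lab_{i}" for i in range(X.shape[1])]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# (1) Impurity-based importance, derived from the training fit (sums to 1).
impurity = model.feature_importances_

# (2) Permutation importance on held-out data, scored by AUROC.
perm = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                              n_repeats=20, random_state=0)
perm_rank = np.argsort(perm.importances_mean)[::-1]

for i in perm_rank[:5]:
    print(f"{feature_names[i]:>7}  impurity={impurity[i]:.3f}  "
          f"perm_auc_drop={perm.importances_mean[i]:.3f}")
```

Impurity-based scores can favor high-cardinality or correlated features, so permutation importance on held-out data is often reported alongside them.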

Original abstract

Background: Pregnancy-associated thrombotic microangiopathy (P-TMA) is rare but life-threatening. Early risk prediction before overt clinical presentation remains challenging, as the associated laboratory abnormalities are subtle, multidimensional, and frequently masked by common physiological changes such as gestational thrombocytopenia and pregnancy-related proteinuria, thus overlapping heavily with benign obstetric and renal conditions. This complexity is poorly captured by univariate or rule-based approaches; however, it is addressable by machine learning, which can extract latent, time-dependent risk signatures from longitudinal clinical tests.

Methods: This retrospective study included 300 pregnancies comprising 142 P-TMA cases and 158 controls. After exclusion of identifiers and non-informative variables, 146 longitudinal laboratory predictors were retained. Participants were divided into a training cohort (80%) and a held-out test cohort (20%) using stratified sampling. Five algorithms were evaluated: logistic regression, support vector machine with radial basis function kernel, random forest, extra trees, and gradient boosting. The final model was selected by mean cross-validated AUROC, refitted on the full training cohort, and evaluated once in the held-out test cohort. Interpretability analyses examined global feature importance and distributional patterns of leading predictors.

Results: Gradient boosting was prespecified by cross-validation in the training cohort. The model achieved an AUROC of 0.872 (95% CI: 0.769-0.952) and an AUPRC of 0.883 (95% CI: 0.780-0.959) in a held-out test cohort, with sensitivity of 0.750 and specificity of 0.812.

Conclusions: Longitudinal clinical laboratory tests obtained during routine care contained informative and clinically plausible signals for P-TMA risk. Notably, cystatin C at week 6 showed promise as an early monitoring indicator.
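The selection protocol in Methods (stratified 80/20 split, five candidate algorithms, selection by mean cross-validated AUROC, refit on the full training cohort, one evaluation on the test cohort) can be sketched with scikit-learn as follows. The data are synthetic stand-ins with roughly the cohort's shape, and all hyperparameters are library defaults or illustrative choices, not the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder: 300 rows, 146 predictors, roughly 158 negatives / 142 positives.
X, y = make_classification(n_samples=300, n_features=146, n_informative=10,
                           weights=[158 / 300], random_state=42)

# Stratified 80/20 split; the test cohort is not touched until the end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)),
    "svm_rbf": make_pipeline(StandardScaler(),
                             SVC(kernel="rbf", probability=True, random_state=42)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "extra_trees": ExtraTreesClassifier(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Model selection uses the training cohort only (cross_val_score fits clones).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auroc = {name: cross_val_score(est, X_train, y_train, cv=cv, scoring="roc_auc").mean()
            for name, est in candidates.items()}
best_name = max(cv_auroc, key=cv_auroc.get)

# Refit the selected model on the full training cohort, then evaluate once.
final_model = candidates[best_name].fit(X_train, y_train)
test_prob = final_model.predict_proba(X_test)[:, 1]
test_auroc = roc_auc_score(y_test, test_prob)
test_auprc = average_precision_score(y_test, test_prob)  # average precision as AUPRC estimate

for name, score in sorted(cv_auroc.items(), key=lambda kv: -kv[1]):
    print(f"{name:>20}: mean CV AUROC {score:.3f}")
print(f"selected={best_name}  test AUROC={test_auroc:.3f}  test AUPRC={test_auprc:.3f}")
```

Because the test cohort is used only after the model has been chosen, the test metrics play no part in selection; on synthetic data a model other than gradient boosting may well win.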

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Gradient boosting on routine longitudinal labs hits AUROC 0.87 internally for P-TMA, but the single-center retrospective sample of 300 pregnancies makes generalization uncertain without external validation.


The core result is that gradient boosting, chosen via cross-validation among five standard classifiers, reaches an AUROC of 0.872 (95% CI 0.769-0.952) and an AUPRC of 0.883 on a 20% held-out test set drawn from 300 pregnancies (142 cases). The model is built on 146 longitudinal lab predictors, and the paper adds interpretability checks, such as highlighting cystatin C at week 6. The work is new in the narrow sense that no prior paper applies this exact pipeline to antepartum P-TMA prediction from routine data alone. It does a few things cleanly: it reports confidence intervals, uses stratified sampling, and compares multiple algorithms rather than reporting a single favorite. The feature-importance step also points to clinically plausible signals rather than leaving the output as a black box.

The main limitation is the data setup. A single-center retrospective cohort of this size for a rare condition leaves room for site-specific lab practices, unmeasured confounders, and demographic shifts to drive the performance. The test set is only about 60 pregnancies, the confidence intervals are wide, and there is no external or temporal validation beyond the random split. Handling of missing longitudinal values and class imbalance is not detailed in the abstract.

This paper is aimed at maternal-fetal medicine groups that already run ML pilots on electronic records and want a concrete example for this particular endpoint. A reader who needs a ready-to-deploy tool will find the evidence too thin; someone looking for a starting point on longitudinal lab modeling in obstetrics can extract useful details. It deserves peer review so referees can press on the validation gap and scrutinize the full methods for data leakage and undocumented preprocessing choices.
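To see why a test set of about 60 pregnancies yields wide intervals, here is a percentile bootstrap over test patients. The paper does not state how its 95% CIs were computed; the bootstrap is one common choice, and the simulated scores below are not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auroc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC, resampling test patients with replacement."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = y_true.size
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)
        # AUROC is undefined if a resample contains only one class; skip it.
        if np.unique(y_true[idx]).size < 2:
            continue
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y_true, y_score), lo, hi

# Simulated test cohort of 60 (28 cases, 32 controls) with overlapping scores.
rng = np.random.default_rng(1)
y = np.r_[np.ones(28, dtype=int), np.zeros(32, dtype=int)]
scores = np.r_[rng.normal(1.6, 1.0, 28), rng.normal(0.0, 1.0, 32)]

auc, lo, hi = bootstrap_auroc_ci(y, scores)
print(f"AUROC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

At this sample size the interval typically spans well over 0.1 AUROC units, in line with the 0.769-0.952 range reported in the paper.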

Referee Report

2 major / 2 minor

Summary. The manuscript presents a retrospective single-center study of 300 pregnancies (142 P-TMA cases) that trains five classifiers on 146 longitudinal laboratory predictors, selects gradient boosting by cross-validated AUROC, refits on the full training set, and reports AUROC 0.872 (95% CI 0.769-0.952) and AUPRC 0.883 on a 20% stratified held-out test set, together with global feature-importance and distributional analyses that highlight cystatin C at week 6.

Significance. If the performance generalizes, the work supplies a concrete, interpretable route to early risk stratification for a rare, high-mortality obstetric condition using only routine labs; the held-out evaluation with confidence intervals and the emphasis on longitudinal signals are clear strengths that distinguish it from univariate or rule-based approaches.

major comments (2)

[Methods] Methods, cohort and validation design: the single-center retrospective sample of 300 pregnancies is split 80/20 by stratified random sampling with no external, multi-center, or temporal validation; this directly undermines the claim that the AUROC of 0.872 reflects transportable risk signatures, given the small test size (~60 pregnancies) and wide confidence intervals.
[Methods] Methods, data preprocessing: neither the handling of class imbalance (142 vs 158) nor the treatment of missing longitudinal laboratory values is described; both choices are load-bearing for the cross-validation model selection and the reported test metrics.

minor comments (2)

[Results] Results: report the number of pregnancies and events in the final training and test partitions explicitly, and state whether any predictor standardization or imputation was performed before fitting.
[Abstract] Abstract and Conclusions: the phrase 'prespecified by cross-validation' is ambiguous; clarify whether the algorithm choice was fixed before seeing test performance or selected after inspecting CV results.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below with clarifications on our methods and explicit acknowledgment of limitations. Revisions will be made to improve transparency on preprocessing and to expand discussion of generalizability.


Referee: [Methods] Methods, cohort and validation design: the single-center retrospective sample of 300 pregnancies is split 80/20 by stratified random sampling with no external, multi-center, or temporal validation; this directly undermines the claim that the AUROC of 0.872 reflects transportable risk signatures, given the small test size (~60 pregnancies) and wide confidence intervals.

Authors: We agree that the single-center retrospective design and lack of external validation constrain claims of transportability. The rarity of P-TMA makes multi-center data collection resource-intensive and was not feasible here. The 80/20 stratified split and reported 95% CIs (0.769-0.952) are presented to convey uncertainty on the small held-out set. In revision we will add an expanded limitations paragraph in the Discussion that explicitly discusses the need for prospective multi-center validation and will moderate language on generalizability of the performance metrics. revision: partial
Referee: [Methods] Methods, data preprocessing: neither the handling of class imbalance (142 vs 158) nor the treatment of missing longitudinal laboratory values is described; both choices are load-bearing for the cross-validation model selection and the reported test metrics.

Authors: We apologize for this omission. Class imbalance was handled by weighting samples inversely to class frequencies; because scikit-learn's GradientBoostingClassifier has no class_weight parameter, these balanced weights were passed through the sample_weight argument of fit. Missing longitudinal values were addressed by forward-fill imputation within each pregnancy's time series (to preserve temporal ordering), followed by cohort-level median imputation for residual gaps. These steps were performed inside each cross-validation fold. We will insert a dedicated 'Data Preprocessing' subsection in Methods describing these choices, with pseudocode and a note on reproducibility. revision: yes
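The preprocessing described in this simulated response can be sketched as follows, assuming a wide table with one column per analyte per visit week (column names such as `cystatin_c_w6`, the visit weeks, and the helper `forward_fill_per_analyte` are hypothetical). Forward fill runs row-wise within each pregnancy, so it uses no information from other patients; the median imputer and the balanced sample weights are computed from each training fold only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.utils.class_weight import compute_sample_weight

rng = np.random.default_rng(0)
n = 120
analytes = ["cystatin_c", "platelets"]  # hypothetical analytes
weeks = [6, 10, 14, 20]                 # hypothetical visit weeks

# Synthetic wide table: one column per analyte per week, ~25% missing at random.
data = {}
for analyte in analytes:
    for w in weeks:
        col = rng.normal(size=n)
        col[rng.random(n) < 0.25] = np.nan
        data[f"{analyte}_w{w}"] = col
X = pd.DataFrame(data)
y = rng.integers(0, 2, size=n)

def forward_fill_per_analyte(df, analytes, weeks):
    """Carry the last observed value forward in time within each pregnancy (row)."""
    out = df.copy()
    for analyte in analytes:
        cols = [f"{analyte}_w{w}" for w in weeks]  # columns in time order
        out[cols] = out[cols].ffill(axis=1)
    return out

X_ff = forward_fill_per_analyte(X, analytes, weeks)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_auc = []
for train_idx, val_idx in cv.split(X_ff, y):
    X_tr, X_val = X_ff.iloc[train_idx], X_ff.iloc[val_idx]
    y_tr, y_val = y[train_idx], y[val_idx]
    # Medians for residual gaps (e.g. a missing first visit) come from the training fold.
    pipe = make_pipeline(SimpleImputer(strategy="median"),
                         GradientBoostingClassifier(random_state=0))
    # Balanced weights go through sample_weight, routed to the boosting step.
    w = compute_sample_weight("balanced", y_tr)
    pipe.fit(X_tr, y_tr, gradientboostingclassifier__sample_weight=w)
    fold_auc.append(roc_auc_score(y_val, pipe.predict_proba(X_val)[:, 1]))

print(f"mean CV AUROC: {np.mean(fold_auc):.3f}")  # near 0.5 here: labels are random
```

With 142 cases and 158 controls the balanced weights differ only slightly from 1, so this choice matters less here than the fold-wise imputation does.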

standing simulated objections not resolved

External, multi-center, or temporal validation of the model performance, as the study uses a single-center retrospective cohort and no additional independent datasets are available.

Circularity Check

0 steps flagged

No circularity: standard ML train/test split with CV model selection yields independent held-out metrics


The paper performs an 80/20 stratified split, selects gradient boosting via cross-validated AUROC on the training portion only, then reports a single evaluation on the untouched held-out test cohort. No equations, parameters, or self-citations reduce the reported AUROC/AUPRC to a quantity that is true by construction. The performance numbers are ordinary empirical estimates on unseen data; the central claim does not collapse into its own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim rests on standard supervised learning assumptions plus retrospective data representativeness; no new entities are postulated.

free parameters (1)

gradient boosting hyperparameters
Hyperparameter values and any tuning procedure are not reported in the abstract, which describes cross-validation only for choosing among the five algorithms.

axioms (1)

Domain assumption: held-out test-set performance estimates the true generalization error.
Standard ML evaluation assumption invoked by the train/test split description.

pith-pipeline@v0.9.0 · 5652 in / 1246 out tokens · 30924 ms · 2026-05-14T19:12:24.161586+00:00 · methodology

