Use of What-if Scenarios to Help Explain Artificial Intelligence Models for Neonatal Health

Abdullah Mamun; David W. Britt; Hassan Ghasemzadeh; Judith Klein-Seetharaman; Lawrence D. Devoe; Mark I. Evans

arxiv: 2410.09635 · v2 · submitted 2024-10-12 · 💻 cs.LG · cs.AI

Use of What-if Scenarios to Help Explain Artificial Intelligence Models for Neonatal Health

Abdullah Mamun , Lawrence D. Devoe , Mark I. Evans , David W. Britt , Judith Klein-Seetharaman , Hassan Ghasemzadeh This is my paper

Pith reviewed 2026-05-23 18:37 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords counterfactual explanationsneonatal healthdeep learningdata augmentationintrapartum risksinterpretable AICTGANhigh-risk delivery prediction

0 comments

The pith

A neural network predicts high-risk deliveries at 0.784 F1 score and explains predictions through changes to two or three factors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AIMEN as a deep learning framework that forecasts adverse labor outcomes using maternal, fetal, obstetrical, and intrapartum data. It augments limited and imbalanced clinical records with CTGAN-generated synthetic samples, relaxing some feature bounds and filtering by silhouette score before training an ensemble of fully connected neural networks. AIMEN exceeds XGBoost, TabNet, DANet, and LightGBM on average F1 score while producing counterfactual explanations that require modifications to only two or three input attributes on average. These what-if scenarios are intended to reveal how specific variable adjustments could shift a prediction, supplying clinicians with concrete reasoning behind each output. The approach directly targets the lack of accurate, interpretable automated support for intrapartum decision-making.

Core claim

AIMEN is an ensemble of fully connected neural networks trained on real data plus CTGAN-augmented samples that classifies high-risk deliveries more accurately than XGBoost, TabNet, DANet, and LightGBM while generating counterfactual explanations that identify actionable changes to an average of only two to three attributes.

What carries the argument

Counterfactual explanations that search for the smallest set of input-feature modifications sufficient to flip the model's output from high-risk to low-risk.

If this is right

Higher F1 performance enables earlier detection of intrapartum risks that could prompt timely interventions to reduce adverse outcomes such as cerebral palsy.
Explanations limited to two or three attributes make the model's reasoning practical for real-time clinical review during delivery.
CTGAN augmentation with bound relaxation and silhouette filtering mitigates class imbalance and small sample size in neonatal datasets.
The ensemble architecture supports both accurate classification and the generation of minimal-change counterfactuals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the suggested attribute changes align with known modifiable risk factors, the explanations could directly inform bedside adjustments to maternal or fetal monitoring.
The same minimal-change counterfactual technique could be adapted to other medical domains that face class imbalance and require interpretable outputs.
Prospective deployment across multiple hospitals would test whether the performance and explanation simplicity hold under varying population demographics and recording practices.

Load-bearing premise

The CTGAN synthetic samples, after relaxing feature bounds for some points and applying silhouette-score filtering, faithfully represent the true joint distribution of real clinical variables without introducing artifacts that distort the learned decision boundary or the validity of the resulting counterfactual explanations.

What would settle it

A held-out set of real deliveries where the two-to-three-attribute counterfactuals proposed by AIMEN fail to match observed changes in actual patient outcomes or established clinical guidelines.

Figures

Figures reproduced from arXiv: 2410.09635 by Abdullah Mamun, David W. Britt, Hassan Ghasemzadeh, Judith Klein-Seetharaman, Lawrence D. Devoe, Mark I. Evans.

**Figure 1.** Figure 1: AIMEN uses 34 risk factors of four categories. A machine learning model is trained and used to infer the risk of cerebral palsy. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: The AIMEN system is made of three major components: a data generator, a risk predictor, and a risk explainer. These three [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: AIMEN’s backbone is an ensemble of eight fully connected neural networks. The default AIMEN has a specific backbone [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Performance metrics on the test set using different methods of data generation. The real training data features have only [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Training, validation, and test set metrics along with the test set confusion matrix with the AIMEN system with CTGAN data [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: ROC curve and the performance of the classification based on decision threshold. (a) The ROC curve using true positive rate [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 7.** Figure 7: Examples of CE using the nearest instance CE algorithm. Attributes to be changed are underlined. Here, a specific example [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

**Figure 8.** Figure 8: SHAP values for 34 input features on 1385 real training data and 38 real test data using the AIMEN system (MLP_v5 backbone). [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗

read the original abstract

Early detection of intrapartum risks enables timely interventions to prevent or mitigate adverse labor outcomes such as cerebral palsy. However, accurate automated systems to support clinical decision-making during delivery are currently lacking. To address this gap, we propose Artificial Intelligence for Modeling and Explaining Neonatal Health (AIMEN), a deep learning framework that predicts adverse labor outcomes from maternal, fetal, obstetrical, and intrapartum factors while providing interpretable reasoning behind its predictions. AIMEN reveals how specific modifications to input variables could alter predicted outcomes, enhancing clinical insight. To address class imbalance and limited sample size, AIMEN employs Conditional Tabular GAN (CTGAN) for data augmentation. This process includes synthetic data generation, and we investigate in detail properties such as relaxing feature bounds for a subset of training points to explore slightly out-of-range physiological values, and applying silhouette-score-based filtering to increase the separability of synthetic samples. AIMEN uses an ensemble of fully connected neural networks for classification and outperforms state-of-the-art models such as XGBoost, TabNet, DANet, and LightGBM, achieving an average F1 score of 0.784 in predicting high-risk deliveries. Moreover, AIMEN generates counterfactual explanations that identify actionable changes involving only two to three attributes on average. Resources: https://github.com/ab9mamun/AIMEN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes AIMEN, an ensemble of fully-connected neural networks for predicting high-risk deliveries from maternal/fetal/obstetrical/intrapartum features. It augments limited real data via CTGAN, with bound relaxation on a subset of points and silhouette-score filtering, reports an average F1 of 0.784 that outperforms XGBoost/TabNet/DANet/LightGBM, and supplies counterfactual explanations that on average modify only two-to-three attributes.

Significance. If the CTGAN-augmented distribution faithfully matches the real clinical joint statistics, the work would supply a practically useful predictor together with sparse, actionable explanations for intrapartum risk. The absence of any reported fidelity diagnostics for the synthetic data, however, leaves both the performance gain and the validity of the counterfactuals unverified on true patient distributions.

major comments (3)

[Abstract] Abstract: the headline claim that AIMEN 'outperforms state-of-the-art models … achieving an average F1 score of 0.784' supplies no information on dataset size, class balance, cross-validation scheme, or statistical significance testing, rendering the central empirical result impossible to evaluate.
[Abstract] Abstract (CTGAN paragraph): the augmentation pipeline (CTGAN generation, feature-bound relaxation, silhouette-score filtering) is presented as essential to both the reported F1 gain and the counterfactuals, yet no marginal/conditional distribution checks, correlation preservation metrics, or real-only ablation results are supplied to confirm that the filtered synthetics do not distort decision boundaries or introduce separability artifacts.
[Abstract] Abstract: the claim that counterfactuals 'identify actionable changes involving only two to three attributes on average' rests on the same unvalidated augmented distribution; without fidelity evidence the sparsity and actionability of the explanations cannot be trusted on real clinical data.

minor comments (1)

The GitHub link is supplied but the manuscript does not state whether the released code includes the exact CTGAN hyperparameters, silhouette threshold, and bound-relaxation fraction used for the reported results.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and the validation of the CTGAN augmentation. We address each major comment below and will revise the manuscript to incorporate the requested details and analyses.

read point-by-point responses

Referee: [Abstract] Abstract: the headline claim that AIMEN 'outperforms state-of-the-art models … achieving an average F1 score of 0.784' supplies no information on dataset size, class balance, cross-validation scheme, or statistical significance testing, rendering the central empirical result impossible to evaluate.

Authors: We agree that the abstract should provide sufficient context. In the revised version we will expand the abstract to report the original dataset size, class balance (imbalance ratio), the cross-validation scheme (stratified k-fold), and results of statistical significance testing against the baselines. These details already appear in the methods and results sections of the full manuscript. revision: yes
Referee: [Abstract] Abstract (CTGAN paragraph): the augmentation pipeline (CTGAN generation, feature-bound relaxation, silhouette-score filtering) is presented as essential to both the reported F1 gain and the counterfactuals, yet no marginal/conditional distribution checks, correlation preservation metrics, or real-only ablation results are supplied to confirm that the filtered synthetics do not distort decision boundaries or introduce separability artifacts.

Authors: This point is correct; the current manuscript does not report explicit fidelity diagnostics. We will add marginal and conditional distribution comparisons, correlation preservation metrics, and a real-data-only ablation study in the revision to demonstrate that the filtered synthetic samples do not introduce decision-boundary artifacts. revision: yes
Referee: [Abstract] Abstract: the claim that counterfactuals 'identify actionable changes involving only two to three attributes on average' rests on the same unvalidated augmented distribution; without fidelity evidence the sparsity and actionability of the explanations cannot be trusted on real clinical data.

Authors: We acknowledge that the counterfactual sparsity claim depends on the fidelity of the augmented distribution. The revision will include the fidelity checks noted above and, where feasible, evaluate counterfactual sparsity on held-out real samples or clearly articulate the associated assumptions and limitations in the discussion. revision: yes

Circularity Check

0 steps flagged

No circularity; performance and explanations are empirical outputs of standard supervised training

full rationale

The paper describes training an ensemble of fully connected neural networks on data augmented by CTGAN (with bound relaxation and silhouette filtering) to address imbalance, then reports F1=0.784 and 2-3 attribute counterfactuals as direct results of that training. No step equates a fitted parameter to a prediction by construction, renames a known result, or reduces the central claim to a self-citation chain or self-definition. The augmentation is presented as preprocessing whose validity is assumed rather than enforced by the reported metrics themselves. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

4 free parameters · 2 axioms · 0 invented entities

The central performance and explanation claims rest on the quality of the CTGAN augmentation step and on standard supervised-learning assumptions about data representativeness; the ledger therefore records the tunable components of that augmentation and the domain assumptions required for clinical utility.

free parameters (4)

CTGAN training hyperparameters
Parameters controlling the generative model that produces the synthetic samples used for augmentation.
Silhouette-score filtering threshold
Threshold applied to retain only well-separated synthetic samples.
Feature-bound relaxation fraction
Fraction or rule determining which training points are allowed to have slightly out-of-range values during augmentation.
Ensemble architecture and training hyperparameters
Number of networks, layer sizes, learning rates, etc., fitted to the augmented data.

axioms (2)

domain assumption The original clinical dataset is representative of the population on which the model will be deployed.
Required for any claim that the reported F1 will translate to real-world performance.
domain assumption Counterfactual changes identified by the model correspond to clinically feasible and causally relevant interventions.
Necessary for the explanations to be actionable rather than merely mathematical.

pith-pipeline@v0.9.0 · 5790 in / 1281 out tokens · 46048 ms · 2026-05-23T18:37:27.826816+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

GlyTwin: Digital Twin for Glucose Control in Type 1 Diabetes Through Optimal Behavioral Modifications Using Patient-Centric Counterfactuals
cs.LG 2025-04 unverdicted novelty 7.0

GlyTwin generates patient-centric counterfactual behavioral interventions to reduce hyperglycemia in type 1 diabetes, evaluated on a new dataset from 50 patients showing 85.8% valid explanations and 87.3% effectiveness.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Ki Hoon Ahn and Kwang-Sig Lee. 2022. Artificial intelligence in obstetrics. Obstetrics & Gynecology Science 65, 2 (2022), 113–124

work page 2022
[2]

Virginia Apgar. 1953. A proposal for a new method of evaluation of the newborn infant. Anesthesia & Analgesia 32, 4 (1953), 260–267

work page 1953
[3]

Asiful Arefeen and Hassan Ghasemzadeh. 2023. Glysim: Modeling and simulating glycemic response for behavioral lifestyle interventions. In 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE, 1–5

work page 2023
[4]

Sercan Ö Arik and Tomas Pfister. 2021. Tabnet: Attentive interpretable tabular learning. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35. 6679–6687

work page 2021
[5]

Reza Rahimi Azghan, Nicholas C Glodosky, Ramesh Kumar Sah, Carrie Cuttler, Ryan McLaughlin, Michael J Cleveland, and Hassan Ghasemzadeh

work page
[6]

In 2023 IEEE 19th International Conference on Body Sensor Networks (BSN)

Personalized Modeling and Detection of Moments of Cannabis Use in Free-Living Environments. In 2023 IEEE 19th International Conference on Body Sensor Networks (BSN). IEEE, 1–4

work page 2023
[7]

Gregor Bachmann, Sotiris Anagnostidis, and Thomas Hofmann. 2024. Scaling mlps: A tale of inductive bias. Advances in Neural Information Processing Systems 36 (2024)

work page 2024
[8]

Jacques Balayla and Guy Shrem. 2019. Use of artificial intelligence (AI) in the interpretation of intrapartum fetal heart rate (FHR) tracings: a systematic review and meta-analysis. Archives of gynecology and obstetrics 300 (2019), 7–14

work page 2019
[9]

Andrew P Bradley. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern recognition 30, 7 (1997), 1145–1159

work page 1997
[10]

Dieter Brughmans, Pieter Leyman, and David Martens. 2023. Nice: an algorithm for nearest instance counterfactual explanations. Data Mining and Knowledge Discovery (2023), 1–39

work page 2023
[11]

Tianqi Chen and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. 785–794

work page 2016
[12]

Lena Davidson and Mary Regina Boland. 2021. Towards deep phenotyping pregnancy: a systematic review on artificial intelligence and machine learning methods to improve pregnancy outcomes. Briefings in Bioinformatics 22, 5 (2021), bbaa369

work page 2021
[13]

Lawrence Devoe, Steven Golde, Yevgeny Kilman, Debra Morton, Kimberly Shea, and Jennifer Waller. 2000. A comparison of visual analyses of intrapartum fetal heart rate tracings according to the new national institute of child health and human development guidelines with computer 16 Mamun et al. analyses by an automated fetal heart rate monitoring system. Am...

work page 2000
[14]

Lawrence D Devoe. 2016. Future perspectives in intrapartum fetal surveillance. Best Practice & Research Clinical Obstetrics & Gynaecology 30 (2016), 98–106

work page 2016
[15]

Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[16]

Mark I Evans, David W Britt, Shara M Evans, and Lawrence D Devoe. 2021. Changing perspectives of electronic fetal monitoring. Reproductive Sciences (2021), 1–21

work page 2021
[17]

Joshua Guedalia, Michal Lipschuetz, Michal Novoselsky-Persky, Sarah M Cohen, Amihai Rottenstreich, Gabriel Levin, Simcha Yagel, Ron Unger, and Yishai Sompolinsky. 2020. Real-time data analysis using a machine learning model significantly improves prediction of successful vaginal deliveries. American Journal of Obstetrics and Gynecology 223, 3 (2020), 437–e1

work page 2020
[18]

J Guedalia, Y Sompolinsky, M Novoselsky Persky, SM Cohen, D Kabiri, S Yagel, R Unger, and M Lipschuetz. 2021. Prediction of severe adverse neonatal outcomes at the second stage of labour using machine learning: a retrospective cohort study.BJOG: An International Journal of Obstetrics & Gynaecology 128, 11 (2021), 1824–1832

work page 2021
[19]

Haibo He, Yang Bai, Edwardo A Garcia, and Shutao Li. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). Ieee, 1322–1328

work page 2008
[20]

Niloofar Hezarjaribi, Sepideh Mazrouee, and Hassan Ghasemzadeh. 2017. Speech2Health: a mobile framework for monitoring dietary composition from spoken data. IEEE journal of biomedical and health informatics 22, 1 (2017), 252–264

work page 2017
[21]

Cornelius A James, Robert M Wachter, and James O Woolliscroft. 2022. Preparing clinicians for a clinical world influenced by artificial intelligence. Jama 327, 14 (2022), 1333–1334

work page 2022
[22]

Robert DF Keith, Sarah Beckley, Jonathan M Garibaldi, Jenny A Westgate, Emmanuel C Ifeachor, and Keith R Greene. 1995. A multicentre comparative study of 17 experts and an intelligent computer system for managing labour using the cardiotocogram. BJOG: an international journal of obstetrics & gynaecology 102, 9 (1995), 688–700

work page 1995
[23]

So Ling Lau, Zara Lin Zau Lok, Shuk Yi Annie Hui, Genevieve Po Gee Fung, Hugh Simon Lam, and Tak Yeung Leung. 2023. Neonatal outcome of infants with umbilical cord arterial pH less than 7. Acta Obstetricia et Gynecologica Scandinavica 102, 2 (2023), 174–180

work page 2023
[24]

Hanxiao Liu, Zihang Dai, David So, and Quoc V Le. 2021. Pay attention to mlps. Advances in neural information processing systems 34 (2021), 9204–9215

work page 2021
[25]

Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. Advances in neural information processing systems 30 (2017)

work page 2017
[26]

Yuanfei Luo, Hao Zhou, Wei-Wei Tu, Yuqiang Chen, Wenyuan Dai, and Qiang Yang. 2020. Network on network for tabular data classification in real-world applications. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2317–2326

work page 2020
[27]

Britt, Lawrence D

Abdullah Mamun, Chia-Cheng Kuo, David W. Britt, Lawrence D. Devoe, Mark I. Evans, Hassan Ghasemzadeh, and Judith Klein-Seetharaman. 2023. Neonatal Risk Modeling and Prediction. In 2023 IEEE 19th International Conference on Body Sensor Networks (BSN). 1–4. https://doi.org/10.1109/ BSN58485.2023.10331196

work page arXiv 2023
[28]

Abdullah Mamun, Krista S Leonard, Matthew P Buman, and Hassan Ghasemzadeh. 2022. Multimodal Time-Series Activity Forecasting for Adaptive Lifestyle Intervention Design. In2022 IEEE-EMBS International Conference on Wearable and Implantable Body Sensor Networks (BSN). IEEE, 1–4

work page 2022
[29]

Abdullah Mamun, Seyed Iman Mirzadeh, and Hassan Ghasemzadeh. 2022. Designing deep neural networks robust to sensor failure in mobile health environments. In 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2442–2446

work page 2022
[30]

Abimbola Michael-Asalu, Genevieve Taylor, Heather Campbell, Latashia-Lika Lelea, and Russell S Kirby. 2019. Cerebral palsy: diagnosis, epidemiology, genetics, and clinical update. Advances in pediatrics 66 (2019), 189–208

work page 2019
[31]

Christoph Molnar. 2020. Interpretable machine learning. Lulu. com

work page 2020
[32]

Ramaravind K Mothilal, Amit Sharma, and Chenhao Tan. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 conference on fairness, accountability, and transparency. 607–617

work page 2020
[33]

Khushboo Munir, Hassan Elahi, Afsheen Ayub, Fabrizio Frezza, and Antonello Rizzi. 2019. Cancer diagnosis using deep learning: a bibliographic review. Cancers 11, 9 (2019), 1235

work page 2019
[34]

Jun Ogasawara, Satoru Ikenoue, Hiroko Yamamoto, Motoshige Sato, Yoshifumi Kasuga, Yasue Mitsukura, Yuji Ikegaya, Masato Yasui, Mamoru Tanaka, and Daigo Ochiai. 2021. Deep neural network-based classification of cardiotocograms outperformed conventional algorithms. Scientific reports 11, 1 (2021), 13367

work page 2021
[35]

Deborah Plana, Dennis L Shung, Alyssa A Grimshaw, Anurag Saraf, Joseph JY Sung, and Benjamin H Kann. 2022. Randomized clinical trials of machine learning interventions in health care: a systematic review. JAMA Network Open 5, 9 (2022), e2233946–e2233946

work page 2022
[36]

Ninlapa Pruksanusak, Natthicha Chainarong, Siriwan Boripan, and Alan Geater. 2022. Comparison of the predictive ability for perinatal acidemia in neonates between the NICHD 3-tier FHR system combined with clinical risk factors and the fetal reserve index. Plos one 17, 10 (2022), e0276451

work page 2022
[37]

Ishraq R Rahman, Shovito Barua Soumma, and Faisal Bin Ashraf. 2022. Machine learning approaches to metastasis bladder and secondary pulmonary cancer classification using gene expression data. In 2022 25th International Conference on Computer and Information Technology (ICCIT). IEEE, 430–435

work page 2022
[38]

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, Use of What-if Scenarios to Help Explain Artificial Intelligence Models for Neo...

work page 2015
[39]

Ramyar Saeedi, Brian Schimert, and Hassan Ghasemzadeh. 2014. Cost-sensitive feature selection for on-body sensor localization. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication. 833–842

work page 2014
[40]

Laura Sarno, Daniele Neola, Luigi Carbone, Gabriele Saccone, Annunziata Carlea, Marco Miceli, Giuseppe Gabriele Iorio, Ilenia Mappa, Giuseppe Rizzo, Raffaella Di Girolamo, et al. 2023. Use of artificial intelligence in obstetrics: not quite ready for prime time. American Journal of Obstetrics & Gynecology MFM 5, 2 (2023), 100792

work page 2023
[41]

Thomas P Sartwelle and James C Johnston. 2018. Continuous electronic fetal monitoring during labor: a critique and a reply to contemporary proponents. The Surgery Journal 4, 01 (2018), e23–e28

work page 2018
[42]

Ketan Rajshekhar Shahapure and Charles Nicholas. 2020. Cluster quality analysis using silhouette score. In 2020 IEEE 7th international conference on data science and advanced analytics (DSAA). IEEE, 747–748

work page 2020
[43]

Sherif A Shazly, Bijan J Borah, Che G Ngufor, Vanessa E Torbenson, Regan N Theiler, and Abimbola O Famuyide. 2022. Impact of labor characteristics on maternal and neonatal outcomes of labor: A machine-learning model. Plos one 17, 8 (2022), e0273178

work page 2022
[44]

Edoardo Spairani, Beniamino Daniele, Maria Gabriella Signorini, and Giovanni Magenes. 2022. A deep learning mixed-data type approach for the classification of FHR signals. Frontiers in Bioengineering and Biotechnology 10 (2022)

work page 2022
[45]

Max Wiznitzer. 2017. Electronic fetal monitoring: are we asking the correct questions? , 344–345 pages

work page 2017
[46]

Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. 2019. Modeling tabular data using conditional gan. Advances in neural information processing systems 32 (2019)

work page 2019
[47]

Ming Zeng, Le T Nguyen, Bo Yu, Ole J Mengshoel, Jiang Zhu, Pang Wu, and Joy Zhang. 2014. Convolutional neural networks for human activity recognition using mobile sensors. In 6th international conference on mobile computing, applications and services. IEEE, 197–205

work page 2014
[48]

Jun Zhang, Helain J Landy, D Ware Branch, Ronald Burkman, Shoshana Haberman, Kimberly D Gregory, Christos G Hatjis, Mildred M Ramirez, Jennifer L Bailit, Victor H Gonzalez-Quintero, et al. 2010. Contemporary patterns of spontaneous labor with normal neonatal outcomes. Obstetrics and gynecology 116, 6 (2010), 1281

work page 2010

[1] [1]

Ki Hoon Ahn and Kwang-Sig Lee. 2022. Artificial intelligence in obstetrics. Obstetrics & Gynecology Science 65, 2 (2022), 113–124

work page 2022

[2] [2]

Virginia Apgar. 1953. A proposal for a new method of evaluation of the newborn infant. Anesthesia & Analgesia 32, 4 (1953), 260–267

work page 1953

[3] [3]

Asiful Arefeen and Hassan Ghasemzadeh. 2023. Glysim: Modeling and simulating glycemic response for behavioral lifestyle interventions. In 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE, 1–5

work page 2023

[4] [4]

Sercan Ö Arik and Tomas Pfister. 2021. Tabnet: Attentive interpretable tabular learning. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35. 6679–6687

work page 2021

[5] [5]

Reza Rahimi Azghan, Nicholas C Glodosky, Ramesh Kumar Sah, Carrie Cuttler, Ryan McLaughlin, Michael J Cleveland, and Hassan Ghasemzadeh

work page

[6] [6]

In 2023 IEEE 19th International Conference on Body Sensor Networks (BSN)

Personalized Modeling and Detection of Moments of Cannabis Use in Free-Living Environments. In 2023 IEEE 19th International Conference on Body Sensor Networks (BSN). IEEE, 1–4

work page 2023

[7] [7]

Gregor Bachmann, Sotiris Anagnostidis, and Thomas Hofmann. 2024. Scaling mlps: A tale of inductive bias. Advances in Neural Information Processing Systems 36 (2024)

work page 2024

[8] [8]

Jacques Balayla and Guy Shrem. 2019. Use of artificial intelligence (AI) in the interpretation of intrapartum fetal heart rate (FHR) tracings: a systematic review and meta-analysis. Archives of gynecology and obstetrics 300 (2019), 7–14

work page 2019

[9] [9]

Andrew P Bradley. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern recognition 30, 7 (1997), 1145–1159

work page 1997

[10] [10]

Dieter Brughmans, Pieter Leyman, and David Martens. 2023. Nice: an algorithm for nearest instance counterfactual explanations. Data Mining and Knowledge Discovery (2023), 1–39

work page 2023

[11] [11]

Tianqi Chen and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. 785–794

work page 2016

[12] [12]

Lena Davidson and Mary Regina Boland. 2021. Towards deep phenotyping pregnancy: a systematic review on artificial intelligence and machine learning methods to improve pregnancy outcomes. Briefings in Bioinformatics 22, 5 (2021), bbaa369

work page 2021

[13] [13]

Lawrence Devoe, Steven Golde, Yevgeny Kilman, Debra Morton, Kimberly Shea, and Jennifer Waller. 2000. A comparison of visual analyses of intrapartum fetal heart rate tracings according to the new national institute of child health and human development guidelines with computer 16 Mamun et al. analyses by an automated fetal heart rate monitoring system. Am...

work page 2000

[14] [14]

Lawrence D Devoe. 2016. Future perspectives in intrapartum fetal surveillance. Best Practice & Research Clinical Obstetrics & Gynaecology 30 (2016), 98–106

work page 2016

[15] [15]

Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[16] [16]

Mark I Evans, David W Britt, Shara M Evans, and Lawrence D Devoe. 2021. Changing perspectives of electronic fetal monitoring. Reproductive Sciences (2021), 1–21

work page 2021

[17] [17]

Joshua Guedalia, Michal Lipschuetz, Michal Novoselsky-Persky, Sarah M Cohen, Amihai Rottenstreich, Gabriel Levin, Simcha Yagel, Ron Unger, and Yishai Sompolinsky. 2020. Real-time data analysis using a machine learning model significantly improves prediction of successful vaginal deliveries. American Journal of Obstetrics and Gynecology 223, 3 (2020), 437–e1

work page 2020

[18] [18]

J Guedalia, Y Sompolinsky, M Novoselsky Persky, SM Cohen, D Kabiri, S Yagel, R Unger, and M Lipschuetz. 2021. Prediction of severe adverse neonatal outcomes at the second stage of labour using machine learning: a retrospective cohort study.BJOG: An International Journal of Obstetrics & Gynaecology 128, 11 (2021), 1824–1832

work page 2021

[19] [19]

Haibo He, Yang Bai, Edwardo A Garcia, and Shutao Li. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). Ieee, 1322–1328

work page 2008

[20] [20]

Niloofar Hezarjaribi, Sepideh Mazrouee, and Hassan Ghasemzadeh. 2017. Speech2Health: a mobile framework for monitoring dietary composition from spoken data. IEEE journal of biomedical and health informatics 22, 1 (2017), 252–264

work page 2017

[21] [21]

Cornelius A James, Robert M Wachter, and James O Woolliscroft. 2022. Preparing clinicians for a clinical world influenced by artificial intelligence. Jama 327, 14 (2022), 1333–1334

work page 2022

[22] [22]

Robert DF Keith, Sarah Beckley, Jonathan M Garibaldi, Jenny A Westgate, Emmanuel C Ifeachor, and Keith R Greene. 1995. A multicentre comparative study of 17 experts and an intelligent computer system for managing labour using the cardiotocogram. BJOG: an international journal of obstetrics & gynaecology 102, 9 (1995), 688–700

work page 1995

[23] [23]

So Ling Lau, Zara Lin Zau Lok, Shuk Yi Annie Hui, Genevieve Po Gee Fung, Hugh Simon Lam, and Tak Yeung Leung. 2023. Neonatal outcome of infants with umbilical cord arterial pH less than 7. Acta Obstetricia et Gynecologica Scandinavica 102, 2 (2023), 174–180

work page 2023

[24] [24]

Hanxiao Liu, Zihang Dai, David So, and Quoc V Le. 2021. Pay attention to mlps. Advances in neural information processing systems 34 (2021), 9204–9215

work page 2021

[25] [25]

Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. Advances in neural information processing systems 30 (2017)

work page 2017

[26] [26]

Yuanfei Luo, Hao Zhou, Wei-Wei Tu, Yuqiang Chen, Wenyuan Dai, and Qiang Yang. 2020. Network on network for tabular data classification in real-world applications. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2317–2326

work page 2020

[27] [27]

Britt, Lawrence D

Abdullah Mamun, Chia-Cheng Kuo, David W. Britt, Lawrence D. Devoe, Mark I. Evans, Hassan Ghasemzadeh, and Judith Klein-Seetharaman. 2023. Neonatal Risk Modeling and Prediction. In 2023 IEEE 19th International Conference on Body Sensor Networks (BSN). 1–4. https://doi.org/10.1109/ BSN58485.2023.10331196

work page arXiv 2023

[28] [28]

Abdullah Mamun, Krista S Leonard, Matthew P Buman, and Hassan Ghasemzadeh. 2022. Multimodal Time-Series Activity Forecasting for Adaptive Lifestyle Intervention Design. In2022 IEEE-EMBS International Conference on Wearable and Implantable Body Sensor Networks (BSN). IEEE, 1–4

work page 2022

[29] [29]

Abdullah Mamun, Seyed Iman Mirzadeh, and Hassan Ghasemzadeh. 2022. Designing deep neural networks robust to sensor failure in mobile health environments. In 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2442–2446

work page 2022

[30] [30]

Abimbola Michael-Asalu, Genevieve Taylor, Heather Campbell, Latashia-Lika Lelea, and Russell S Kirby. 2019. Cerebral palsy: diagnosis, epidemiology, genetics, and clinical update. Advances in pediatrics 66 (2019), 189–208

work page 2019

[31] [31]

Christoph Molnar. 2020. Interpretable machine learning. Lulu. com

work page 2020

[32] [32]

Ramaravind K Mothilal, Amit Sharma, and Chenhao Tan. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 conference on fairness, accountability, and transparency. 607–617

work page 2020

[33] [33]

Khushboo Munir, Hassan Elahi, Afsheen Ayub, Fabrizio Frezza, and Antonello Rizzi. 2019. Cancer diagnosis using deep learning: a bibliographic review. Cancers 11, 9 (2019), 1235

work page 2019

[34] [34]

Jun Ogasawara, Satoru Ikenoue, Hiroko Yamamoto, Motoshige Sato, Yoshifumi Kasuga, Yasue Mitsukura, Yuji Ikegaya, Masato Yasui, Mamoru Tanaka, and Daigo Ochiai. 2021. Deep neural network-based classification of cardiotocograms outperformed conventional algorithms. Scientific reports 11, 1 (2021), 13367

work page 2021

[35] [35]

Deborah Plana, Dennis L Shung, Alyssa A Grimshaw, Anurag Saraf, Joseph JY Sung, and Benjamin H Kann. 2022. Randomized clinical trials of machine learning interventions in health care: a systematic review. JAMA Network Open 5, 9 (2022), e2233946–e2233946

work page 2022

[36] [36]

Ninlapa Pruksanusak, Natthicha Chainarong, Siriwan Boripan, and Alan Geater. 2022. Comparison of the predictive ability for perinatal acidemia in neonates between the NICHD 3-tier FHR system combined with clinical risk factors and the fetal reserve index. Plos one 17, 10 (2022), e0276451

work page 2022

[37] [37]

Ishraq R Rahman, Shovito Barua Soumma, and Faisal Bin Ashraf. 2022. Machine learning approaches to metastasis bladder and secondary pulmonary cancer classification using gene expression data. In 2022 25th International Conference on Computer and Information Technology (ICCIT). IEEE, 430–435

work page 2022

[38] [38]

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, Use of What-if Scenarios to Help Explain Artificial Intelligence Models for Neo...

work page 2015

[39] [39]

Ramyar Saeedi, Brian Schimert, and Hassan Ghasemzadeh. 2014. Cost-sensitive feature selection for on-body sensor localization. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication. 833–842

work page 2014

[40] [40]

Laura Sarno, Daniele Neola, Luigi Carbone, Gabriele Saccone, Annunziata Carlea, Marco Miceli, Giuseppe Gabriele Iorio, Ilenia Mappa, Giuseppe Rizzo, Raffaella Di Girolamo, et al. 2023. Use of artificial intelligence in obstetrics: not quite ready for prime time. American Journal of Obstetrics & Gynecology MFM 5, 2 (2023), 100792

work page 2023

[41] [41]

Thomas P Sartwelle and James C Johnston. 2018. Continuous electronic fetal monitoring during labor: a critique and a reply to contemporary proponents. The Surgery Journal 4, 01 (2018), e23–e28

work page 2018

[42] [42]

Ketan Rajshekhar Shahapure and Charles Nicholas. 2020. Cluster quality analysis using silhouette score. In 2020 IEEE 7th international conference on data science and advanced analytics (DSAA). IEEE, 747–748

work page 2020

[43] [43]

Sherif A Shazly, Bijan J Borah, Che G Ngufor, Vanessa E Torbenson, Regan N Theiler, and Abimbola O Famuyide. 2022. Impact of labor characteristics on maternal and neonatal outcomes of labor: A machine-learning model. Plos one 17, 8 (2022), e0273178

work page 2022

[44] [44]

Edoardo Spairani, Beniamino Daniele, Maria Gabriella Signorini, and Giovanni Magenes. 2022. A deep learning mixed-data type approach for the classification of FHR signals. Frontiers in Bioengineering and Biotechnology 10 (2022)

work page 2022

[45] [45]

Max Wiznitzer. 2017. Electronic fetal monitoring: are we asking the correct questions? , 344–345 pages

work page 2017

[46] [46]

Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. 2019. Modeling tabular data using conditional gan. Advances in neural information processing systems 32 (2019)

work page 2019

[47] [47]

Ming Zeng, Le T Nguyen, Bo Yu, Ole J Mengshoel, Jiang Zhu, Pang Wu, and Joy Zhang. 2014. Convolutional neural networks for human activity recognition using mobile sensors. In 6th international conference on mobile computing, applications and services. IEEE, 197–205

work page 2014

[48] [48]

Jun Zhang, Helain J Landy, D Ware Branch, Ronald Burkman, Shoshana Haberman, Kimberly D Gregory, Christos G Hatjis, Mildred M Ramirez, Jennifer L Bailit, Victor H Gonzalez-Quintero, et al. 2010. Contemporary patterns of spontaneous labor with normal neonatal outcomes. Obstetrics and gynecology 116, 6 (2010), 1281

work page 2010