Deep learning-based detection of cessation of breathing in pre-term infants
Pith reviewed 2026-06-26 09:17 UTC · model grok-4.3
The pith
Deep learning models detect cessation of breathing events in pre-term infants from routine NICU signals at up to 88.7 percent balanced accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A ConvNeXt architecture that combines impedance pneumography (IP) and photoplethysmography (PPG) inputs achieves 88.7 percent balanced accuracy and an F1 score of 0.75 for detecting labelled cessation-of-breathing (COBE) events on an independent test set. It outperforms ECG-only and PPG-only models by a wide margin but improves only modestly on IP alone.
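To make the two headline metrics concrete, here is a minimal sketch of how balanced accuracy and F1 follow from a confusion matrix. The function name `confusion_metrics` is illustrative and not taken from the paper. The only figures used are the class counts reported in the abstract (346 COBE, 608 non-COBE), applied to a trivial baseline that labels every event as non-COBE; the paper's own confusion matrix is not available here.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, balanced accuracy, and F1 for the positive (COBE) class."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on COBE events
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on non-COBE events
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }


# Class counts for the full dataset, as reported in the abstract.
N_COBE, N_NON_COBE = 346, 608

# Trivial baseline: predict "non-COBE" for every event.
baseline = confusion_metrics(tp=0, fp=0, tn=N_NON_COBE, fn=N_COBE)
print(baseline)
```

Under this baseline, plain accuracy is about 64 percent while balanced accuracy is exactly 50 percent and F1 is zero, which is one reason balanced accuracy is the more informative headline number for an imbalanced event set like this one.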
What carries the argument
ConvNeXt convolutional network trained on short segments of impedance pneumography and photoplethysmography waveforms, with signal modality as the dominant performance driver.
If this is right
- Impedance pneumography alone yields balanced accuracies of 86.8 to 88.0 percent, exceeding ECG-derived (62.6 to 69.7 percent) and PPG-derived (65.1 to 66.4 percent) respiratory surrogates.
- Adding PPG to IP inputs produces a small further lift while ECG adds little value.
- Architectural differences matter less than signal choice in this data-limited setting.
- The approach uses only existing bedside monitors and therefore requires no new hardware.
Where Pith is reading between the lines
- If the accuracy holds in prospective use, bedside alarms could be filtered to reduce the high false-positive rates that currently desensitise staff.
- The same models could be retrained on larger multi-centre datasets to test generalisation across different NICU equipment and infant populations.
- Because the method relies on short waveform segments rather than long time-series, it may be deployable on existing low-power bedside hardware.
Load-bearing premise
Annotations produced by three independent reviewers form accurate ground truth for clinically meaningful cessation-of-breathing events.
What would settle it
A replication study in which the same signals are re-annotated by a new panel of reviewers and the resulting labels produce substantially lower model performance on the original test set.
Original abstract
Apnoea of prematurity is characterised by recurrent episodes of cessation of breathing and remains difficult to detect reliably using routinely monitored physiological signals in the Neonatal Intensive Care Unit (NICU). Existing bedside monitors rely primarily on respiratory rate and oxygen saturation thresholds, often generating high false-positive alarm rates and missing short or irregular events. Improving automated detection using routinely acquired clinical signals could enhance identification of clinically meaningful events without additional sensing hardware. We evaluated deep learning-based detection of apnoea-related Cessation Of BrEathing (COBE) events using impedance pneumography (IP), electrocardiography (ECG), and photoplethysmography (PPG) signals from approximately 430 hours of NICU recordings collected from 24 pre-term infants. Three independent reviewers annotated COBE events, producing a dataset of 346 COBE and 608 non-COBE events. We compared a shallow convolutional neural network (CNN), residual networks (ResNets), and a ConvNeXt architecture using an independent held-out test set. Across all architectures, detection performance was influenced more strongly by signal modality than by architectural complexity. Unimodal IP-based models achieved balanced accuracies of 86.8-88.0%, outperforming ECG-derived (62.6-69.7%) and PPG-derived (65.1-66.4%) respiratory surrogates. Multimodal fusion yielded modest improvements over IP alone. The best-performing model, a ConvNeXt architecture combining IP and PPG inputs, achieved 88.7% balanced accuracy and an F1 score of 0.75 on the independent test set. These findings demonstrate that deep learning models applied to routinely monitored NICU signals can reliably detect COBE events and highlight the importance of signal modality in data-constrained neonatal monitoring settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that deep learning models applied to routinely monitored NICU signals (IP, ECG, PPG) from 24 pre-term infants can reliably detect COBE events, with a ConvNeXt model fusing IP and PPG achieving 88.7% balanced accuracy and F1=0.75 on an independent held-out test set; performance depends more on signal modality than architecture, and IP-based models outperform ECG/PPG surrogates.
Significance. If the labels prove reliable, the work shows that deep learning on signals from existing NICU hardware can improve apnoea detection over threshold-based monitors, and the modality-over-architecture finding is useful for data-scarce neonatal settings. The held-out test evaluation is a methodological strength.
major comments (2)
- [Abstract] Abstract (annotation description): The paper states that three independent reviewers annotated the 346 COBE and 608 non-COBE events but reports no inter-rater agreement metric (Cohen/Fleiss kappa, percentage agreement, or disagreement analysis). This is load-bearing for the central claim because the 88.7% balanced accuracy and F1=0.75 rest on these labels constituting accurate ground truth; unquantified rater discordance on subtle events would make both training targets and test metrics unreliable.
- [Abstract] Abstract (evaluation description): No details are provided on construction of the independent held-out test set (e.g., patient-wise vs. event-wise split), class-imbalance handling during training, or statistical significance testing of performance differences across models/modalities. These choices directly affect interpretability of the reported metrics and the claim that IP outperforms other modalities.
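The inter-rater agreement metric requested in the first major comment can be sketched as follows. This is a plain-Python illustration of Fleiss' kappa for a fixed number of raters per item. The function name `fleiss_kappa` and the toy ratings are hypothetical and are not the paper's annotations.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    observed = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Fraction of rater pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed += agreeing_pairs / (n_raters * (n_raters - 1))
    p_observed = observed / n_items

    # Chance agreement from the overall category proportions.
    total_ratings = n_items * n_raters
    p_expected = sum((c / total_ratings) ** 2 for c in category_totals.values())

    if p_expected == 1.0:
        return float("nan")  # undefined when every rating is the same category
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example (hypothetical): 4 candidate events, 3 raters, 1 = COBE, 0 = non-COBE.
toy_ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
kappa = fleiss_kappa(toy_ratings)
print(round(kappa, 3))
```

In the toy data two events have unanimous labels and two have a 2-to-1 split, which gives a kappa of one third. Reporting this kind of figure for the real annotations would indicate how far the labels can be treated as ground truth.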
minor comments (1)
- [Abstract] Abstract: The total recording duration (~430 hours) and number of infants (24) are stated, but the distribution of events across infants or any patient-level statistics are not mentioned, which would aid assessment of generalizability.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which correctly identifies gaps in reporting that affect the interpretability of our results. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] Abstract (annotation description): The paper states that three independent reviewers annotated the 346 COBE and 608 non-COBE events but reports no inter-rater agreement metric (Cohen/Fleiss kappa, percentage agreement, or disagreement analysis). This is load-bearing for the central claim because the 88.7% balanced accuracy and F1=0.75 rest on these labels constituting accurate ground truth; unquantified rater discordance on subtle events would make both training targets and test metrics unreliable.
  Authors: We agree that an inter-rater agreement metric is necessary to substantiate the reliability of the annotations as ground truth. The three reviewers annotated independently, but this metric was omitted from the original submission. We will compute and report Fleiss' kappa (along with a brief disagreement analysis) in the revised methods and results sections to quantify consistency across raters.
  Revision: yes
- Referee: [Abstract] Abstract (evaluation description): No details are provided on construction of the independent held-out test set (e.g., patient-wise vs. event-wise split), class-imbalance handling during training, or statistical significance testing of performance differences across models/modalities. These choices directly affect interpretability of the reported metrics and the claim that IP outperforms other modalities.
  Authors: We agree that these methodological details are required for full transparency. The held-out test set was constructed via a patient-wise split (to avoid intra-patient leakage); class imbalance was handled with weighted loss functions; and performance differences will be assessed with statistical tests (e.g., bootstrap confidence intervals or McNemar's test). We will add these specifics to the methods and results in the revision.
  Revision: yes
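Two of the choices named in this response, a patient-wise split and a weighted loss, can be illustrated with a short sketch. Everything below is an assumption made for illustration: the function names, the 25 percent test fraction, the synthetic infant IDs with five made-up events each, and the inverse-frequency weighting scheme (the response says only "weighted loss functions"). The class counts 346 and 608 are the full-dataset figures from the abstract, not the training split.

```python
import random
from collections import Counter


def patient_wise_split(patient_ids, test_fraction=0.25, seed=0):
    """Split event indices so that no patient contributes to both sets."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx


def inverse_frequency_weights(labels):
    """Per-class weights n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Synthetic stand-in: 24 infants (the cohort size in the abstract),
# each with five invented events. Real per-infant counts are not reported.
patient_ids = [f"infant_{i:02d}" for i in range(24) for _ in range(5)]
train_idx, test_idx = patient_wise_split(patient_ids)
train_patients = {patient_ids[i] for i in train_idx}
test_patients = {patient_ids[i] for i in test_idx}
print(len(train_patients), len(test_patients), train_patients & test_patients)

# Illustrative weights from the full-dataset class counts (1 = COBE, 0 = non-COBE).
weights = inverse_frequency_weights([1] * 346 + [0] * 608)
print(weights)
```

Splitting by patient rather than by event keeps recordings from the same infant out of both sets, which is the leakage concern the referee raises. The weights up-weight the rarer COBE class (about 1.38) relative to non-COBE (about 0.78).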
Circularity Check
No circularity: empirical ML evaluation on held-out data
The paper reports an empirical machine-learning study that trains and evaluates convolutional architectures on a fixed dataset of NICU signals with reviewer-provided labels, then measures balanced accuracy and F1 on a held-out test partition. No derivation, equation, or first-principles claim is advanced; performance numbers are produced by standard supervised training and are falsifiable against the external test labels. None of the six enumerated circularity patterns apply: there are no self-definitional quantities, fitted inputs renamed as predictions, load-bearing self-citations, imported uniqueness theorems, smuggled ansatzes, or renamed known results. The central result therefore remains independent of the paper's own fitting procedure.