pith.

arxiv: 2606.23213 · v1 · pith:KVMZBEZBnew · submitted 2026-06-22 · 💻 cs.LG

Deep learning-based detection of cessation of breathing in pre-term infants

Pith reviewed 2026-06-26 09:17 UTC · model grok-4.3

classification 💻 cs.LG
keywords: deep learning · apnoea detection · pre-term infants · NICU monitoring · impedance pneumography · photoplethysmography · cessation of breathing

The pith

Deep learning models detect cessation of breathing events in pre-term infants from routine NICU signals at up to 88.7 percent balanced accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether convolutional neural networks can identify apnoea-related cessation of breathing (COBE) events using only the signals already collected in neonatal intensive care, as an alternative to current threshold-based monitors. It trains and evaluates several architectures on roughly 430 hours of recordings from 24 infants, with 346 COBE and 608 non-COBE events labelled by three independent reviewers. Performance depends more on which signals are used than on how complex the network is: impedance pneumography provides the strongest single input, and a multimodal ConvNeXt model reaches the highest scores on a held-out test set.

Core claim

A ConvNeXt architecture that combines impedance pneumography and photoplethysmography inputs achieves 88.7 percent balanced accuracy and an F1 score of 0.75 for detecting labelled COBE events on an independent test set, outperforming ECG- or PPG-only models and showing only modest gains from further multimodal fusion.
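Balanced accuracy is the mean of sensitivity and specificity, so it is not inflated by the larger non-COBE class, whereas F1 combines precision and recall on the COBE class alone. The minimal sketch below computes both from confusion-matrix counts; the counts are hypothetical and are not the paper's confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts, with COBE as the positive class."""
    tp: int  # COBE events correctly flagged
    fp: int  # non-COBE events wrongly flagged
    tn: int  # non-COBE events correctly rejected
    fn: int  # COBE events missed


def balanced_accuracy(c: Confusion) -> float:
    """Mean of sensitivity (recall on COBE) and specificity (recall on non-COBE)."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return (sensitivity + specificity) / 2


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall for the COBE class."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    c = Confusion(tp=40, fp=20, tn=100, fn=10)
    print(f"balanced accuracy = {balanced_accuracy(c):.3f}")
    print(f"F1 = {f1_score(c):.3f}")
```

Because F1 ignores true negatives, a model can pair a high balanced accuracy with a noticeably lower F1 when false positives are numerous relative to true positives, which is one reason to report both.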

What carries the argument

ConvNeXt convolutional network trained on short segments of impedance pneumography and photoplethysmography waveforms, with signal modality as the dominant performance driver.
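Figure 3 describes a 1D adaptation of ConvNeXt (Liu et al., 2022). In the original design each block applies a large-kernel depthwise convolution, LayerNorm, a pointwise expansion to four times the channel width, GELU, a pointwise projection back, and a residual connection. The NumPy sketch below is a forward-pass-only illustration of one such block. The kernel size of 7 and expansion ratio of 4 follow the original 2D ConvNeXt; the random weights, the omission of layer scale and stochastic depth, and the class name `ConvNeXtBlock1D` are assumptions, not details confirmed for this paper.

```python
import numpy as np


def layer_norm(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalise each time step across channels; x has shape (time, channels)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of the Gaussian error linear unit."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def depthwise_conv1d(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Per-channel 'same'-length cross-correlation, as in deep-learning conv layers.

    x: (channels, time); kernels: (channels, k) with k odd.
    """
    k = kernels.shape[1]
    padded = np.pad(x, ((0, 0), (k // 2, k // 2)))
    out = np.empty_like(x)
    for c in range(x.shape[0]):
        # 'valid' mode over the zero-padded channel keeps the original length.
        out[c] = np.correlate(padded[c], kernels[c], mode="valid")
    return out


class ConvNeXtBlock1D:
    """Hypothetical 1D ConvNeXt block (assumed hyperparameters, random weights):
    depthwise conv -> LayerNorm -> expand -> GELU -> project -> residual add."""

    def __init__(self, channels: int, kernel_size: int = 7, expansion: int = 4, seed: int = 0):
        rng = np.random.default_rng(seed)
        hidden = expansion * channels
        self.dw = rng.normal(0.0, 0.1, size=(channels, kernel_size))
        self.gamma = np.ones(channels)
        self.beta = np.zeros(channels)
        self.w1 = rng.normal(0.0, 0.1, size=(channels, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, size=(hidden, channels))
        self.b2 = np.zeros(channels)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """x has shape (channels, time); the output has the same shape."""
        h = depthwise_conv1d(x, self.dw).T       # (time, channels)
        h = layer_norm(h, self.gamma, self.beta)
        h = gelu(h @ self.w1 + self.b1)          # pointwise expansion
        h = h @ self.w2 + self.b2                # pointwise projection
        return x + h.T                           # residual connection


if __name__ == "__main__":
    # 16 feature channels, as if produced by a (hypothetical) stem, over 300 time steps.
    features = np.random.default_rng(1).normal(size=(16, 300))
    block = ConvNeXtBlock1D(channels=16)
    print(block(features).shape)  # (16, 300)
```

The residual connection means a block refines its input rather than replacing it, which is what lets ConvNeXt stack many such blocks per stage.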

If this is right

  • Impedance pneumography alone yields balanced accuracies of 86.8 to 88.0 percent, exceeding ECG-derived (62.6 to 69.7 percent) and PPG-derived (65.1 to 66.4 percent) respiratory surrogates.
  • Adding PPG to IP inputs produces a small further lift while ECG adds little value.
  • Architectural differences matter less than signal choice in this data-limited setting.
  • The approach uses only existing bedside monitors and therefore requires no new hardware.
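Figures 7 and 8 describe two respiratory surrogates built in a similar way: the amplitude of successive R-peaks (EDR) or PPG pulse peaks varies with breathing, and interpolating those amplitudes onto a uniform time grid yields a slow, respiration-like waveform. The sketch below illustrates the idea under assumptions the captions do not state: `scipy.signal.find_peaks` with a minimum-distance constraint stands in for the authors' actual beat detector, linear interpolation onto a 4 Hz grid is an arbitrary choice, and no pre-filtering or artefact rejection is applied. The function name `peak_amplitude_surrogate` is hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def peak_amplitude_surrogate(signal: np.ndarray, fs: float, min_interval_s: float,
                             out_fs: float = 4.0):
    """Hypothetical respiratory surrogate from beat-to-beat peak amplitudes.

    Detects peaks (R-peaks for ECG, pulse peaks for PPG) at least
    min_interval_s apart, takes the signal value at each peak, and linearly
    interpolates those amplitudes onto a uniform grid sampled at out_fs.
    Returns (times_in_seconds, surrogate).
    """
    distance = max(1, int(round(min_interval_s * fs)))
    peaks, _ = find_peaks(signal, distance=distance)
    if peaks.size < 2:
        raise ValueError("need at least two detected peaks")
    peak_times = peaks / fs
    amplitudes = signal[peaks]
    t_out = np.arange(peak_times[0], peak_times[-1], 1.0 / out_fs)
    return t_out, np.interp(t_out, peak_times, amplitudes)


if __name__ == "__main__":
    # Synthetic pulse train: 150 beats/min whose amplitude is modulated by
    # breathing at 0.5 Hz (30 breaths/min). Not real NICU data.
    fs = 250.0
    t = np.arange(0, 60, 1 / fs)
    beat_times = np.arange(0.2, 60, 0.4)
    amps = 1 + 0.3 * np.sin(2 * np.pi * 0.5 * beat_times)
    pulses = sum(a * np.exp(-0.5 * ((t - tb) / 0.02) ** 2) for tb, a in zip(beat_times, amps))
    t_out, surrogate = peak_amplitude_surrogate(pulses, fs, min_interval_s=0.25)
    print(len(t_out), surrogate.min(), surrogate.max())
```

A surrogate built this way is sampled only once per heartbeat, so it can lose detail when breathing is fast relative to heart rate, one plausible reason such surrogates trail direct impedance pneumography.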

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the accuracy holds in prospective use, bedside alarms could be filtered to reduce the high false-positive rates that currently desensitise staff.
  • The same models could be retrained on larger multi-centre datasets to test generalisation across different NICU equipment and infant populations.
  • Because the method relies on short waveform segments rather than long time-series, it may be deployable on existing low-power bedside hardware.

Load-bearing premise

Annotations produced by three independent reviewers form accurate ground truth for clinically meaningful cessation-of-breathing events.

What would settle it

A replication study in which the same signals are re-annotated by a new panel of reviewers and the resulting labels produce substantially lower model performance on the original test set.

Figures

Figures reproduced from arXiv: 2606.23213 by Dineo Serame, Lionel Tarassenko, Mauricio Villarroel.

Figure 1: Architecture of the CNN model modified for automated COBE episode detection. The model processes 1D physiological signals (IP, EDR, or PPG envelope) through a series of convolutional layers with ReLU activations, average pooling, and dropout, followed by fully connected layers for binary classification. The same architecture is applied to individual signals and extended to multimodal configurations using f…
Figure 2: The architecture of the ResNet model designed for automated COBE episode detection. The network consists of four sequential stages of residual blocks with multiple convolutional layers depending on the variant (ResNet-18, -34, or -50). The model is applied to individual input signals (IP, EDR, and PPG envelope) and extended to multimodal configurations using fusion strategies.
Figure 3: Architecture of the modified 1D ConvNeXt model for automated COBE episode detection. The network processes 1D physiological signals (IP, EDR, and PPG envelope) through an initial 1D convolution and LayerNorm (commonly referred to as the “stem”), followed by four stages of ConvNeXt blocks with intermediate downsampling. The final stage output is aggregated via global average pooling, normalised with LayerNo…
Figure 4: MATLAB annotation interface used for expert review of candidate COBE episodes. The GUI displays multiple synchronised physiological signals, including ECG, PPG, IP, RR, HR, and SpO2, alongside annotation controls and keyboard shortcuts. The command window shows an example segment already annotated as “NO”.
Figure 5: Decision tree used for manual annotation of candidate COBE episodes. Reviewers assessed 5-minute windows of clinical vital signs surrounding each trigger event.
Appendix B (Supplementary methods: respiratory signal extraction), B.1 Impedance pneumography (IP), excerpt: The raw IP signal was filtered using an 8th-order high-pass Butterworth IIR filter (0.08 Hz) and a 6th-order low-pass Butterworth IIR filter (2.75 Hz) to remove b…
Figure 6: IP waveform extracted from a 60-second IP segment acquired from a 28-week-old infant. a) Raw IP; b) Filtered IP waveform.
Figure 7: EDR waveform extracted from a 60-second ECG segment acquired from a 30.3-week-old infant. a) Raw ECG; b) Filtered ECG waveform with detected R-peaks; c) EDR signal from successive R-peak amplitudes; d) IP signal during the same period.
Appendix B.3 (PPG-derived respiratory envelope), excerpt: Respiratory modulation in the PPG signal arises from changes in venous return and peripheral blood volume during breathing. The PPG sign…
Figure 8: PPG envelope waveform extracted from a 30-second PPG segment acquired from a 31.9-week-old infant. a) Raw PPG; b) Filtered PPG waveform with detected peaks and troughs; c) Envelope derived from peaks; d) Peak envelope representing respiratory variations; e) IP signal during the same period.
Appendix C (Supplementary results: COBE detection using CNNs), excerpt: This appendix reports cross-validation and test results for CNN exp…
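The Appendix B.1 excerpt reproduced under Figure 5 specifies the IP pre-filtering: an 8th-order Butterworth high-pass at 0.08 Hz and a 6th-order Butterworth low-pass at 2.75 Hz. A minimal SciPy sketch of that chain follows. The sampling rate in the example and the use of zero-phase (forward-backward) filtering are assumptions, since the excerpt is truncated before those details; `filter_ip` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def filter_ip(ip: np.ndarray, fs: float) -> np.ndarray:
    """Hypothetical IP pre-filter following the Appendix B.1 excerpt.

    8th-order Butterworth high-pass at 0.08 Hz removes slow baseline drift;
    6th-order Butterworth low-pass at 2.75 Hz removes faster content.
    Second-order sections keep the very low cut-off numerically stable.
    sosfiltfilt (zero phase) is an assumption; the excerpt does not say.
    """
    hp = butter(8, 0.08, btype="highpass", fs=fs, output="sos")
    lp = butter(6, 2.75, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(lp, sosfiltfilt(hp, ip))


if __name__ == "__main__":
    # Synthetic trace at an assumed 50 Hz: breathing at 0.8 Hz plus slow
    # drift and 12 Hz interference. Not real NICU data.
    fs = 50.0
    t = np.arange(0, 600, 1 / fs)
    breath = np.sin(2 * np.pi * 0.8 * t)
    raw = breath + 3 * np.sin(2 * np.pi * 0.01 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
    filtered = filter_ip(raw, fs)
    print(filtered.shape)
```

A high-pass cut-off as low as 0.08 Hz has a long transient, so the first and last tens of seconds of a filtered recording are less trustworthy than the middle.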
Original abstract

Apnoea of prematurity is characterised by recurrent episodes of cessation of breathing and remains difficult to detect reliably using routinely monitored physiological signals in the Neonatal Intensive Care Unit (NICU). Existing bedside monitors rely primarily on respiratory rate and oxygen saturation thresholds, often generating high false-positive alarm rates and missing short or irregular events. Improving automated detection using routinely acquired clinical signals could enhance identification of clinically meaningful events without additional sensing hardware. We evaluated deep learning-based detection of apnoea-related Cessation Of BrEathing (COBE) events using impedance pneumography (IP), electrocardiography (ECG), and photoplethysmography (PPG) signals from approximately 430 hours of NICU recordings collected from 24 pre-term infants. Three independent reviewers annotated COBE events, producing a dataset of 346 COBE and 608 non-COBE events. We compared a shallow convolutional neural network (CNN), residual networks (ResNets), and a ConvNeXt architecture using an independent held-out test set. Across all architectures, detection performance was influenced more strongly by signal modality than by architectural complexity. Unimodal IP-based models achieved balanced accuracies of 86.8-88.0%, outperforming ECG-derived (62.6-69.7%) and PPG-derived (65.1-66.4%) respiratory surrogates. Multimodal fusion yielded modest improvements over IP alone. The best-performing model, a ConvNeXt architecture combining IP and PPG inputs, achieved 88.7% balanced accuracy and an F1 score of 0.75 on the independent test set. These findings demonstrate that deep learning models applied to routinely monitored NICU signals can reliably detect COBE events and highlight the importance of signal modality in data-constrained neonatal monitoring settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that deep learning models applied to routinely monitored NICU signals (IP, ECG, PPG) from 24 pre-term infants can reliably detect COBE events, with a ConvNeXt model fusing IP and PPG achieving 88.7% balanced accuracy and F1=0.75 on an independent held-out test set; performance depends more on signal modality than architecture, and IP-based models outperform ECG/PPG surrogates.

Significance. If the labels prove reliable, the work shows that DL on existing NICU hardware can improve apnoea detection over threshold-based monitors, with the modality-over-architecture finding useful for data-scarce neonatal settings. The held-out test evaluation is a methodological strength.

major comments (2)
  1. [Abstract] Abstract (annotation description): The paper states that three independent reviewers annotated the 346 COBE and 608 non-COBE events but reports no inter-rater agreement metric (Cohen/Fleiss kappa, percentage agreement, or disagreement analysis). This is load-bearing for the central claim because the 88.7% balanced accuracy and F1=0.75 rest on these labels constituting accurate ground truth; unquantified rater discordance on subtle events would make both training targets and test metrics unreliable.
  2. [Abstract] Abstract (evaluation description): No details are provided on construction of the independent held-out test set (e.g., patient-wise vs. event-wise split), class-imbalance handling during training, or statistical significance testing of performance differences across models/modalities. These choices directly affect interpretability of the reported metrics and the claim that IP outperforms other modalities.
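Major comment 1 asks for an inter-rater agreement statistic. Fleiss' kappa suits three reviewers labelling the same candidate episodes; a minimal pure-Python sketch follows. It assumes every reviewer rated every candidate episode as either COBE or non-COBE, which is an assumption rather than something the page states, and the ratings in the example are invented.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for items that were each rated by the same number of raters.

    counts[i][j] is how many raters put item i in category j
    (for example, j = 0 for non-COBE and j = 1 for COBE).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Observed agreement: fraction of agreeing rater pairs per item, averaged.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall category proportions.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Invented ratings by 3 reviewers for 5 candidate episodes: [non-COBE, COBE].
    ratings = [[3, 0], [0, 3], [1, 2], [3, 0], [2, 1]]
    print(f"Fleiss' kappa = {fleiss_kappa(ratings):.3f}")
```

Reporting kappa alongside the per-episode disagreement counts would show whether disagreements concentrate on short or borderline events, which is the worry the referee raises.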
minor comments (1)
  1. [Abstract] Abstract: The total recording duration (~430 hours) and number of infants (24) are stated, but the distribution of events across infants or any patient-level statistics are not mentioned, which would aid assessment of generalizability.
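Major comment 2 also asks for significance testing between models, and the simulated rebuttal names McNemar's test as one option. A minimal exact (binomial) version in plain Python follows. It compares two models' per-event correctness on the same test events, so it tests whether their error rates differ, not whether their balanced accuracies differ; the inputs in the tests are toy examples.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]) -> float:
    """Two-sided exact McNemar p-value for two classifiers on the same events."""
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("inputs must have equal length")
    # b: model A right, model B wrong; c: model A wrong, model B right.
    b = sum(1 for y, pa, pb in zip(y_true, pred_a, pred_b) if pa == y and pb != y)
    c = sum(1 for y, pa, pb in zip(y_true, pred_a, pred_b) if pa != y and pb == y)
    n = b + c
    if n == 0:
        return 1.0  # the models never differ in correctness
    # Under the null hypothesis, b ~ Binomial(n, 0.5); double the smaller tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)
```

Only the discordant events (b and c) enter the test, so two models with similar accuracy but very different error patterns can still be told apart.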

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which correctly identifies gaps in reporting that affect the interpretability of our results. We address each major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract (annotation description): The paper states that three independent reviewers annotated the 346 COBE and 608 non-COBE events but reports no inter-rater agreement metric (Cohen/Fleiss kappa, percentage agreement, or disagreement analysis). This is load-bearing for the central claim because the 88.7% balanced accuracy and F1=0.75 rest on these labels constituting accurate ground truth; unquantified rater discordance on subtle events would make both training targets and test metrics unreliable.

    Authors: We agree that an inter-rater agreement metric is necessary to substantiate the reliability of the annotations as ground truth. The three reviewers annotated independently, but this metric was omitted from the original submission. We will compute and report Fleiss' kappa (along with a brief disagreement analysis) in the revised methods and results sections to quantify consistency across raters. revision: yes

  2. Referee: [Abstract] Abstract (evaluation description): No details are provided on construction of the independent held-out test set (e.g., patient-wise vs. event-wise split), class-imbalance handling during training, or statistical significance testing of performance differences across models/modalities. These choices directly affect interpretability of the reported metrics and the claim that IP outperforms other modalities.

    Authors: We agree that these methodological details are required for full transparency. The held-out test set was constructed via a patient-wise split (to avoid intra-patient leakage); class imbalance was handled with weighted loss functions; and performance differences will be assessed with statistical tests (e.g., bootstrap confidence intervals or McNemar's test). We will add these specifics to the methods and results in the revision. revision: yes
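The simulated rebuttal describes a patient-wise split and a weighted loss without showing either. The sketch below illustrates both ideas in plain Python under its own assumptions: whole infants are moved to the test set in random order until at least a target fraction of events is held out, and class weights are inverse class frequencies computed on training events only (the same formula as scikit-learn's "balanced" class weights). The function names and the infant IDs in the tests are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def patient_wise_split(patient_ids: Sequence[str], test_fraction: float = 0.2,
                       seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) such that no patient appears in both sets.

    patient_ids[i] is the infant that event i came from.
    """
    events_per_patient = Counter(patient_ids)
    patients = sorted(events_per_patient)
    random.Random(seed).shuffle(patients)

    test_patients = set()
    held_out = 0
    target = test_fraction * len(patient_ids)
    for p in patients:
        if held_out >= target:
            break
        test_patients.add(p)
        held_out += events_per_patient[p]

    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight class c by n_samples / (n_classes * n_c), so rarer classes count more."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}
```

Splitting by infant rather than by event prevents the model from being tested on events from an infant whose other events it saw during training, which would otherwise inflate test metrics.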

Circularity Check

0 steps flagged

No circularity: empirical ML evaluation on held-out data

Full rationale

The paper reports an empirical machine-learning study that trains and evaluates convolutional architectures on a fixed dataset of NICU signals with reviewer-provided labels, then measures balanced accuracy and F1 on a held-out test partition. No derivation, equation, or first-principles claim is advanced; performance numbers are produced by standard supervised training and are falsifiable against the external test labels. None of the six enumerated circularity patterns apply: there are no self-definitional quantities, fitted inputs renamed as predictions, load-bearing self-citations, imported uniqueness theorems, smuggled ansatzes, or renamed known results. The central result therefore remains independent of the paper's own fitting procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is an empirical machine-learning evaluation; no mathematical axioms, invented physical entities, or explicit free parameters beyond standard neural-network training choices are stated in the abstract.

pith-pipeline@v0.9.1-grok · 5862 in / 1058 out tokens · 32486 ms · 2026-06-26T09:17:02.842353+00:00 · methodology

