
arXiv: 1907.05708 · v1 · pith:QROUANBGnew · submitted 2019-07-11 · eess.AS · cs.LG · cs.SD · eess.SP

Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks

Pith reviewed 2026-05-24 22:51 UTC · model grok-4.3

classification eess.AS · cs.LG · cs.SD · eess.SP
keywords respiratory auscultation · recurrent neural networks · anomaly detection · pathology classification · ICBHI dataset · deep learning · lung sounds · audio classification

The pith

Recurrent neural networks detect respiratory anomalies and diseases from auscultation sounds more accurately than prior methods on a standard benchmark.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a learning framework that pairs established audio feature extraction with recurrent neural networks to classify both abnormal lung sounds and specific respiratory pathologies. It claims to be the first RNN-based system operating at these two levels of analysis. On the ICBHI benchmark the approach exceeds competing methods in both anomaly-driven and pathology-driven tasks. A reader would care because respiratory diseases remain leading causes of illness and early computational detection could support prevention. The work positions the RNN architecture as the key enabler for modeling temporal patterns in the sound data.

Core claim

The authors establish a recurrent-neural-network framework for respiratory auscultation data that, when combined with standard feature extraction, yields higher accuracy than existing methods on the ICBHI dataset for both detecting abnormal sounds and classifying underlying pathologies.

What carries the argument

Recurrent neural network architecture that processes sequential audio features to perform dual-level classification of anomalies and pathologies.
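
As a rough illustration of what processing sequential audio features with a recurrent network means, the sketch below runs a tiny Elman-style RNN over one respiratory cycle, represented as a list of per-frame feature vectors, and returns class probabilities. It is not the authors' architecture: the layer sizes (13 features per frame, 8 hidden units, 4 classes), the random untrained weights, and the names `make_rnn` and `classify_cycle` are all assumptions made for this example.

```python
import math
import random


def make_rnn(n_features, n_hidden, n_classes, seed=0):
    """Randomly initialised weights for a tiny Elman RNN (illustrative only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_xh": mat(n_hidden, n_features),  # input frame -> hidden state
        "W_hh": mat(n_hidden, n_hidden),    # previous hidden -> hidden (recurrence)
        "W_hy": mat(n_classes, n_hidden),   # final hidden state -> class scores
    }


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_cycle(params, frames):
    """Feed the frames through the RNN in order, then classify the last state."""
    hidden = [0.0] * len(params["W_hh"])
    for frame in frames:
        from_input = matvec(params["W_xh"], frame)
        from_past = matvec(params["W_hh"], hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_past)]
    return softmax(matvec(params["W_hy"], hidden))


# Hypothetical input: 50 frames of 13 synthetic feature values each.
params = make_rnn(n_features=13, n_hidden=8, n_classes=4)
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(50)]
probs = classify_cycle(params, frames)
```

Changing `n_classes` would let the same loop serve either an anomaly-level or a pathology-level label set. In any real use, trained weights would replace the random ones.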

If this is right

  • Higher accuracy on anomaly detection tasks supports earlier identification of abnormal breathing patterns.
  • Improved pathology classification enables more precise disease-level diagnosis from sounds alone.
  • The dual-task capability reduces the need for separate models for anomaly versus disease prediction.
  • Outperformance on the benchmark advances computational support for respiratory auscultation analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Mobile or wearable devices could eventually run similar models for at-home screening if the framework scales to low-power hardware.
  • The same sequential modeling idea might transfer to other time-series medical signals such as heart sounds or cough analysis.
  • Larger and more diverse sound collections would be required to test whether the reported gains persist outside the benchmark conditions.

Load-bearing premise

The ICBHI benchmark dataset captures enough real-world clinical variability that performance will hold for new patients, devices, and disease presentations.

What would settle it

A clear drop in accuracy when the trained model is evaluated on a fresh collection of lung-sound recordings made with different equipment or from patient groups absent from the original training data.
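
One way to make this test concrete is to compute accuracy on the benchmark test set and on an external collection, then compare the two. The sketch below assumes predictions are already available as label lists. The function names `accuracy` and `accuracy_drop`, the class names, and the toy labels are placeholders for illustration, not data or results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_drop(benchmark, external):
    """Benchmark accuracy minus external accuracy; positive means a drop.

    Each argument is a (y_true, y_pred) pair.
    """
    return accuracy(*benchmark) - accuracy(*external)


# Toy placeholder labels only; these are not results from the paper.
benchmark = (
    ["normal", "crackle", "wheeze", "both"],
    ["normal", "crackle", "wheeze", "normal"],
)
external = (
    ["normal", "crackle", "wheeze", "both"],
    ["normal", "normal", "wheeze", "normal"],
)
drop = accuracy_drop(benchmark, external)
```

A single number like this is only a starting point. Judging whether a drop is "clear" would also need enough external recordings and some estimate of run-to-run variation.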

Figures

Figures reproduced from arXiv: 1907.05708 by Andrea Tagarelli, Diego Perna.

Figure 1: Example respiratory cycle waveform of a healthy patient.
Figure 2: Illustration of our RNN-based framework for the prediction of …
Figure 3: Comparison of RNN models in four-class anomaly-driven predic…
Original abstract

Respiratory diseases are among the most common causes of severe illness and death worldwide. Prevention and early diagnosis are essential to limit or even reverse the trend that characterizes the diffusion of such diseases. In this regard, the development of advanced computational tools for the analysis of respiratory auscultation sounds can become a game changer for detecting disease-related anomalies, or diseases themselves. In this work, we propose a novel learning framework for respiratory auscultation sound data. Our approach combines state-of-the-art feature extraction techniques and advanced deep-neural-network architectures. Remarkably, to the best of our knowledge, we are the first to model a recurrent-neural-network based learning framework to support the clinician in detecting respiratory diseases, at either level of abnormal sounds or pathology classes. Results obtained on the ICBHI benchmark dataset show that our approach outperforms competing methods on both anomaly-driven and pathology-driven prediction tasks, thus advancing the state-of-the-art in respiratory disease analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a recurrent neural network framework that combines standard feature extraction with deep architectures to classify respiratory auscultation sounds, claiming to be the first RNN-based approach for both anomaly detection and pathology classification and to outperform prior methods on the ICBHI benchmark for both tasks.

Significance. If the empirical superiority is shown to be robust, the work would constitute a modest incremental advance in applying sequence models to medical audio classification.

major comments (1)
  1. [Abstract] The central claim that the method 'outperforms competing methods on both anomaly-driven and pathology-driven prediction tasks' is unsupported by any numerical results, baselines, validation protocol, error bars, or ablation data, so the primary contribution cannot be verified from the supplied text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their comments. We address the single major comment below regarding the abstract. The full manuscript contains the experimental results, tables, and protocol details referenced in the abstract claim.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the method 'outperforms competing methods on both anomaly-driven and pathology-driven prediction tasks' is unsupported by any numerical results, baselines, validation protocol, error bars, or ablation data, so the primary contribution cannot be verified from the supplied text.

    Authors: The full manuscript (Sections 4 and 5) reports the ICBHI results with tables comparing our RNN approach against prior methods for both anomaly detection and pathology classification, using the official ICBHI train/test split as the validation protocol. We acknowledge that the abstract itself contains no numbers. To improve verifiability, we will revise the abstract to include the key performance metrics (e.g., our scores and the best competing scores) along with a brief mention of the evaluation protocol.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper describes an empirical ML framework that combines standard feature extraction with RNN architectures and reports benchmark results on ICBHI. No equations, parameter-fitting procedures, or derivation steps are present in the supplied text that would reduce any claimed prediction to its own inputs by construction. No self-citation load-bearing premises, uniqueness theorems, or ansatz smuggling are referenced. The central claim is an empirical outperformance statement, which is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described.


