Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks

Andrea Tagarelli; Diego Perna

arXiv: 1907.05708 · v1 · pith:QROUANBG · submitted 2019-07-11 · eess.AS · cs.LG · cs.SD · eess.SP


Pith reviewed 2026-05-24 22:51 UTC · model grok-4.3


keywords: respiratory auscultation · recurrent neural networks · anomaly detection · pathology classification · ICBHI dataset · deep learning · lung sounds · audio classification


The pith

Recurrent neural networks detect respiratory anomalies and diseases from auscultation sounds more accurately than prior methods on a standard benchmark.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a learning framework that pairs established audio feature extraction with recurrent neural networks to classify both abnormal lung sounds and specific respiratory pathologies. It claims to be the first RNN-based system operating at these two levels of analysis. On the ICBHI benchmark the approach exceeds competing methods in both anomaly-driven and pathology-driven tasks. A reader would care because respiratory diseases remain leading causes of illness and early computational detection could support prevention. The work positions the RNN architecture as the key enabler for modeling temporal patterns in the sound data.
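This page does not say which features the authors extract, although the reference list points to audio toolkits such as Essentia and to MFCC-related work. As a rough, hypothetical illustration of what "feature extraction followed by a recurrent model" means in practice, the sketch below slices a recording into overlapping frames and turns each frame into a small feature vector, producing the time-ordered sequence an RNN would consume. The two per-frame descriptors (log energy and zero-crossing rate), the frame sizes, and the function names are stand-ins chosen for brevity, not the authors' features.

```python
import math
from typing import List


def frame_signal(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D signal into overlapping frames, dropping any trailing partial frame."""
    if frame_len < 2 or hop <= 0:
        raise ValueError("frame_len must be >= 2 and hop must be positive")
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def frame_features(frame: List[float]) -> List[float]:
    """Toy per-frame descriptors (stand-ins for MFCC-style features)."""
    energy = sum(x * x for x in frame)
    log_energy = math.log(energy + 1e-10)  # small floor avoids log(0) on silent frames
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return [log_energy, zcr]


def feature_sequence(signal: List[float], frame_len: int = 256, hop: int = 128) -> List[List[float]]:
    """Turn a recording into the time-ordered sequence of feature vectors an RNN consumes."""
    return [frame_features(f) for f in frame_signal(signal, frame_len, hop)]


# Example: 1 s of a 200 Hz tone at an assumed 4 kHz sampling rate.
sample_rate = 4000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
seq = feature_sequence(tone)
print(len(seq), len(seq[0]))  # 30 2
```

With 256-sample frames and a 128-sample hop, the 4,000-sample recording yields 30 frames of two features each. A real pipeline would substitute richer spectral features, but the shape of the output (a sequence of fixed-length vectors) is what matters for the recurrent stage.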

Core claim

The authors establish a recurrent-neural-network framework for respiratory auscultation data that, when combined with standard feature extraction, yields higher accuracy than existing methods on the ICBHI dataset for both detecting abnormal sounds and classifying underlying pathologies.

What carries the argument

Recurrent neural network architecture that processes sequential audio features to perform dual-level classification of anomalies and pathologies.
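To make "processes sequential audio features" concrete, here is a minimal forward pass of a plain (Elman) recurrent network in dependency-free Python. A hidden state is updated frame by frame, and a softmax layer reads the final state. This is a hedged sketch, not the authors' architecture: Figure 3 compares several RNN models whose cells and sizes are not given on this page, and the class name, layer sizes, and random initialisation below are assumptions. The weights are untrained, so the outputs are valid probabilities but carry no meaning.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class ElmanSequenceClassifier:
    """Hypothetical sequence classifier: tanh RNN over frames, softmax head on the last state."""

    def __init__(self, n_in: int, n_hidden: int, n_classes: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            scale = 1.0 / math.sqrt(cols)
            return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        self.W_xh = init(n_hidden, n_in)       # input-to-hidden weights
        self.W_hh = init(n_hidden, n_hidden)   # recurrent hidden-to-hidden weights
        self.b_h = [0.0] * n_hidden
        self.W_hy = init(n_classes, n_hidden)  # hidden-to-class weights
        self.b_y = [0.0] * n_classes

    def forward(self, frames: List[Vector]) -> Vector:
        """Return class probabilities for one sequence of feature frames."""
        h = [0.0] * self.n_hidden
        for x in frames:
            # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
            pre = [a + b + c for a, b, c in
                   zip(matvec(self.W_xh, x), matvec(self.W_hh, h), self.b_h)]
            h = [math.tanh(v) for v in pre]
        logits = [a + b for a, b in zip(matvec(self.W_hy, h), self.b_y)]
        return softmax(logits)


# Four outputs, e.g. for a four-class anomaly task; untrained, so the values are arbitrary.
model = ElmanSequenceClassifier(n_in=2, n_hidden=8, n_classes=4)
probs = model.forward([[0.1, 0.2]] * 30)
print(len(probs), round(sum(probs), 6))  # 4 1.0
```

Setting `n_classes` to the number of pathology classes gives the pathology-level variant. Replacing the tanh update with an LSTM or GRU cell changes only the loop body. Training (for example with the Adam optimizer and recurrent dropout that appear in the reference list) is omitted.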

If this is right

Higher accuracy on anomaly detection tasks supports earlier identification of abnormal breathing patterns.
Improved pathology classification enables more precise disease-level diagnosis from sounds alone.
The dual-task capability reduces the need for separate models for anomaly versus disease prediction.
Outperformance on the benchmark advances computational support for respiratory auscultation analysis.
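The anomaly-level and pathology-level tasks listed above use different label spaces. As we understand the ICBHI 2017 database's annotation format, each respiratory cycle carries two binary flags (crackles present, wheezes present), which yields the four anomaly classes named in the Figure 3 caption, while diagnoses are recorded per patient. The sketch below derives both kinds of label. The diagnosis strings and the healthy/chronic/non-chronic grouping are assumptions for illustration and are not necessarily the grouping used in the paper.

```python
from typing import Dict

ANOMALY_CLASSES = ("normal", "crackle", "wheeze", "both")


def anomaly_label(crackles: int, wheezes: int) -> str:
    """Map a cycle's two binary flags to one of four anomaly classes."""
    if crackles not in (0, 1) or wheezes not in (0, 1):
        raise ValueError("flags must be 0 or 1")
    return ANOMALY_CLASSES[crackles + 2 * wheezes]


# Hypothetical coarse grouping of diagnosis names for a pathology-level task.
PATHOLOGY_GROUP: Dict[str, str] = {
    "Healthy": "healthy",
    "COPD": "chronic",
    "Bronchiectasis": "chronic",
    "Asthma": "chronic",
    "URTI": "non-chronic",
    "LRTI": "non-chronic",
    "Pneumonia": "non-chronic",
    "Bronchiolitis": "non-chronic",
}


def pathology_label(diagnosis: str) -> str:
    """Map a patient-level diagnosis string to a coarse pathology class."""
    try:
        return PATHOLOGY_GROUP[diagnosis]
    except KeyError:
        raise ValueError(f"unknown diagnosis: {diagnosis!r}") from None


print(anomaly_label(1, 1), pathology_label("COPD"))  # both chronic
```

Because anomaly labels attach to cycles and pathology labels attach to patients, the two tasks also differ in what counts as one training example.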

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Mobile or wearable devices could eventually run similar models for at-home screening if the framework scales to low-power hardware.
The same sequential modeling idea might transfer to other time-series medical signals such as heart sounds or cough analysis.
Larger and more diverse sound collections would be required to test whether the reported gains persist outside the benchmark conditions.

Load-bearing premise

The ICBHI benchmark dataset captures enough real-world clinical variability that performance will hold for new patients, devices, and disease presentations.

What would settle it

A clear drop in accuracy when the trained model is evaluated on a fresh collection of lung-sound recordings made with different equipment or from patient groups absent from the original training data.
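A cheap first approximation of this test is to keep every patient's recordings entirely on one side of the train/test boundary, so the model is always scored on people it never saw. The helper below is a hypothetical sketch of such a grouped split. Passing device identifiers instead of patient identifiers gives the analogous held-out-equipment check. This is not the protocol the authors report using; the simulated rebuttal further down refers to the official ICBHI train/test split.

```python
import random
from typing import List, Sequence, Tuple


def group_disjoint_split(
    groups: Sequence[str], test_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split item indices so that no group (e.g. patient ID) appears in both train and test."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    unique = sorted(set(groups))          # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    held_out = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in held_out]
    test_idx = [i for i, g in enumerate(groups) if g in held_out]
    return train_idx, test_idx


# One entry per recording: the (hypothetical) patient each recording came from.
patients = ["p101", "p101", "p102", "p103", "p103", "p104"]
train_idx, test_idx = group_disjoint_split(patients, test_fraction=0.5)
print(train_idx, test_idx)
```

With few patients, also stratifying by diagnosis would keep every class represented on both sides. That refinement is omitted here.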

Figures

Figures reproduced from arXiv: 1907.05708 by Andrea Tagarelli, Diego Perna.

**Figure 2.** Illustration of our RNN-based framework for the prediction of …

**Figure 3.** Comparison of RNN models in four-class anomaly-driven prediction …

Original abstract

Respiratory diseases are among the most common causes of severe illness and death worldwide. Prevention and early diagnosis are essential to limit or even reverse the trend that characterizes the diffusion of such diseases. In this regard, the development of advanced computational tools for the analysis of respiratory auscultation sounds can become a game changer for detecting disease-related anomalies, or diseases themselves. In this work, we propose a novel learning framework for respiratory auscultation sound data. Our approach combines state-of-the-art feature extraction techniques and advanced deep-neural-network architectures. Remarkably, to the best of our knowledge, we are the first to model a recurrent-neural-network based learning framework to support the clinician in detecting respiratory diseases, at either level of abnormal sounds or pathology classes. Results obtained on the ICBHI benchmark dataset show that our approach outperforms competing methods on both anomaly-driven and pathology-driven prediction tasks, thus advancing the state-of-the-art in respiratory disease analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper claims to be the first RNN model for respiratory sound anomaly and pathology detection on ICBHI and to beat prior methods, but the abstract supplies no numbers, protocol, or comparisons to check any of it.


The main point is that the authors present an RNN-based pipeline for classifying respiratory auscultation sounds, state that they are the first to do so at the level of abnormal sounds or full pathology classes, and report that it outperforms other approaches on the ICBHI benchmark for both tasks. They combine standard audio feature extraction with recurrent layers, which is a direct way to handle the time-series aspect of the recordings. The clinical motivation around early detection of respiratory disease is clearly stated and relevant.

That is the extent of what can be taken from the text as written. The experimental side is missing entirely: no accuracy values, no cross-validation scheme, no list of baselines, no error analysis, and no mention of how they handled the dataset's known patient and recording variability. Without those pieces the outperformance claim cannot be evaluated, so it is not possible to say whether the RNN component actually drives any improvement or whether the result would hold on new data.

The work is aimed at people working on audio machine learning for medical signals who might want an early example of recurrent nets in this domain. A reader could extract the high-level architecture idea, but anyone looking for usable results or a solid baseline would have to redo the experiments themselves. I would not cite it, because there are no concrete findings to build on. It should still go to peer review if the full manuscript contains proper tables, ablations, and validation details, since the application area has practical weight even if this version is preliminary.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a recurrent neural network framework that combines standard feature extraction with deep architectures to classify respiratory auscultation sounds, claiming to be the first RNN-based approach for both anomaly detection and pathology classification and to outperform prior methods on the ICBHI benchmark for both tasks.

Significance. If the empirical superiority is shown to be robust, the work would constitute a modest incremental advance in applying sequence models to medical audio classification.

major comments (1)

[Abstract] The central claim that the method "outperforms competing methods on both anomaly-driven and pathology-driven prediction tasks" is unsupported by any numerical results, baselines, validation protocol, error bars, or ablation data, rendering the primary contribution unverifiable from the supplied text.
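Error bars of the kind the referee asks for can be attached to any reported accuracy without retraining. One common option, sketched below under our own assumptions (the function name, 1,000 resamples, and the 95% level are illustrative), is a percentile bootstrap over test items. Nothing on this page indicates which interval, if any, the paper reports.

```python
import random
from typing import Sequence, Tuple


def bootstrap_accuracy_ci(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using a percentile bootstrap over test items."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if n_resamples <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("n_resamples must be positive and alpha in (0, 1)")
    rng = random.Random(seed)
    hits = [1 if t == p else 0 for t, p in zip(y_true, y_pred)]
    n = len(hits)
    stats = []
    for _ in range(n_resamples):
        # Resample test items with replacement and recompute accuracy.
        resample = [hits[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()
    lower = stats[int(alpha / 2 * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(hits) / n, lower, upper


y_true = ["normal"] * 100
y_pred = ["normal"] * 80 + ["crackle"] * 20
acc, lo, hi = bootstrap_accuracy_ci(y_true, y_pred)
print(f"accuracy {acc:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Because each ICBHI patient contributes several recordings and cycles, resampling whole patients rather than individual items would give wider and more honest intervals. The item-level version is shown for brevity.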

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their comments. We address the single major comment below regarding the abstract. The full manuscript contains the experimental results, tables, and protocol details referenced in the abstract claim.


Referee: [Abstract] The central claim that the method "outperforms competing methods on both anomaly-driven and pathology-driven prediction tasks" is unsupported by any numerical results, baselines, validation protocol, error bars, or ablation data, rendering the primary contribution unverifiable from the supplied text.

Authors: The full manuscript (Sections 4 and 5) reports the ICBHI results with tables comparing our RNN approach against prior methods for both anomaly detection and pathology classification, using the official ICBHI train/test split as the validation protocol. We acknowledge that the abstract itself contains no numbers. To improve verifiability, we will revise the abstract to include the key performance metrics (e.g., our scores and the best competing scores) along with a brief mention of the evaluation protocol. revision: yes
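The rebuttal promises key performance metrics in the revised abstract. For the ICBHI anomaly task, a commonly reported summary, as we understand the ICBHI 2017 challenge definition, is the average of sensitivity (correctly classified abnormal cycles, i.e. crackle, wheeze, or both, over all abnormal cycles) and specificity (correctly classified normal cycles over all normal cycles). The function below is a sketch under that assumption, using the four anomaly labels as class names. Whether the paper reports exactly this score is not stated on this page.

```python
from typing import Dict, Sequence

ABNORMAL = {"crackle", "wheeze", "both"}


def icbhi_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Sensitivity, specificity and their mean for four-class cycle labels (assumed definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n_normal = sum(1 for t in y_true if t == "normal")
    n_abnormal = sum(1 for t in y_true if t in ABNORMAL)
    if n_normal == 0 or n_abnormal == 0:
        raise ValueError("need at least one normal and one abnormal cycle")
    correct_normal = sum(1 for t, p in zip(y_true, y_pred) if t == "normal" and p == "normal")
    # An abnormal cycle counts only if its exact class is recovered
    # (a wheeze predicted as "both" is an error).
    correct_abnormal = sum(1 for t, p in zip(y_true, y_pred) if t in ABNORMAL and p == t)
    se = correct_abnormal / n_abnormal
    sp = correct_normal / n_normal
    return {"sensitivity": se, "specificity": sp, "score": (se + sp) / 2}


result = icbhi_scores(
    ["normal", "normal", "crackle", "wheeze", "both"],
    ["normal", "crackle", "crackle", "both", "both"],
)
print({k: round(v, 4) for k, v in result.items()})
# {'sensitivity': 0.6667, 'specificity': 0.5, 'score': 0.5833}
```

Reporting sensitivity and specificity separately, alongside their mean, makes clear whether a gain comes from catching more abnormal cycles or from fewer false alarms.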

Circularity Check

0 steps flagged

No significant circularity detected


The paper describes an empirical ML framework that combines standard feature extraction with RNN architectures and reports benchmark results on ICBHI. No equations, parameter-fitting procedures, or derivation steps are present in the supplied text that would reduce any claimed prediction to its own inputs by construction. No self-citation load-bearing premises, uniqueness theorems, or ansatz smuggling are referenced. The central claim is an empirical outperformance statement, which is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.0 · 5698 in / 962 out tokens · 20315 ms · 2026-05-24T22:51:28.345257+00:00 · methodology


Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1] “The global impact of respiratory disease (second edition),” Forum of International Respiratory Societies, 2017.

[2] A. A. Cruz, Global surveillance, prevention and control of chronic respiratory diseases: a comprehensive approach. WHO, 2007.

[3] P. G. Burney, J. Patel, R. Newson, C. Minelli, and M. Naghavi, “Global and regional trends in COPD mortality, 1990–2010,” European Respiratory J., vol. 45, no. 5, pp. 1239–1247, 2015.

[4] “The global asthma report 2018,” Global Asthma Network, 2018.

[5] T. Wardlaw, P. Salama, E. W. Johansson, and E. Mason, “Pneumonia: the leading killer of children,” The Lancet, vol. 368, no. 9541, pp. 1048–1050, 2006.

[6] World malaria report 2015. World Health Organization, 2016.

[7] L. A. Torre, F. Bray, R. L. Siegel, J. Ferlay, J. Lortet-Tieulent, and A. Jemal, “Global cancer statistics, 2012,” Cancer journal for clinicians, vol. 65, no. 2, pp. 87–108, 2015.

[8] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 4, pp. 208–211, 1979.

[9] G. Serbes, S. Ulukaya, and Y. P. Kahya, “An automated lung sound preprocessing and classification system based on spectral analysis methods,” in Precision Medicine Powered by pHealth and Connected Health, pp. 45–49, Springer, 2018.

[10] K. Kochetov, E. Putin, M. Balashov, A. Filchenkov, and A. Shalyto, “Noise masking recurrent neural network for respiratory sound classification,” in Proc. Int. Conf. on Artificial Neural Networks, pp. 208–217, 2018.

[11] B. Rocha, D. Filos, L. Mendes, I. Vogiatzis, E. Perantoni, E. Kaimakamis, P. Natsiavas, A. Oliveira, C. Jácome, A. Marques, et al., “A respiratory sound database for the development of automated classification,” in Precision Medicine Powered by pHealth and Connected Health, pp. 33–37, Springer, 2018.

[12] H. Pasterkamp, P. L. Brand, M. Everard, L. Garcia-Marcos, H. Melbye, and K. N. Priftis, “Towards the standardisation of lung sound nomenclature,” European Respiratory Journal, vol. 47, no. 3, pp. 724–732, 2016.

[13] M. Sarkar, I. Madabhavi, N. Niranjan, and M. Dogra, “Auscultation of the respiratory system,” Annals of thoracic medicine, vol. 10, no. 3, p. 158, 2015.

[14] N. Jakovljević and T. Lončar-Turukalo, “Hidden Markov model based respiratory sound classification,” in Precision Medicine Powered by pHealth and Connected Health, pp. 39–43, Springer, 2018.

[15] I. W. Selesnick, “Wavelet transform with tunable Q-factor,” IEEE Trans. Signal Process., vol. 59, no. 8, pp. 3560–3575, 2011.

[16] G. Chambres, P. Hanna, and M. Desainte-Catherine, “Automatic detection of patient with respiratory diseases using lung sound analysis,” in Proc. Int. Conf. on Content-Based Multimedia Indexing, pp. 1–6, 2018.

[17] D. Bogdanov, N. Wack, E. Gómez, S. Gulati, P. Herrera, O. Mayor, G. Roma, J. Salamon, J. R. Zapata, and X. Serra, “Essentia: An audio analysis library for music information retrieval,” in Proc. Int. Soc. for Music Information Retrieval Conf., pp. 493–498, 2013.

[18] D. Perna, “Convolutional neural networks learning from respiratory data,” in Proc. IEEE Int. Conf. on Bioinformatics and Biomedicine, pp. 2109–2113, 2018.

[19] I. J. Goodfellow, Y. Bengio, and A. C. Courville, Deep Learning. MIT Press, 2016.

[20] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in Proc. Int. Conf. on Machine Learning, pp. 1310–1318, 2013.

[21] Y. Gal and Z. Ghahramani, “A theoretically grounded application of dropout in recurrent neural networks,” in Proc. Int. Conf. on Neural Information Processing Systems, pp. 1019–1027, 2016.

[22] C. Laurent, G. Pereyra, P. Brakel, Y. Zhang, and Y. Bengio, “Batch normalized recurrent neural networks,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 2657–2661, 2016.

[23] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. on Learning Representations, 2015.

[24] M. H. Shirali-Shahreza and S. Shirali-Shahreza, “Effect of MFCC normalization on vector quantization based speaker identification,” in Proc. IEEE Int. Conf. on Signal Processing and Information Technology, pp. 250–253, 2010.

[25] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, et al., “The HTK book,” Cambridge University Engineering Department, vol. 3, p. 175, 2002.

[26] G. Montavon, G. B. Orr, and K. Müller, eds., Neural Networks: Tricks of the Trade - Second Edition, vol. 7700. Springer, 2012.
