Deep auscultation: Predicting respiratory anomalies and diseases via recurrent neural networks
Pith reviewed 2026-05-24 22:51 UTC · model grok-4.3
The pith
Recurrent neural networks detect respiratory anomalies and diseases from auscultation sounds more accurately than prior methods on a standard benchmark.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish a recurrent-neural-network framework for respiratory auscultation data that, when combined with standard feature extraction, yields higher accuracy than existing methods on the ICBHI dataset for both detecting abnormal sounds and classifying underlying pathologies.
What carries the argument
Recurrent neural network architecture that processes sequential audio features to perform dual-level classification of anomalies and pathologies.
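To make the mechanism concrete, here is a minimal sketch, not the authors' implementation: a single-layer Elman RNN in plain Python that reads a sequence of per-frame feature vectors and turns its final hidden state into class probabilities. The feature dimension (13), hidden size (8), class counts (4 and 3), random untrained weights, and every function name are illustrative assumptions; the supplied text does not specify the paper's architecture, features, or label sets.

```python
import math
import random


def init_params(n_features, n_hidden, n_classes, seed=0):
    """Small random weights for a single-layer Elman RNN (hypothetical sizes)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_xh": mat(n_hidden, n_features),  # input frame -> hidden state
        "W_hh": mat(n_hidden, n_hidden),    # previous hidden -> hidden (recurrence)
        "b_h": [0.0] * n_hidden,
        "W_hy": mat(n_classes, n_hidden),   # final hidden state -> class scores
        "b_y": [0.0] * n_classes,
    }


def matvec(matrix, vector):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_sequence(frames, params):
    """Run the RNN over all frames and classify from the last hidden state."""
    h = [0.0] * len(params["b_h"])
    for x in frames:
        from_input = matvec(params["W_xh"], x)
        from_past = matvec(params["W_hh"], h)
        h = [math.tanh(a + r + b)
             for a, r, b in zip(from_input, from_past, params["b_h"])]
    logits = [s + b for s, b in zip(matvec(params["W_hy"], h), params["b_y"])]
    return softmax(logits)


# Toy input: 20 frames of 13 features each (random stand-ins for audio features).
rng = random.Random(1)
frames = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(20)]

# The same sequence model is reused for both prediction levels; only the
# output size differs. Class counts here are placeholders.
anomaly_probs = classify_sequence(frames, init_params(13, 8, 4))
pathology_probs = classify_sequence(frames, init_params(13, 8, 3, seed=2))
```

The weights are untrained, so the probabilities carry no meaning. The sketch only shows how one recurrent model can serve either prediction level by changing the size of its output layer.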
If this is right
- Higher accuracy on anomaly detection tasks supports earlier identification of abnormal breathing patterns.
- Improved pathology classification enables more precise disease-level diagnosis from sounds alone.
- The dual-task capability reduces the need for separate models for anomaly versus disease prediction.
- Outperformance on the benchmark advances computational support for respiratory auscultation analysis.
Where Pith is reading between the lines
- Mobile or wearable devices could eventually run similar models for at-home screening if the framework scales to low-power hardware.
- The same sequential modeling idea might transfer to other time-series medical signals such as heart sounds or cough analysis.
- Larger and more diverse sound collections would be required to test whether the reported gains persist outside the benchmark conditions.
Load-bearing premise
The ICBHI benchmark dataset captures enough real-world clinical variability that performance will hold for new patients, devices, and disease presentations.
What would settle it
A clear drop in accuracy when the trained model is evaluated on a fresh collection of lung-sound recordings made with different equipment or from patient groups absent from the original training data.
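The test described above amounts to comparing accuracy on the benchmark's held-out split with accuracy on an independently collected set. A minimal sketch follows, assuming predictions are already available as label lists; the function names and the toy labels are hypothetical and are not results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def generalization_gap(benchmark, external):
    """Compare accuracy on a benchmark split and on an external collection.

    Each argument is a (y_true, y_pred) pair. A large positive `drop`
    would indicate that benchmark performance does not transfer.
    """
    bench_acc = accuracy(*benchmark)
    ext_acc = accuracy(*external)
    return {"benchmark": bench_acc, "external": ext_acc, "drop": bench_acc - ext_acc}


# Toy labels only (1 = abnormal, 0 = normal); not data from the paper.
benchmark = ([0, 1, 1, 0, 1, 0, 1, 1], [0, 1, 1, 0, 1, 0, 0, 1])
external = ([0, 1, 1, 0, 1, 0, 1, 1], [0, 0, 1, 0, 0, 1, 1, 1])
report = generalization_gap(benchmark, external)
```

How large a drop counts as "clear" is a judgment call that depends on the sample sizes involved; the sketch reports the gap and leaves that threshold open.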
Original abstract
Respiratory diseases are among the most common causes of severe illness and death worldwide. Prevention and early diagnosis are essential to limit or even reverse the trend that characterizes the diffusion of such diseases. In this regard, the development of advanced computational tools for the analysis of respiratory auscultation sounds can become a game changer for detecting disease-related anomalies, or diseases themselves. In this work, we propose a novel learning framework for respiratory auscultation sound data. Our approach combines state-of-the-art feature extraction techniques and advanced deep-neural-network architectures. Remarkably, to the best of our knowledge, we are the first to model a recurrent-neural-network based learning framework to support the clinician in detecting respiratory diseases, at either level of abnormal sounds or pathology classes. Results obtained on the ICBHI benchmark dataset show that our approach outperforms competing methods on both anomaly-driven and pathology-driven prediction tasks, thus advancing the state-of-the-art in respiratory disease analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a recurrent neural network framework that combines standard feature extraction with deep architectures to classify respiratory auscultation sounds, claiming to be the first RNN-based approach for both anomaly detection and pathology classification and to outperform prior methods on the ICBHI benchmark for both tasks.
Significance. If the empirical superiority is shown to be robust, the work would constitute a modest incremental advance in applying sequence models to medical audio classification.
major comments (1)
- [Abstract] Abstract: the central claim that the method 'outperforms competing methods on both anomaly-driven and pathology-driven prediction tasks' is unsupported by any numerical results, baselines, validation protocol, error bars, or ablation data, rendering the primary contribution unverifiable from the supplied text.
Simulated Author's Rebuttal
We thank the referee for their comments. We address the single major comment below regarding the abstract. The full manuscript contains the experimental results, tables, and protocol details referenced in the abstract claim.
-
Referee: [Abstract] Abstract: the central claim that the method 'outperforms competing methods on both anomaly-driven and pathology-driven prediction tasks' is unsupported by any numerical results, baselines, validation protocol, error bars, or ablation data, rendering the primary contribution unverifiable from the supplied text.
Authors: The full manuscript (Sections 4 and 5) reports the ICBHI results with tables comparing our RNN approach against prior methods for both anomaly detection and pathology classification, using the official ICBHI train/test split as the validation protocol. We acknowledge that the abstract itself contains no numbers. To improve verifiability, we will revise the abstract to include the key performance metrics (e.g., our scores and the best competing scores) along with a brief mention of the evaluation protocol.
Revision: yes
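The referee's request for metrics with error bars can be illustrated with a small sketch. It computes sensitivity and specificity for a binary abnormal-versus-normal task and attaches a percentile bootstrap interval to their average. The choice of metrics, the 95% level, the resample count, and the toy labels are all assumptions made for illustration; the supplied text does not say which scores the manuscript reports.

```python
import math
import random


def sensitivity_specificity(y_true, y_pred):
    """Binary labels: 1 = abnormal, 0 = normal. Returns (sensitivity, specificity)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    se = tp / (tp + fn) if tp + fn else float("nan")
    sp = tn / (tn + fp) if tn + fp else float("nan")
    return se, sp


def mean_se_sp(y_true, y_pred):
    """Average of sensitivity and specificity, a common summary score."""
    se, sp = sensitivity_specificity(y_true, y_pred)
    return (se + sp) / 2


def bootstrap_interval(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric`, resampling test cases."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if not math.isnan(value):  # skip resamples that lack one of the classes
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Toy labels only; not results from the paper.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [0] * 9 + [1] * 1
se, sp = sensitivity_specificity(y_true, y_pred)
score = mean_se_sp(y_true, y_pred)
low, high = bootstrap_interval(y_true, y_pred, mean_se_sp)
```

Resampling individual recordings treats them as independent. If several recordings come from the same patient, resampling whole patients would be the more defensible choice.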
Circularity Check
No significant circularity detected
The paper describes an empirical ML framework that combines standard feature extraction with RNN architectures and reports benchmark results on ICBHI. No equations, parameter-fitting procedures, or derivation steps are present in the supplied text that would reduce any claimed prediction to its own inputs by construction. No self-citation load-bearing premises, uniqueness theorems, or ansatz smuggling are referenced. The central claim is an empirical outperformance statement, which is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.