Optimizing 2D Input Representations and Sub-phase Fusion Strategies for Differential Diagnosis of Asthma and COPD Using CNN- and GRU-Based Networks

Elena Battini Sonmez; Ipek Sen; Ozgur Ozdemir

arXiv: 2606.10972 · v1 · pith:DHDWOHAN · submitted 2026-06-09 · eess.AS · cs.AI

Pith reviewed 2026-06-27 11:41 UTC · model grok-4.3

classification: eess.AS, cs.AI

keywords: asthma, COPD, MFCC, respiratory sounds, CNN, GRU, differential diagnosis, pulmonary sound classification

The pith

MFCC matrices with adaptive-length windowing and direct concatenation achieve the best F1 scores for distinguishing asthma from COPD in respiratory sound analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests different 2D representations of pulmonary sounds to classify asthma versus COPD using CNN and GRU networks. It compares MFCC matrices against log-mel spectrograms and vector autoregressive (VAR) models while addressing variable cycle lengths through adaptive-length windowing instead of simple trimming or padding. Sub-phase features are extracted via CNNs and fused by direct concatenation, a GRU, or a GRU with attention, with performance measured at both cycle and subject levels. MFCC with thirteen coefficients and adaptive windowing produces the highest scores (with direct concatenation in the sub-phase setting), while augmentation and more complex fusion strategies do not help. A reader would care because reliable sound-based differentiation could support clinical decisions when symptoms overlap.
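The VAR representation is the least familiar of the three inputs. The page does not say how a VAR fit becomes a 2D matrix, so what follows is only a sketch under stated assumptions: a multichannel recording `x` of shape (channels, samples) is fitted with a VAR(p) model by ordinary least squares, and the p coefficient matrices are laid side by side into one channels × (channels·p) array. The function name `var_matrix` and that layout are hypothetical, not the authors' method.

```python
import numpy as np

def var_matrix(x: np.ndarray, order: int) -> np.ndarray:
    """Fit x[:, t] ~ sum_k A_k @ x[:, t - k] (k = 1..order) by least squares.

    x has shape (channels, samples). Returns [A_1 | A_2 | ... | A_order],
    shape (channels, channels * order). Hypothetical 2D layout; the paper's
    exact VAR representation may differ.
    """
    m, n = x.shape
    if n <= order:
        raise ValueError("need more samples than the model order")
    # Regressor row for time t holds x[:, t-1], x[:, t-2], ..., newest lag first.
    lagged = np.hstack([x[:, order - k : n - k].T for k in range(1, order + 1)])
    target = x[:, order:].T                                  # (n - order, m)
    coef, *_ = np.linalg.lstsq(lagged, target, rcond=None)   # (m * order, m)
    return coef.T                                            # block k is A_k
```

Fitting one matrix per cycle (or per sub-phase) gives a fixed-size input regardless of cycle duration, which is one reason a model-based representation is an attractive comparison point for spectrograms.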

Core claim

MFCC matrices with thirteen coefficients and 64-point time resolution per sub-phase representation, processed by adaptive-length windowing followed by direct feature concatenation, reach a cycle-based F1-score of 0.877; MFCC matrices with thirteen coefficients and 256-point time resolution per full-cycle representation, also obtained by adaptive-length windowing, reach a subject-based F1-score of 0.855. Both outperform log-mel spectrograms and the VAR model. Sophisticated fusion strategies such as GRU with attention do not improve results over direct concatenation, and data augmentation techniques degrade performance overall, underscoring the value of authentic recordings.
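MFCC matrices and log-mel spectrograms share one pipeline: MFCCs are a DCT of the log mel-band energies, truncated to the first few coefficients (thirteen in the configurations above). A minimal numpy sketch of that relationship follows. The 4 kHz sample rate, 256-point FFT, 32-sample hop and 26 mel bands are illustrative assumptions, not the paper's settings, and all function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):            # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_and_mfcc(signal, sr=4000, n_fft=256, hop=32, n_mels=26, n_mfcc=13):
    """Return (log-mel, MFCC) matrices shaped (bands, frames) and (coeffs, frames).

    Assumes len(signal) >= n_fft. All defaults are illustrative.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2           # (frames, n_fft//2 + 1)
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T   # (frames, n_mels)
    log_mel = np.log(mel_energy + 1e-10)                       # small floor avoids log(0)
    # Orthonormal DCT-II over the mel axis, keeping the first n_mfcc rows.
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    dct[0] /= np.sqrt(2.0)
    mfcc = log_mel @ dct.T                                     # (frames, n_mfcc)
    return log_mel.T, mfcc.T
```

Because the DCT decorrelates neighbouring mel bands and truncation drops fine spectral detail, a 13 × T MFCC matrix is a much more compact input than the full log-mel matrix over the same frames.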

What carries the argument

Adaptive-length windowing that standardizes the temporal dimensions of variable-length respiratory cycle representations before CNN feature extraction from sub-phases and their fusion.
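The page does not define adaptive-length windowing precisely. One plausible reading, assumed here and not confirmed by the source, is that the analysis window grows or shrinks with each cycle so that every cycle yields the same number of frames (for example 64 or 256), whereas the trimming/zero-padding baseline keeps the window fixed and then cuts or pads the resulting matrix. Both are sketched below; the function names and the 50% overlap are hypothetical.

```python
import numpy as np

def adaptive_frames(signal, n_frames, overlap=0.5):
    """Cut one respiratory cycle into exactly n_frames windows.

    Hypothetical reading of adaptive-length windowing: the window length
    scales with the cycle length so that n_frames windows with the given
    fractional overlap span the whole cycle. Returns (n_frames, win).
    """
    length = len(signal)
    # n windows of length w with hop w * (1 - overlap) cover w * (1 + (n - 1) * (1 - overlap)).
    win = int(length / (1 + (n_frames - 1) * (1 - overlap)))
    if win < 2:
        raise ValueError("cycle too short for the requested number of frames")
    starts = np.round(np.linspace(0, length - win, n_frames)).astype(int)
    return np.stack([signal[s : s + win] for s in starts])

def trim_or_pad(spec, n_frames):
    """Baseline: cut or zero-pad a (features, frames) matrix along the time axis."""
    if spec.shape[1] >= n_frames:
        return spec[:, :n_frames]
    pad = np.zeros((spec.shape[0], n_frames - spec.shape[1]))
    return np.hstack([spec, pad])
```

Under this reading, each adaptive frame would then go through the same spectral transform; since the window length varies per cycle, mapping every frame onto a fixed set of mel bands keeps the feature dimension constant. Trimming discards the end of long cycles and padding fills short ones with silence, which is the artifact the adaptive approach is meant to avoid.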

If this is right

MFCC matrices outperform both log-mel spectrograms and the VAR model for asthma-COPD differentiation.
Direct concatenation of sub-phase features is sufficient and superior to GRU-based or attention-based fusion strategies.
Data augmentation methods, even mixup, reduce overall model performance compared with unaugmented training.
Optimized spectral and temporal dimensions of MFCC inputs matter more than the choice of fusion architecture.
Subject-based evaluation, which aggregates multiple cycles per subject, yields a slightly lower but still high best F1 score (0.855) than cycle-based evaluation (0.877).
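Cycle-based and subject-based scores differ only in the unit being counted. A minimal stdlib sketch, assuming that a subject's label is the majority vote over that subject's cycle predictions and that COPD is the positive class for F1; the page states neither the paper's aggregation rule nor which F1 variant it reports, so both are assumptions.

```python
from collections import Counter, defaultdict

def f1_score(y_true, y_pred, positive="COPD"):
    """Binary F1 for the chosen positive class (assumed here to be COPD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def subject_level(subject_ids, y_true, y_pred):
    """Collapse cycle predictions to one label per subject by majority vote.

    Ties go to the label seen first for that subject (Counter.most_common order).
    """
    votes, truth = defaultdict(list), {}
    for s, t, p in zip(subject_ids, y_true, y_pred):
        votes[s].append(p)
        truth[s] = t  # all cycles of one subject share the diagnosis
    subjects = sorted(votes)
    return ([truth[s] for s in subjects],
            [Counter(votes[s]).most_common(1)[0][0] for s in subjects])

# Toy example with three subjects.
subjects = ["s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3"]
y_true = ["COPD"] * 3 + ["asthma"] * 2 + ["COPD"] * 3
y_pred = ["COPD", "asthma", "COPD", "asthma", "COPD", "asthma", "asthma", "COPD"]
print(f1_score(y_true, y_pred))                            # cycle level
print(f1_score(*subject_level(subjects, y_true, y_pred)))  # subject level
```

With few subjects, each subject-level error moves the score by a large step, so subject-based F1 tends to be noisier than cycle-based F1 even when it is the more clinically relevant number.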

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same windowing and MFCC optimization steps could be applied to classification tasks involving other lung conditions that produce variable cycle lengths.
Preference for simple concatenation suggests that future systems could prioritize low-complexity inference suitable for portable devices.
The finding that authentic data beats augmentation implies a need for larger curated sound databases rather than synthetic expansion.
If the 13-coefficient MFCC advantage holds across sites, clinical protocols might standardize on cepstral features for initial sound triage.

Load-bearing premise

The respiratory sound recordings are representative of real-world patient variability and free of artifacts or label noise that would favor one input representation over others.

What would settle it

An independent test set of asthma and COPD respiratory recordings collected under varied clinical conditions that shows no F1 advantage for the reported MFCC configurations with adaptive windowing would falsify the optimality claim.

Figures

Figures reproduced from arXiv: 2606.10972 by Elena Battini Sonmez, Ipek Sen, Ozgur Ozdemir.

Original abstract

This study aims to explore the performance of the VAR model in comparison with mel-frequency cepstral coefficient (MFCC) matrices and log-mel spectrograms using deep learning. In pulmonary sound classification, spectrogram-based representations suffer from inconsistent temporal dimensions due to varying respiratory cycle durations. Along with traditional trimming/zero-padding, adaptive-length windowing was presented to fix their temporal dimensions. Their spectral and temporal dimensions were optimized by testing a range of parameters. Different convolutional neural network (CNN) architectures were employed to extract features from the two-dimensional representations obtained over the sub-phases. The extracted sub-phase features were then fused using various strategies including direct concatenation, gated recurrent unit (GRU) network and GRU with attention mechanism. Model performances were assessed through respiratory cycle-based evaluation and subject-based evaluation comprising multiple respiratory cycles. Several data augmentation techniques were also studied to cope with limitations in data size. The best cycle-based F1-score (0.877) was obtained using the MFCC matrices with thirteen coefficients and 64-point time resolution per sub-phase representation followed by direct feature concatenation, and the best subject-based F1-score (0.855) was obtained using the MFCC matrices with thirteen coefficients and 256-point time resolution per full-cycle representation, both obtained by adaptive-length windowing. Augmentation degraded the performance of models overall, yet mixup augmentation was the best among the methods tested. MFCC outperformed log-mel spectrogram and VAR model in differentiation of asthma and COPD. Sophisticated fusion strategies did not improve the diagnosis. Augmentation did not contribute, demonstrating the significance of authentic data in pulmonary sound studies.
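The abstract singles out mixup as the least harmful augmentation. Mixup (Zhang et al., 2017) trains on convex combinations of pairs of inputs and their one-hot labels, with the mixing weight drawn from a Beta(α, α) distribution. A minimal numpy sketch for a batch of 2D inputs such as MFCC matrices; α = 0.2, one weight shared across the batch, and the function name are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Return mixed inputs, soft labels, and the mixing weight for one batch.

    x: (batch, ...) inputs, y_onehot: (batch, classes). Each example is
    blended with a randomly permuted partner using lam ~ Beta(alpha, alpha).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam
```

A small α keeps most mixed examples close to one of the originals, which may explain why mixup distorts authentic recordings less than augmentations that alter the signal itself.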

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a narrow empirical tuning study that finds MFCC with adaptive windowing works best for asthma-COPD classification from lung sounds, but the evidence is thin on dataset quality and statistical controls.

The core result is that MFCC matrices at 13 coefficients with adaptive-length windowing (64-point sub-phase or 256-point full-cycle) gave the highest F1 scores (0.877 cycle-based and 0.855 subject-based), while log-mel and VAR lagged, direct concatenation beat GRU fusion, and augmentation hurt performance overall.

What the paper actually does is test a handful of established 2D representations and fusion options on respiratory cycles, with the adaptive windowing step as the main practical tweak to avoid padding artifacts. That comparison is straightforward and the numbers are reported clearly enough to replicate the ranking if someone has the same data.

The soft spots sit in the missing details. The abstract gives no dataset size, no patient count, no cross-validation scheme, and no error bars or significance tests, so, as reported, the claimed superiority rests on a single score per configuration. The fact that augmentation lowered scores is consistent with models latching onto recording artifacts or label noise rather than disease-specific patterns; without multi-site data or artifact checks, it's hard to know whether MFCC genuinely wins or just fits this particular collection better. The stress-test concern about systematic biases in the recordings lands because nothing in the reported work rules it out.

This paper is for people already working on lung-sound classification who want a quick recipe for input representations and windowing. A reader looking for new methods or broad claims will find little. It deserves a serious referee because the empirical setup is honest and the task is well-defined, even if the current write-up needs more data transparency and statistical grounding before it can be trusted at face value. I would send it for review with a request for those controls.

Referee Report

2 major / 1 minor

Summary. The paper compares MFCC matrices, log-mel spectrograms, and VAR representations as 2D inputs to CNN-based networks for binary classification of asthma versus COPD from respiratory sounds. It introduces adaptive-length windowing to normalize variable cycle durations (as an alternative to trimming or zero-padding), optimizes spectral/temporal dimensions and sub-phase fusion strategies (direct concatenation, GRU, GRU+attention), evaluates both cycle-level and subject-level F1, and tests several augmentation methods. The headline empirical result is that MFCC with 13 coefficients plus adaptive windowing at 64-point sub-phase or 256-point full-cycle resolution, followed by direct concatenation, yields the highest scores (cycle-based F1 0.877, subject-based F1 0.855), that MFCC outperforms the other representations, that sophisticated fusion adds no benefit, and that augmentation harms performance.
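The three fusion options differ only in how per-sub-phase CNN feature vectors are combined. The numpy sketch below assumes each cycle has already been reduced to S sub-phase feature vectors of size D (the CNN itself is omitted) and shows untrained forward passes: direct concatenation, a single-layer GRU run over the sub-phases, and additive attention pooling of the GRU states in the style of Bahdanau et al. (2014). The GRU follows the Cho et al. gate convention; weight shapes, sizes, random initialization, and function names are illustrative assumptions.

```python
import numpy as np

def concat_fusion(feats):
    """Direct concatenation: (S, D) sub-phase features -> one (S * D,) vector."""
    return feats.reshape(-1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_states(feats, W, U, b):
    """Run a single-layer GRU over the sub-phase sequence; return states (S, H).

    W: (3, H, D), U: (3, H, H), b: (3, H), ordered [update, reset, candidate].
    """
    h = np.zeros(U.shape[1])
    states = []
    for x in feats:
        z = sigmoid(W[0] @ x + U[0] @ h + b[0])               # update gate
        r = sigmoid(W[1] @ x + U[1] @ h + b[1])               # reset gate
        h_tilde = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])   # candidate state
        h = (1.0 - z) * h + z * h_tilde
        states.append(h)
    return np.stack(states)

def attention_pool(states, Wa, va):
    """Additive attention: score each state, softmax over sub-phases, weighted sum."""
    scores = np.tanh(states @ Wa.T) @ va                      # (S,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ states, weights                          # (H,), (S,)

rng = np.random.default_rng(0)
S, D, H, A = 3, 16, 8, 4   # hypothetical sizes
feats = rng.standard_normal((S, D))
W = rng.standard_normal((3, H, D)) * 0.1
U = rng.standard_normal((3, H, H)) * 0.1
b = np.zeros((3, H))
states = gru_states(feats, W, U, b)
pooled, weights = attention_pool(states, rng.standard_normal((A, H)), rng.standard_normal(A))
```

Concatenation adds no trainable parameters, while the GRU and attention variants add roughly 3H(D + H + 1) and A(H + 1) weights; on a small dataset that extra capacity is one plausible reason the paper saw no gain from the heavier fusions.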

Significance. If the ranking is robust, the work supplies concrete, actionable guidance on input-representation choices and windowing strategies for deep-learning pipelines in pulmonary audio, while reinforcing that authentic data may matter more than augmentation in this domain.

major comments (2)

[Abstract and Results] The superiority claims rest on single reported F1 values per configuration (0.877 / 0.855), with no dataset size, number of subjects or recordings, cross-validation scheme, statistical tests, or error bars supplied; this absence makes it impossible to judge whether the MFCC advantage is reliable or merely the outcome of an unreported post-hoc search.
[Methods and Discussion] The finding that augmentation degraded performance overall is presented without any quantification of recording artifacts, device variability, label noise, or inter-annotator agreement. Because the central claim is an empirical ranking among representations, the lack of controls for systematic biases that MFCC might exploit more readily than log-mel or VAR constitutes a load-bearing gap.
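The first major comment asks for error bars. One common way to attach them to existing predictions without retraining is a percentile bootstrap over subjects, resampling whole subjects so that cycles from the same person stay together. A stdlib-only sketch; `metric` stands for any scoring function such as F1, and the 1,000 resamples, the 95% level, and the function name are conventional or hypothetical choices, not values from the paper.

```python
import random

def bootstrap_ci(subject_ids, y_true, y_pred, metric, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for metric(y_true, y_pred), resampling subjects."""
    by_subject = {}
    for s, t, p in zip(subject_ids, y_true, y_pred):
        by_subject.setdefault(s, []).append((t, p))
    subjects = list(by_subject)
    rng = random.Random(seed)
    scores = []
    for _ in range(n_boot):
        sample = rng.choices(subjects, k=len(subjects))   # subjects drawn with replacement
        pairs = [tp for s in sample for tp in by_subject[s]]
        scores.append(metric([t for t, _ in pairs], [p for _, p in pairs]))
    scores.sort()
    lo_idx = round((1 - level) / 2 * (n_boot - 1))
    hi_idx = round((1 + level) / 2 * (n_boot - 1))
    return scores[lo_idx], scores[hi_idx]
```

Resampling cycles instead of subjects would treat correlated cycles as independent and produce intervals that are too narrow.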

minor comments (1)

[Abstract] The acronym VAR is introduced in the abstract without an immediate definition or reference; a brief expansion on first use would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate where revisions will be made to improve clarity and completeness.

Referee: [Abstract and Results] The superiority claims rest on single reported F1 values per configuration (0.877 / 0.855), with no dataset size, number of subjects or recordings, cross-validation scheme, statistical tests, or error bars supplied; this absence makes it impossible to judge whether the MFCC advantage is reliable or merely the outcome of an unreported post-hoc search.

Authors: We agree that the abstract and results presentation would be strengthened by including these details. The Methods section describes the dataset and evaluation protocol, but we will revise the Results section to explicitly report the number of subjects and recordings, the cross-validation scheme (subject-independent splits), and add statistical tests or confidence intervals for the reported F1 scores. This will allow better assessment of result reliability. (Revision: yes.)
Referee: [Methods and Discussion] The finding that augmentation degraded performance overall is presented without any quantification of recording artifacts, device variability, label noise, or inter-annotator agreement. Because the central claim is an empirical ranking among representations, the lack of controls for systematic biases that MFCC might exploit more readily than log-mel or VAR constitutes a load-bearing gap.

Authors: This observation is correct and highlights a limitation in the current discussion. The manuscript does not quantify these factors. In the revised version we will expand the Discussion to describe known dataset characteristics (collection devices, potential variability) and acknowledge the absence of inter-annotator agreement metrics as a limitation. We will also note that the empirical result on augmentation still stands but requires this additional context. (Revision: partial.)

Circularity Check

0 steps flagged

No circularity: purely empirical ranking of input representations on held-out data

The paper contains no equations, derivations, or load-bearing self-citations. All reported results (F1 scores for MFCC vs. log-mel vs. VAR, different time resolutions, fusion strategies, and augmentation) are obtained by direct experimental comparison of models trained and evaluated on respiratory sound recordings. Performance numbers are not forced by construction from fitted parameters or prior self-citations; they reflect empirical outcomes on held-out cycles and subjects. This is the standard non-circular case for an optimization study.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The central performance claims rest on empirical hyperparameter search over MFCC coefficient count and time resolution plus the assumption that the chosen dataset split and evaluation protocol (cycle vs subject) generalize; no new entities are postulated.

free parameters (3)

MFCC coefficient count
Fixed at 13 after testing a range; directly affects the reported best F1 scores.
time resolution per sub-phase or cycle
Tested values include 64 and 256 points; chosen values produce the headline F1 numbers.
windowing strategy parameters
Adaptive-length windowing parameters selected to equalize temporal dimensions.

axioms (2)

domain assumption Respiratory cycles can be segmented into sub-phases whose CNN features are meaningful units to fuse, either by direct concatenation or by GRU-based modeling of their order.
Invoked when sub-phase features are extracted and fused.
domain assumption The evaluation split separates cycles or subjects without leakage from the same recording session.
Required for the reported cycle-based and subject-based F1 scores to be unbiased.
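The leakage axiom is easiest to enforce by assigning folds per subject rather than per cycle, so that no person's cycles appear on both sides of a split. A stdlib-only sketch of subject-grouped k-fold splitting; the fold count, the shuffled round-robin assignment, and the function name are assumptions, and the "subject-independent splits" mentioned in the rebuttal may be implemented differently (for example with class stratification).

```python
import random

def subject_folds(subject_ids, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists with no subject on both sides."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % k for i, s in enumerate(subjects)}  # round-robin over shuffled subjects
    for fold in range(k):
        test = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        train = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        yield train, test
```

Every cycle lands in exactly one test fold, and both cycle-based and subject-based scores can then be computed from the same held-out predictions.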

Reference graph

Works this paper leans on

55 extracted references · 7 canonical work pages · 5 internal anchors

[1]

Symptom-based questionnaire for differentiating COPD and asthma

David G. Tinkelman, David B. Price, Robert J. Nordyke, R. J. Halbert, Sharon Isonaka, Dmitry Nonikov, Elizabeth F. Juniper, Daryl Freeman, Thomas Hausen, Mark L. Levy, Anders Ostrem, Thys van der Molen, and Constant P. van Schayck. Symptom-based questionnaire for differentiating COPD and asthma. Respiration, 73(3):296–305, 2006.

2006
[2]

Aging and disability affect misdiagnosis of COPD in elderly asthmatics: the SARA study

Vincenzo Bellia, Salvatore Battaglia, Filippo Catalano, Nicola Scichilone, Raffaele Antonelli Incalzi, Claudio Imperiale, and Franco Rengo. Aging and disability affect misdiagnosis of COPD in elderly asthmatics: the SARA study. Chest, 123(4):1066–1072, 2003.

2003
[3]

Diagnosis of asthma and chronic obstructive pulmonary disease in general practice

CP Van Schayck. Diagnosis of asthma and chronic obstructive pulmonary disease in general practice. The British Journal of General Practice, 46(404):193–197, 1996.

1996
[4]

Underdiagnosis of asthma and COPD: is the general practitioner to blame?

C. van Wheel. Underdiagnosis of asthma and COPD: is the general practitioner to blame? Monaldi Archives for Chest Disease, 57(1):65–68, 2002.

2002
[5]

Characteristics of asthma among elderly adults in a sample of the general population

Benjamin Burrows, Robert A Barbee, Martha G Cline, Ronald J Knudson, and Michael D Lebowitz. Characteristics of asthma among elderly adults in a sample of the general population. Chest, 100(4):935–942, 1991.

1991
[6]

Standards for the diagnosis and care of patients with chronic obstructive pulmonary disease (COPD) and asthma

American Thoracic Society. Standards for the diagnosis and care of patients with chronic obstructive pulmonary disease (COPD) and asthma. Official statement. Am Rev Respir Dis, 136:225–224, 1987.

1987
[7]

Causes of misdiagnosis of chronic obstructive pulmonary disease: a systematic scoping review

Stine Hangaard, Tina Helle, Carl Nielsen, and Ole K Hejlesen. Causes of misdiagnosis of chronic obstructive pulmonary disease: a systematic scoping review. Respiratory Medicine, 129:63–84, 2017.

2017
[8]

Practical management problems of stable chronic obstructive pulmonary disease in the elderly

Riccardo Pistelli, Letizia Ferrara, Clementina Misuraca, and Silvia Bustacchini. Practical management problems of stable chronic obstructive pulmonary disease in the elderly. Current Opinion in Pulmonary Medicine, 17:S43–S48, 2011.

2011
[9]

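Significant differences in flow standardised breath sound spectra in patients with chronic obstructive pulmonary disease, stable asthma, and healthy lungs

LP Malmberg, L Pesu, and AR Sovijarvi. Significant differences in flow standardised breath sound spectra in patients with chronic obstructive pulmonary disease, stable asthma, and healthy lungs. Thorax, 50(12):1285–1291, 1995.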

1995
[10]

The attractor recurrent neural network based on fuzzy functions: An effective model for the classification of lung abnormalities

Mohammad Bagher Khodabakhshi and Mohammad Hassan Moradi. The attractor recurrent neural network based on fuzzy functions: An effective model for the classification of lung abnormalities. Computers in Biology and Medicine, 84:124–136, 2017.

2017
[11]

Classification of normal, asthma and COPD subjects using multichannel lung sound signals

Md. Ariful Islam, Irin Bandyopadhyaya, Parthasarathi Bhattacharyya, and Goutam Saha. Classification of normal, asthma and COPD subjects using multichannel lung sound signals. In 2018 International Conference on Communication and Signal Processing (ICCSP), pages 290–294, 2018.

2018
[12]

Computerized lung sound based classification of asthma and chronic obstructive pulmonary disease (COPD)

Nishi Shahnaj Haider and AK Behera. Computerized lung sound based classification of asthma and chronic obstructive pulmonary disease (COPD). Biocybernetics and Biomedical Engineering, 42(1):42–59, 2022.

2022
[13]

Classification of asthma, COPD and healthy lung sounds using Fourier Bessel series expansion in machine learning and deep learning paradigm

Vaibhav Koshta, Bikesh Kumar Singh, Ajoy K. Behera, and Ranganath T. Ganga. Classification of asthma, COPD and healthy lung sounds using Fourier Bessel series expansion in machine learning and deep learning paradigm. In 2023 11th International Conference on Intelligent Systems and Embedded Design (ISED), pages 1–6, 2023.

2023
[14]

Differential diagnosis of asthma and COPD based on multivariate pulmonary sounds analysis

I Sen, M Saraclar, and YP Kahya. Differential diagnosis of asthma and COPD based on multivariate pulmonary sounds analysis. IEEE Trans Biomed Eng, 68(5):1601–1610, 2021.

2021
[15]

A dataset of lung sounds recorded from the chest wall using an electronic stethoscope

Mohammad Fraiwan, Luay Fraiwan, Basheer Khassawneh, and Ali Ibnian. A dataset of lung sounds recorded from the chest wall using an electronic stethoscope, 2021

2021
[16]

Deep learning based respiratory sound analysis for detection of chronic obstructive pulmonary disease

A. Srivastava, S. Jain, R. Miranda, S. Patil, S. Pandya, and K. Kotecha. Deep learning based respiratory sound analysis for detection of chronic obstructive pulmonary disease. PeerJ Computer Science, 7:e369, 2021.

2021
[17]

Research on lung sound classification model based on dual-channel CNN-LSTM algorithm

Yipeng Zhang, Qiong Huang, Wenhui Sun, Fenlan Chen, Dongmei Lin, and Fuming Chen. Research on lung sound classification model based on dual-channel CNN-LSTM algorithm. Biomedical Signal Processing and Control, 94:106257, 2024.

2024
[18]

Deep learning and feature fusion-based lung sound recognition model to diagnoses the respiratory diseases

Sara A. Shehab, Kamel K. Mohammed, Ashraf Darwish, and Aboul Ella Hassanien. Deep learning and feature fusion-based lung sound recognition model to diagnoses the respiratory diseases. Soft Computing, 28:11667–11683, 2024.

2024
[19]

Efficiently classifying lung sounds through depthwise separable CNN models with fused STFT and MFCC features

Shing-Yun Jung, Chia-Hung Liao, Yu-Sheng Wu, Shyan-Ming Yuan, and Chuen-Tsai Sun. Efficiently classifying lung sounds through depthwise separable CNN models with fused STFT and MFCC features. Diagnostics (Basel), 11(4):732, 2021.

2021
[20]

Respiratory sound classification for crackles, wheezes, and rhonchi in the clinical field using deep learning

Yoonjoo Kim, YunKyong Hyon, Sung Soo Jung, Sunju Lee, Geon Yoo, Chaeuk Chung, and Taeyoung Ha. Respiratory sound classification for crackles, wheezes, and rhonchi in the clinical field using deep learning. Scientific Reports, 11(1):17186, 2021

2021
[21]

Classification of respiratory sounds using improved convolutional recurrent neural network

Naoki Asatani, Tohru Kamiya, Shingo Mabu, and Shoji Kido. Classification of respiratory sounds using improved convolutional recurrent neural network. Computers & Electrical Engineering, 94:107367, 2021.

2021
[22]

Abnormal respiratory sounds classification using deep CNN

F. Zulfiqar et al. Abnormal respiratory sounds classification using deep CNN. Frontiers in Medicine (Lausanne), 8:714811, 2021.

2021
[23]

Automated lung sound classification using a hybrid CNN-LSTM network and focal loss function

Georgios Petmezas, Grigorios-Aris Cheimariotis, Leandros Stefanopoulos, Bruno Rocha, Rui Pedro Paiva, Aggelos K. Katsaggelos, and Nicos Maglaveras. Automated lung sound classification using a hybrid CNN-LSTM network and focal loss function. Sensors, 22(3), 2022.

2022
[24]

Performance evaluation of lung sounds classification using deep learning under variable parameters

Zhaoping Wang and Zhiqiang Sun. Performance evaluation of lung sounds classification using deep learning under variable parameters. EURASIP Journal on Advances in Signal Processing, 2024:51, 2024.

2024
[25]

Enhanced respiratory sound classification using deep learning and multi-channel auscultation

Yeonkyeong Kim, Kyu Bom Kim, Ah Young Leem, Kyuseok Kim, and Su Hwan Lee. Enhanced respiratory sound classification using deep learning and multi-channel auscultation. Journal of Clinical Medicine, 14(15):5437, 2025.

2025
[26]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

2016
[27]

Wide Residual Networks

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.

2016
[28]

Densely connected convolutional networks

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.

2017
[29]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014

2014
[30]

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

2014
[31]

An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018.

2018
[32]

Lung sound classification using cepstral-based statistical features

Nandini Sengupta, Md Sahidullah, and Goutam Saha. Lung sound classification using cepstral-based statistical features. Computers in Biology and Medicine, 75:118–129, 2016.

2016
[33]

A comparative study of the SVM and k-NN machine learning algorithms for the diagnosis of respiratory pathologies using pulmonary acoustic signals

Rajkumar Palaniappan, Kenneth Sundaraj, and Sebastian Sundaraj. A comparative study of the SVM and k-NN machine learning algorithms for the diagnosis of respiratory pathologies using pulmonary acoustic signals. BMC Bioinformatics, 15:223, 2014.

2014
[34]

Respiratory sound based classification of chronic obstructive pulmonary disease: a risk stratification approach in machine learning paradigm

Nishi Shahnaj Haider, Bikesh Kumar Singh, R. Periyasamy, and Ajoy K. Behera. Respiratory sound based classification of chronic obstructive pulmonary disease: a risk stratification approach in machine learning paradigm. Journal of Medical Systems, 43:255, 2019.

2019
[35]

Multi-channel lung sound classification with convolutional recurrent neural networks

Elmar Messner, Melanie Fediuk, Paul Swatek, Stefan Scheidl, Freyja-Maria Smolle-Jüttner, Horst Olschewski, and Franz Pernkopf. Multi-channel lung sound classification with convolutional recurrent neural networks. Computers in Biology and Medicine, 122:103831, 2020.

2020
[36]

Breath Sounds Methodology

Noam Gavriely. Breath Sounds Methodology. CRC Press, Florida, 1995.

1995
[37]

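Significant differences in flow standardised breath sound spectra in patients with chronic obstructive pulmonary disease, stable asthma, and healthy lungs

AR Nath and LH Capel. Significant differences in flow standardised breath sound spectra in patients with chronic obstructive pulmonary disease, stable asthma, and healthy lungs. Thorax, 29(2):223–227, 1974.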

1974
[38]

Significant differences in flow standardised breath sound spectra in patients with chronic obstructive pulmonary disease, stable asthma, and healthy lungs

ARA Sovijarvi, LP Malmberg, G. Charbonneau, J. Vanderschoot, F. Dalmasso, C. Sacco, M. Rossi, and JE Earis. Significant differences in flow standardised breath sound spectra in patients with chronic obstructive pulmonary disease, stable asthma, and healthy lungs. European Respiratory Review, 10(77):591–596, 2000.

2000
[39]

Breath Sounds

K. Douros, V. Grammeniatis, and I. Loukou. Breath Sounds. K. Priftis, L. Hadjileontiadis, and M. Everard (eds.), Springer Cham, 2018.

2018
[40]

Crackles in patients with fibrosing alveolitis, bronchiectasis, COPD, and heart failure

P Piirilä, AR Sovijärvi, T Kaisla, HM Rajala, and T Katila. Crackles in patients with fibrosing alveolitis, bronchiectasis, COPD, and heart failure. Chest, 99:1076–1083, 1991.

1991
[41]

A multi-channel device for respiratory sound data acquisition and transient detection

I. Sen and Y.P. Kahya. A multi-channel device for respiratory sound data acquisition and transient detection. In 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, pages 6658–6661, 2005.

2005
[42]

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4):357–366, 1980.

1980
[43]

Mel frequency cepstral coefficients for music modeling

Beth Logan et al. Mel frequency cepstral coefficients for music modeling. In ISMIR, volume 270, page 11. Plymouth, MA, 2000.

2000
[44]

Hybrid dual-input model for respiratory sound classification with mel spectrogram and waveform

Fan Wang, Jiacheng Gao, Ying Wang, Guoheng Huang, and Xiaochen Yuan. Hybrid dual-input model for respiratory sound classification with mel spectrogram and waveform. IEEE Access, 13:80971–80980, 2025.

2025
[45]

Transferring cross-corpus knowledge: An investigation on data augmentation for heart sound classification

Tomoya Koike, Kun Qian, Björn W Schuller, and Yoshiharu Yamamoto. Transferring cross-corpus knowledge: An investigation on data augmentation for heart sound classification. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 1976–1979, 2021.

2021
[46]

Exploring an optimal vector autoregressive model for multi-channel pulmonary sound data

Ipek Sen, Murat Saraclar, and Yasemin P. Kahya. Exploring an optimal vector autoregressive model for multi-channel pulmonary sound data. Computer Methods and Programs in Biomedicine, 111(3):550–560, 2013.

2013
[47]

Human action recognition in videos with articulated pose information by deep networks

Miguel Farrajota, João MF Rodrigues, and JM Hans du Buf. Human action recognition in videos with articulated pose information by deep networks. Pattern Analysis and Applications, 22:1307–1318, 2019.

2019
[48]

A comprehensive survey of deep learning for image captioning

MD Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, and Hamid Laga. A comprehensive survey of deep learning for image captioning. ACM Computing Surveys (CSUR), 51(6):1–36, 2019.

2019
[49]

Extracting low-dimensional psychological representations from convolutional neural networks

Aditi Jha, Joshua C Peterson, and Thomas L Griffiths. Extracting low-dimensional psychological representations from convolutional neural networks. Cognitive Science, 47(1):e13226, 2023.

2023
[50]

Automated multi-class classification of skin lesions through deep convolutional neural network with dermoscopic images

Imran Iqbal, Muhammad Younus, Khuram Walayat, Mohib Ullah Kakar, and Jinwen Ma. Automated multi-class classification of skin lesions through deep convolutional neural network with dermoscopic images. Computerized Medical Imaging and Graphics, 88:101843, 2021.

2021
[51]

Measuring overfitting in convolutional neural networks using adversarial perturbations and label noise

Svetlana Pavlitskaya, Joël Oswald, and J Marius Zöllner. Measuring overfitting in convolutional neural networks using adversarial perturbations and label noise. arXiv preprint arXiv:2209.13382, 2022.

2022
[52]

Jointly fine-tuning "BERT-like" self supervised models to improve multimodal speech emotion recognition

Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, and Suranga Nanayakkara. Jointly fine-tuning "BERT-like" self supervised models to improve multimodal speech emotion recognition. arXiv preprint arXiv:2008.06682, 2020.

2020
[53]

mixup: Beyond Empirical Risk Minimization

Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.

2017
[54]

Definition of terms for applications of respiratory sounds

Anssi Sovijärvi, F. Dalmasso, J. Vanderschoot, Leo Malmberg, G. Righini, and S.A.T. Stoneman. Definition of terms for applications of respiratory sounds. Eur Respir Rev, 10(77):597–610, 2000.

2000
[55]

Updated nomenclature for membership relation

American Thoracic Society. Updated nomenclature for membership relation. ATS News, 3:5–6, 1977.

1977

[1] David G. Tinkelman, David B. Price, Robert J. Nordyke, R. J. Halbert, Sharon Isonaka, Dmitry Nonikov, Elizabeth F. Juniper, Daryl Freeman, Thomas Hausen, Mark L. Levy, Anders Ostrem, Thys van der Molen, and Constant P. van Schayck. Symptom-based questionnaire for differentiating COPD and asthma. Respiration, 73(3):296–305, 2006.

[2] Vincenzo Bellia, Salvatore Battaglia, Filippo Catalano, Nicola Scichilone, Raffaele Antonelli Incalzi, Claudio Imperiale, and Franco Rengo. Aging and disability affect misdiagnosis of COPD in elderly asthmatics: the SARA study. Chest, 123(4):1066–1072, 2003.

[3] CP Van Schayck. Diagnosis of asthma and chronic obstructive pulmonary disease in general practice. The British Journal of General Practice, 46(404):193–197, 1996.

[4] C. van Weel. Underdiagnosis of asthma and COPD: is the general practitioner to blame? Monaldi Archives for Chest Disease, 57(1):65–68, 2002.

[5] Benjamin Burrows, Robert A Barbee, Martha G Cline, Ronald J Knudson, and Michael D Lebowitz. Characteristics of asthma among elderly adults in a sample of the general population. Chest, 100(4):935–942, 1991.

[6] American Thoracic Society. Standards for the diagnosis and care of patients with chronic obstructive pulmonary disease (COPD) and asthma. Official statement. Am Rev Respir Dis, 136:225–244, 1987.

[7] Stine Hangaard, Tina Helle, Carl Nielsen, and Ole K Hejlesen. Causes of misdiagnosis of chronic obstructive pulmonary disease: a systematic scoping review. Respiratory Medicine, 129:63–84, 2017.

[8] Riccardo Pistelli, Letizia Ferrara, Clementina Misuraca, and Silvia Bustacchini. Practical management problems of stable chronic obstructive pulmonary disease in the elderly. Current Opinion in Pulmonary Medicine, 17:S43–S48, 2011.

[9] LP Malmberg, L Pesu, and AR Sovijarvi. Significant differences in flow standardised breath sound spectra in patients with chronic obstructive pulmonary disease, stable asthma, and healthy lungs. Thorax, 50(12):1285–1291, 1995.

[10] Mohammad Bagher Khodabakhshi and Mohammad Hassan Moradi. The attractor recurrent neural network based on fuzzy functions: An effective model for the classification of lung abnormalities. Computers in Biology and Medicine, 84:124–136, 2017.

[11] Md. Ariful Islam, Irin Bandyopadhyaya, Parthasarathi Bhattacharyya, and Goutam Saha. Classification of normal, asthma and COPD subjects using multichannel lung sound signals. In 2018 International Conference on Communication and Signal Processing (ICCSP), pages 290–294, 2018.

[12] Nishi Shahnaj Haider and AK Behera. Computerized lung sound based classification of asthma and chronic obstructive pulmonary disease (COPD). Biocybernetics and Biomedical Engineering, 42(1):42–59, 2022.

[13] Vaibhav Koshta, Bikesh Kumar Singh, Ajoy K. Behera, and Ranganath T. Ganga. Classification of asthma, COPD and healthy lung sounds using Fourier Bessel series expansion in machine learning and deep learning paradigm. In 2023 11th International Conference on Intelligent Systems and Embedded Design (ISED), pages 1–6, 2023.

[14] I. Sen, M. Saraclar, and Y. P. Kahya. Differential diagnosis of asthma and COPD based on multivariate pulmonary sounds analysis. IEEE Trans Biomed Eng, 68(5):1601–1610, 2021.

[15] Mohammad Fraiwan, Luay Fraiwan, Basheer Khassawneh, and Ali Ibnian. A dataset of lung sounds recorded from the chest wall using an electronic stethoscope, 2021.

[16] A. Srivastava, S. Jain, R. Miranda, S. Patil, S. Pandya, and K. Kotecha. Deep learning based respiratory sound analysis for detection of chronic obstructive pulmonary disease. PeerJ Computer Science, 7:e369, 2021.

[17] Yipeng Zhang, Qiong Huang, Wenhui Sun, Fenlan Chen, Dongmei Lin, and Fuming Chen. Research on lung sound classification model based on dual-channel CNN-LSTM algorithm. Biomedical Signal Processing and Control, 94:106257, 2024.

[18] Sara A. Shehab, Kamel K. Mohammed, Ashraf Darwish, and Aboul Ella Hassanien. Deep learning and feature fusion-based lung sound recognition model to diagnoses the respiratory diseases. Soft Computing, 28:11667–11683, 2024.

[19] Shing-Yun Jung, Chia-Hung Liao, Yu-Sheng Wu, Shyan-Ming Yuan, and Chuen-Tsai Sun. Efficiently classifying lung sounds through depthwise separable CNN models with fused STFT and MFCC features. Diagnostics (Basel), 11(4):732, 2021.

[20] Yoonjoo Kim, YunKyong Hyon, Sung Soo Jung, Sunju Lee, Geon Yoo, Chaeuk Chung, and Taeyoung Ha. Respiratory sound classification for crackles, wheezes, and rhonchi in the clinical field using deep learning. Scientific Reports, 11(1):17186, 2021.

[21] Naoki Asatani, Tohru Kamiya, Shingo Mabu, and Shoji Kido. Classification of respiratory sounds using improved convolutional recurrent neural network. Computers & Electrical Engineering, 94:107367, 2021.

[22] F. Zulfiqar et al. Abnormal respiratory sounds classification using deep CNN. Frontiers in Medicine (Lausanne), 8:714811, 2021.

[23] Georgios Petmezas, Grigorios-Aris Cheimariotis, Leandros Stefanopoulos, Bruno Rocha, Rui Pedro Paiva, Aggelos K. Katsaggelos, and Nicos Maglaveras. Automated lung sound classification using a hybrid CNN-LSTM network and focal loss function. Sensors, 22(3), 2022.

[24] Zhaoping Wang and Zhiqiang Sun. Performance evaluation of lung sounds classification using deep learning under variable parameters. EURASIP Journal on Advances in Signal Processing, 2024:51, 2024.

[25] Yeonkyeong Kim, Kyu Bom Kim, Ah Young Leem, Kyuseok Kim, and Su Hwan Lee. Enhanced respiratory sound classification using deep learning and multi-channel auscultation. Journal of Clinical Medicine, 14(15):5437, 2025.

[26] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[27] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.

[28] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.

[29] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[30] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

[31] Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018.

[32] Nandini Sengupta, Md Sahidullah, and Goutam Saha. Lung sound classification using cepstral-based statistical features. Computers in Biology and Medicine, 75:118–129, 2016.

[33] Rajkumar Palaniappan, Kenneth Sundaraj, and Sebastian Sundaraj. A comparative study of the SVM and k-NN machine learning algorithms for the diagnosis of respiratory pathologies using pulmonary acoustic signals. BMC Bioinformatics, 15:223, 2014.

[34] Nishi Shahnaj Haider, Bikesh Kumar Singh, R. Periyasamy, and Ajoy K. Behera. Respiratory sound based classification of chronic obstructive pulmonary disease: a risk stratification approach in machine learning paradigm. Journal of Medical Systems, 43:255, 2019.

[35] Elmar Messner, Melanie Fediuk, Paul Swatek, Stefan Scheidl, Freyja-Maria Smolle-Jüttner, Horst Olschewski, and Franz Pernkopf. Multi-channel lung sound classification with convolutional recurrent neural networks. Computers in Biology and Medicine, 122:103831, 2020.

[36] Noam Gavriely. Breath Sounds Methodology. CRC Press, Florida, 1995.

[37] AR Nath and LH Capel. Inspiratory crackles: early and late. Thorax, 29(2):223–227, 1974.

[38] ARA Sovijarvi, LP Malmberg, G. Charbonneau, J. Vanderschoot, F. Dalmasso, C. Sacco, M. Rossi, and JE Earis. Characteristics of breath sounds and adventitious respiratory sounds. European Respiratory Review, 10(77):591–596, 2000.

[39] K. Douros, V. Grammeniatis, and I. Loukou. Breath Sounds. K. Priftis, L. Hadjileontiadis, and M. Everard (eds.), Springer, Cham, 2018.

[40] P Piirilä, AR Sovijärvi, T Kaisla, HM Rajala, and T Katila. Crackles in patients with fibrosing alveolitis, bronchiectasis, COPD, and heart failure. Chest, 99:1076–1083, 1991.

[41] I. Sen and Y. P. Kahya. A multi-channel device for respiratory sound data acquisition and transient detection. In 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, pages 6658–6661, 2005.

[42] S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4):357–366, 1980.

[43] Beth Logan et al. Mel frequency cepstral coefficients for music modeling. In ISMIR, volume 270, page 11. Plymouth, MA, 2000.

[44] Fan Wang, Jiacheng Gao, Ying Wang, Guoheng Huang, and Xiaochen Yuan. Hybrid dual-input model for respiratory sound classification with mel spectrogram and waveform. IEEE Access, 13:80971–80980, 2025.

[45] Tomoya Koike, Kun Qian, Björn W Schuller, and Yoshiharu Yamamoto. Transferring cross-corpus knowledge: An investigation on data augmentation for heart sound classification. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 1976–1979, 2021.

[46] Ipek Sen, Murat Saraclar, and Yasemin P. Kahya. Exploring an optimal vector autoregressive model for multi-channel pulmonary sound data. Computer Methods and Programs in Biomedicine, 111(3):550–560, 2013.

[47] Miguel Farrajota, João MF Rodrigues, and JM Hans du Buf. Human action recognition in videos with articulated pose information by deep networks. Pattern Analysis and Applications, 22:1307–1318, 2019.

[48] MD Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, and Hamid Laga. A comprehensive survey of deep learning for image captioning. ACM Computing Surveys (CSUR), 51(6):1–36, 2019.

[49] Aditi Jha, Joshua C Peterson, and Thomas L Griffiths. Extracting low-dimensional psychological representations from convolutional neural networks. Cognitive Science, 47(1):e13226, 2023.

[50] Imran Iqbal, Muhammad Younus, Khuram Walayat, Mohib Ullah Kakar, and Jinwen Ma. Automated multi-class classification of skin lesions through deep convolutional neural network with dermoscopic images. Computerized Medical Imaging and Graphics, 88:101843, 2021.

[51] Svetlana Pavlitskaya, Joël Oswald, and J Marius Zöllner. Measuring overfitting in convolutional neural networks using adversarial perturbations and label noise. arXiv preprint arXiv:2209.13382, 2022.

[52] Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, and Suranga Nanayakkara. Jointly fine-tuning "BERT-like" self supervised models to improve multimodal speech emotion recognition. arXiv preprint arXiv:2008.06682, 2020.

[53] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.

[54] Anssi Sovijärvi, F. Dalmasso, J. Vanderschoot, Leo Malmberg, G. Righini, and S.A.T. Stoneman. Definition of terms for applications of respiratory sounds. Eur Respir Rev, 10(77):597–610, 2000.

[55] American Thoracic Society. Updated nomenclature for membership reaction. ATS News, 3:5–6, 1977.