arxiv: 2606.10972 · v1 · pith:DHDWOHANnew · submitted 2026-06-09 · 📡 eess.AS · cs.AI

Optimizing 2D Input Representations and Sub-phase Fusion Strategies for Differential Diagnosis of Asthma and COPD Using CNN- and GRU-Based Networks

Pith reviewed 2026-06-27 11:41 UTC · model grok-4.3

classification 📡 eess.AS cs.AI
keywords asthma · COPD · MFCC · respiratory sounds · CNN · GRU · differential diagnosis · pulmonary sound classification

The pith

MFCC matrices with adaptive-length windowing and direct concatenation achieve the best F1 scores for distinguishing asthma from COPD in respiratory sound analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests different 2D representations of pulmonary sounds to classify asthma versus COPD using CNN and GRU networks. It compares MFCC matrices against log-mel spectrograms and VAR models while addressing variable cycle lengths through adaptive-length windowing instead of simple trimming or padding. Sub-phase features are extracted via CNNs and fused by direct concatenation, GRU, or GRU with attention, with performance measured at both cycle and subject levels. MFCC with thirteen coefficients and specific time resolutions, paired with adaptive windowing and simple concatenation, produces the highest scores while augmentation and complex fusions do not help. A reader would care because reliable sound-based differentiation could support clinical decisions when symptoms overlap.

Core claim

MFCC matrices with thirteen coefficients and 64-point time resolution per sub-phase representation, processed by adaptive-length windowing followed by direct feature concatenation, reach a cycle-based F1-score of 0.877; MFCC with thirteen coefficients and 256-point time resolution per full-cycle representation reach a subject-based F1-score of 0.855. These outperform log-mel spectrograms and the VAR model. Sophisticated fusion strategies such as GRU with attention do not improve results over direct concatenation, and data augmentation techniques overall degrade performance, underscoring the value of authentic recordings.

What carries the argument

Adaptive-length windowing that standardizes the temporal dimensions of variable-length respiratory cycle representations before CNN feature extraction from sub-phases and their fusion.
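One way to picture adaptive-length windowing is as a framing rule in which the window length grows with the cycle length, so that every cycle produces the same number of frames. The sketch below is a minimal illustration under that assumption. The function name `adaptive_frames`, the fixed 50% overlap, and the sample counts are hypothetical and are not taken from the paper.

```python
def adaptive_frames(n_samples, n_frames, overlap=0.5):
    """Return n_frames (start, end) index pairs that tile a signal of n_samples.

    Assumption (not from the paper): the window length scales with the
    signal length while the overlap fraction stays fixed, so every
    respiratory cycle yields the same number of frames.
    """
    if n_frames < 1 or not 0.0 <= overlap < 1.0:
        raise ValueError("need n_frames >= 1 and 0 <= overlap < 1")
    # Solve (n_frames - 1) * hop + win = n_samples with hop = win * (1 - overlap).
    win = n_samples / ((n_frames - 1) * (1.0 - overlap) + 1.0)
    hop = win * (1.0 - overlap)
    frames = []
    for i in range(n_frames):
        start = round(i * hop)
        end = min(n_samples, round(i * hop + win))
        frames.append((start, end))
    return frames


# Two cycles of different duration map to the same temporal dimension.
short_cycle = adaptive_frames(n_samples=1000, n_frames=64)
long_cycle = adaptive_frames(n_samples=3000, n_frames=64)
print(len(short_cycle), len(long_cycle))
```

Unlike trimming or zero-padding, this rule neither discards samples nor appends silence; the trade-off is that the frequency resolution of each frame varies with cycle duration.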

If this is right

  • MFCC matrices outperform both log-mel spectrograms and the VAR model for asthma-COPD differentiation.
  • Direct concatenation of sub-phase features is sufficient and superior to GRU-based or attention-based fusion strategies.
  • Data augmentation methods, even mixup, reduce overall model performance compared with unaugmented training.
  • Optimized spectral and temporal dimensions of MFCC inputs matter more than the choice of fusion architecture.
  • Subject-based evaluation, which aggregates multiple cycles per subject, yields F1 scores slightly lower than cycle-based evaluation, though still high.
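The last point turns on how per-cycle decisions become a per-subject decision. A minimal sketch follows, assuming majority voting over a subject's cycles. This summary does not state the paper's aggregation rule, so the rule, the function name `subject_predictions`, and the toy labels are all illustrative.

```python
from collections import Counter, defaultdict


def subject_predictions(cycle_preds, cycle_subjects):
    """Aggregate per-cycle labels into one label per subject by majority vote.

    Assumption: majority voting is an illustrative rule; the paper's
    aggregation rule is not stated in this summary.
    """
    votes = defaultdict(list)
    for pred, subject in zip(cycle_preds, cycle_subjects):
        votes[subject].append(pred)
    # Counter.most_common orders equal counts by first appearance,
    # so ties resolve to the label seen first for that subject.
    return {s: Counter(p).most_common(1)[0][0] for s, p in votes.items()}


preds = ["asthma", "asthma", "COPD", "COPD", "COPD", "asthma"]
subjects = ["s1", "s1", "s1", "s2", "s2", "s2"]
print(subject_predictions(preds, subjects))
```

Because a subject contributes one decision regardless of how many cycles were recorded, a few misclassified subjects can move the subject-level score more than the same number of misclassified cycles moves the cycle-level score.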

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same windowing and MFCC optimization steps could be applied to classification tasks involving other lung conditions that produce variable cycle lengths.
  • Preference for simple concatenation suggests that future systems could prioritize low-complexity inference suitable for portable devices.
  • The finding that authentic data beats augmentation implies a need for larger curated sound databases rather than synthetic expansion.
  • If the 13-coefficient MFCC advantage holds across sites, clinical protocols might standardize on cepstral features for initial sound triage.

Load-bearing premise

The respiratory sound recordings are representative of real-world patient variability and free of artifacts or label noise that would favor one input representation over others.

What would settle it

An independent test set of asthma and COPD respiratory recordings collected under varied clinical conditions that shows no F1 advantage for the reported MFCC configurations with adaptive windowing would falsify the optimality claim.

Figures

Figures reproduced from arXiv: 2606.10972 by Elena Battini Sonmez, Ipek Sen, Ozgur Ozdemir.

Figure 1: Overall structure of the networks. *FE: Feature extractor.

Original abstract

This study aims to explore the performance of the VAR model in comparison with mel-frequency cepstral coefficient (MFCC) matrices and log-mel spectrograms using deep learning. In pulmonary sound classification, spectrogram-based representations suffer from inconsistent temporal dimensions due to varying respiratory cycle durations. Along with traditional trimming/zero-padding, adaptive-length windowing was presented to fix their temporal dimensions. Their spectral and temporal dimensions were optimized by testing a range of parameters. Different convolutional neural network (CNN) architectures were employed to extract features from the two-dimensional representations obtained over the sub-phases. The extracted sub-phase features were then fused using various strategies including direct concatenation, gated recurrent unit (GRU) network and GRU with attention mechanism. Model performances were assessed through respiratory cycle-based evaluation and subject-based evaluation comprising multiple respiratory cycles. Several data augmentation techniques were also studied to cope with limitations in data size. The best cycle-based F1-score (0.877) was obtained using the MFCC matrices with thirteen coefficients and 64-point time resolution per sub-phase representation followed by direct feature concatenation, and the best subject-based F1-score (0.855) was obtained using the MFCC matrices with thirteen coefficients and 256-point time resolution per full-cycle representation, both obtained by adaptive-length windowing. Augmentation degraded the performance of models overall, yet mixup augmentation was the best among the methods tested. MFCC outperformed log-mel spectrogram and VAR model in differentiation of asthma and COPD. Sophisticated fusion strategies did not improve the diagnosis. Augmentation did not contribute, demonstrating the significance of authentic data in pulmonary sound studies.
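For readers unfamiliar with mixup, the augmentation the abstract ranks best among those tested, the sketch below shows the basic operation: a convex combination of two training examples and of their labels, with the mixing weight drawn from a Beta distribution. The value `alpha=0.2`, the toy vectors, and the choice to mix flattened feature vectors are assumptions for illustration; the abstract does not say at which stage the authors applied it.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two examples and their one-hot labels.

    x_a, x_b: flattened feature vectors of equal length.
    y_a, y_b: one-hot label vectors, e.g. [1, 0] for asthma.
    alpha: Beta-distribution parameter; 0.2 is an illustrative value.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


rng = random.Random(0)  # seeded so the example is repeatable
x, y, lam = mixup([0.0, 1.0, 2.0], [1, 0], [2.0, 1.0, 0.0], [0, 1], rng=rng)
print(x, y, lam)
```

The mixed label is soft rather than one-hot, which is one reason such samples differ from authentic recordings.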

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper compares MFCC matrices, log-mel spectrograms, and VAR representations as 2D inputs to CNN-based networks for binary classification of asthma versus COPD from respiratory sounds. It introduces adaptive-length windowing to normalize variable cycle durations (as an alternative to trimming or zero-padding), optimizes spectral/temporal dimensions and sub-phase fusion strategies (direct concatenation, GRU, GRU+attention), evaluates both cycle-level and subject-level F1, and tests several augmentation methods. The headline empirical result is that MFCC with 13 coefficients and adaptive-length windowing yields the highest scores: a cycle-based F1 of 0.877 at 64-point sub-phase resolution with direct concatenation, and a subject-based F1 of 0.855 at 256-point full-cycle resolution. The paper further reports that MFCC outperforms the other representations, that sophisticated fusion adds no benefit, and that augmentation harms performance.

Significance. If the ranking is robust, the work supplies concrete, actionable guidance on input-representation choices and windowing strategies for deep-learning pipelines in pulmonary audio, while reinforcing that authentic data may matter more than augmentation in this domain.

major comments (2)
  1. [Abstract and Results] The superiority claims rest on single reported F1 values per configuration (0.877 / 0.855) with no dataset size, number of subjects or recordings, cross-validation scheme, statistical tests, or error bars supplied; this absence makes it impossible to judge whether the MFCC advantage is reliable or merely the outcome of an unreported post-hoc search.
  2. [Methods and Discussion] The finding that all augmentation techniques degraded performance is presented without any quantification of recording artifacts, device variability, label noise, or inter-annotator agreement; given that the central claim is an empirical ranking among representations, the lack of controls for systematic biases that could be exploited more readily by MFCC than by log-mel or VAR constitutes a load-bearing gap.
minor comments (1)
  1. [Abstract] The acronym VAR is introduced in the abstract without an immediate definition or reference; a brief expansion on first use would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate where revisions will be made to improve clarity and completeness.

  1. Referee: [Abstract and Results] Abstract and Results sections: the superiority claims rest on single reported F1 values per configuration (0.877 / 0.855) with no dataset size, number of subjects or recordings, cross-validation scheme, statistical tests, or error bars supplied; this absence makes it impossible to judge whether the MFCC advantage is reliable or merely the outcome of an unreported post-hoc search.

    Authors: We agree that the abstract and results presentation would be strengthened by including these details. The Methods section describes the dataset and evaluation protocol, but we will revise the Results section to explicitly report the number of subjects and recordings, the cross-validation scheme (subject-independent splits), and add statistical tests or confidence intervals for the reported F1 scores. This will allow better assessment of result reliability.
    revision: yes

  2. Referee: [Methods and Discussion] Methods and Discussion: the finding that all augmentation techniques degraded performance is presented without any quantification of recording artifacts, device variability, label noise, or inter-annotator agreement; given that the central claim is an empirical ranking among representations, the lack of controls for systematic biases that could be exploited more readily by MFCC than by log-mel or VAR constitutes a load-bearing gap.

    Authors: This observation is correct and highlights a limitation in the current discussion. The manuscript does not quantify these factors. In the revised version we will expand the Discussion to describe known dataset characteristics (collection devices, potential variability) and acknowledge the absence of inter-annotator agreement metrics as a limitation. We will also note that the empirical result on augmentation still stands but requires this additional context.
    revision: partial
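The confidence intervals promised in the first response could take several forms. One common option is a percentile bootstrap over the test predictions, sketched below. The labels are toy data, the choice of class 1 as the positive class is arbitrary, and nothing here reflects the procedure the authors will actually use.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the chosen positive class: 2TP / (2TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(y_true, y_pred, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for F1, resampling items with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((1.0 - level) / 2.0 * n_boot)]
    hi = scores[min(n_boot - 1, int((1.0 + level) / 2.0 * n_boot))]
    return lo, hi


# Toy labels: 1 and 0 stand for the two diagnostic classes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
point = f1_score(y_true, y_pred)
lo, hi = bootstrap_f1_ci(y_true, y_pred)
print(point, lo, hi)
```

For cycle-level scores, resampling whole subjects rather than individual cycles would respect the dependence between cycles from the same person; the sketch resamples items independently for brevity.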

Circularity Check

0 steps flagged

No circularity: purely empirical ranking of input representations on held-out data

The paper contains no equations, derivations, or load-bearing self-citations. All reported results (F1 scores for MFCC vs. log-mel vs. VAR, different time resolutions, fusion strategies, and augmentation) are obtained by direct experimental comparison of models trained and evaluated on respiratory sound recordings. Performance numbers are not forced by construction from fitted parameters or prior self-citations; they reflect empirical outcomes on held-out cycles and subjects. This is the standard non-circular case for an optimization study.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The central performance claims rest on empirical hyperparameter search over MFCC coefficient count and time resolution plus the assumption that the chosen dataset split and evaluation protocol (cycle vs subject) generalize; no new entities are postulated.

free parameters (3)
  • MFCC coefficient count
    Fixed at 13 after testing a range; directly affects the reported best F1 scores.
  • time resolution per sub-phase or cycle
    Tested values include 64 and 256 points; chosen values produce the headline F1 numbers.
  • windowing strategy parameters
    Adaptive-length windowing parameters selected to equalize temporal dimensions.
axioms (2)
  • domain assumption: Respiratory cycles can be segmented into sub-phases whose features are statistically independent enough for concatenation or GRU fusion to be meaningful.
    Invoked when sub-phase features are extracted and fused.
  • domain assumption: The evaluation split separates cycles or subjects without leakage from the same recording session.
    Required for the reported cycle-based and subject-based F1 scores to be unbiased.
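The second axiom can be made concrete with a grouped split, in which all cycles from one subject land on the same side of the partition. The sketch below groups by subject ID as a stand-in for recording session, which is an assumption; the function name `subject_split`, the 20% test fraction, and the toy IDs are hypothetical.

```python
import random


def subject_split(cycle_subjects, test_fraction=0.2, seed=0):
    """Split cycle indices so that no subject appears in both partitions."""
    subjects = sorted(set(cycle_subjects))
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(test_fraction * len(subjects)))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(cycle_subjects) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(cycle_subjects) if s in test_subjects]
    return train_idx, test_idx


# One subject ID per respiratory cycle.
cycle_subjects = ["s1", "s1", "s2", "s2", "s2", "s3", "s4", "s4", "s5", "s5"]
train_idx, test_idx = subject_split(cycle_subjects)
train_subjects = {cycle_subjects[i] for i in train_idx}
test_subjects = {cycle_subjects[i] for i in test_idx}
print(train_subjects.isdisjoint(test_subjects))
```

A random split over cycles, by contrast, would let cycles from the same subject appear in both training and test data, which tends to inflate cycle-based scores.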

pith-pipeline@v0.9.1-grok · 5843 in / 1647 out tokens · 27896 ms · 2026-06-27T11:41:14.718144+00:00 · methodology

