FunnelNet: An End-to-End Deep Learning Framework to Monitor Digital Heart Murmur in Real-Time

Imre Rudas; Md Jobayer; Md. Mehedi Hasan Shawon; Md Rakibul Hasan; Md Zakir Hossain; Shreya Ghosh; Tom Gedeon

arxiv: 2405.09570 · v2 · submitted 2024-05-10 · eess.SP · cs.LG · cs.SD · eess.AS

Md Jobayer, Md. Mehedi Hasan Shawon, Md Zakir Hossain, Shreya Ghosh, Imre Rudas, Tom Gedeon, Md Rakibul Hasan

Pith reviewed 2026-05-24 01:08 UTC · model grok-4.3

classification eess.SP · cs.LG · cs.SD · eess.AS

keywords heart murmur detection · phonocardiogram · deep learning · TinyML · real-time inference · lightweight neural network · pediatric heart sounds · convolutional network

The pith

A 5.4-thousand-parameter network detects heart murmurs from sound recordings with 85 percent accuracy and runs on smartphones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds FunnelNet to classify heart murmurs directly from phonocardiogram signals after noise removal with a Butterworth filter and feature extraction via continuous wavelet transform. Its architecture squeezes the input representation, routes it through a depthwise-separable bottleneck, and expands to recover detail for the final decision. With roughly 5.4 thousand parameters the model reaches 85 percent accuracy, 85 percent sensitivity, and 92 percent specificity on the CirCor pediatric dataset while beating several larger networks; after conversion to TinyML it runs in real time at 91 percent accuracy on a Raspberry Pi 4B and 80 percent on an Android phone. Readers would care because the approach could move murmur screening from specialized clinics to everyday portable devices in settings that lack cardiologists or echocardiography equipment.

Core claim

FunnelNet is an end-to-end framework whose squeeze net compresses the preprocessed signal, whose bottleneck applies depthwise-separable convolutions to limit computation, and whose expansion net restores fine structure before classification. Evaluated on the public CirCor pediatric heart-sound collection, the model with approximately 5.4k parameters attains 85 percent accuracy, 85 percent sensitivity, and 92 percent specificity and exceeds the performance of several larger models. Once ported to TinyML the same network produces real-time inference at 91 percent accuracy on a Raspberry Pi 4B and 80 percent accuracy on an Android smartphone.
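For reference, the three reported metrics can all be derived from confusion-matrix counts. The sketch below assumes a binary murmur-present versus murmur-absent framing; the function name and the counts are invented for illustration and are not taken from the paper.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of murmur-present cases detected
        "specificity": tn / (tn + fp),  # fraction of murmur-absent cases cleared
    }


# Hypothetical counts for illustration only (not from the paper).
metrics = binary_metrics(tp=17, fn=3, tn=92, fp=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a strictly binary setting, accuracy is a prevalence-weighted average of sensitivity and specificity, so how the three reported figures relate to one another depends on the class balance and on how any multi-class outputs were averaged.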

What carries the argument

FunnelNet architecture of squeeze net, depthwise-separable bottleneck, and expansion net that compresses, efficiently processes, and reconstructs the signal for murmur classification.
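To see why a depthwise-separable bottleneck keeps the parameter count low, the sketch below compares it with a standard convolution of the same shape. It assumes 2D convolutions with square kernels and bias terms; the channel counts and kernel size are hypothetical and are not FunnelNet's actual layer sizes.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameters in a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameters in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1 x 1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer shape for illustration only.
standard = conv2d_params(c_in=16, c_out=32, k=3)
separable = depthwise_separable_params(c_in=16, c_out=32, k=3)
print(standard, separable, round(standard / separable, 1))
```

For this illustrative shape the separable form needs 704 parameters against 4,640 for the standard convolution, roughly a 6.6-fold reduction.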

If this is right

Heart-murmur screening can occur in real time on battery-powered devices without sending data to the cloud.
A model this small can still exceed the accuracy of larger networks on the same pediatric phonocardiogram task.
The preprocessing pipeline of Butterworth filter plus continuous wavelet transform supplies the features the lightweight classifier needs.
Deployment on consumer smartphones becomes feasible for point-of-care use in low-resource settings.
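The wavelet step in the preprocessing item above can be sketched from first principles. This is a minimal pure-Python continuous wavelet transform, assuming a complex Morlet mother wavelet and a synthetic 50 Hz tone; the wavelet choice, sampling rate, and analysis frequencies are assumptions, since this page does not state the paper's settings. In practice a library routine such as PyWavelets' `pywt.cwt` would normally be used instead.

```python
import cmath
import math


def morlet(t: float, w0: float = 6.0) -> complex:
    """Complex Morlet wavelet (small admissibility correction omitted)."""
    return cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_row(signal, fs: float, freq: float, w0: float = 6.0):
    """CWT magnitude at the scale whose centre frequency is `freq` (Hz)."""
    scale = w0 / (2 * math.pi * freq)          # scale in seconds
    half = int(math.ceil(4 * scale * fs))      # truncate wavelet at +/- 4 scales
    kernel = [morlet(k / fs / scale, w0).conjugate() / math.sqrt(scale)
              for k in range(-half, half + 1)]
    n = len(signal)
    row = []
    for i in range(n):
        acc = 0j
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:                   # zero padding at the edges
                acc += signal[idx] * w
        row.append(abs(acc) / fs)              # 1/fs approximates dt in the integral
    return row


# Synthetic test tone (assumption): 0.5 s of a 50 Hz sine sampled at 500 Hz.
FS = 500.0
signal = [math.sin(2 * math.pi * 50.0 * i / FS) for i in range(250)]
freqs = [25.0, 50.0, 100.0]
rows = [cwt_row(signal, FS, f) for f in freqs]
means = [sum(r) / len(r) for r in rows]
print(dict(zip(freqs, means)))
```

Stacking one such row per analysis frequency gives the time-frequency image (scalogram) that a convolutional classifier can consume; the Butterworth denoising step is not shown here.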

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same squeeze-bottleneck-expand pattern could be tried on other bio-signal tasks that must run under tight memory limits.
The gap between 91 percent on Raspberry Pi and 80 percent on Android points to hardware-specific tuning that may be needed for broader phone deployment.
Performance on only pediatric data leaves open whether the same weights would work on adult recordings without additional training.

Load-bearing premise

The features produced by Butterworth filtering and continuous wavelet transform on the CirCor pediatric recordings will remain useful when the same model encounters new recordings made on different equipment or in different clinical environments.

What would settle it

A test of the TinyML version on an independent set of heart-sound recordings collected from adults or from a different hospital that yields accuracy below 70 percent without any retraining.

Figures

Figures reproduced from arXiv: 2405.09570 by Imre Rudas, Md Jobayer, Md. Mehedi Hasan Shawon, Md Rakibul Hasan, Md Zakir Hossain, Shreya Ghosh, Tom Gedeon.

**Figure 1.** An illustration of the proposed FunnelCNN network architecture, divided into three parts: the squeeze net, the bottleneck, and the expansion net.

**Figure 2.** An illustration of training accuracy, validation accuracy, training loss, and validation loss with respect to epochs. The accuracy and loss functions for the …

**Figure 1.** Heart’s cyclic pattern. It consists of two main phases: systole and …

**Figure 2.** A sample heart sound audio file without any preprocessing steps.

**Figure 3.** A sample heart sound audio file after removing the outliers from …

**Figure 4.** Distribution plots for a sample audio file before and after the …

Original abstract

Heart murmurs are abnormal sounds caused by turbulent blood flow in the heart. Several diagnostic methods are available to detect heart murmurs and their severity, including cardiac auscultation, echocardiography, and phonocardiography (PCG). However, these methods have limitations, including the need for extensive training among healthcare providers, the cost and accessibility of echocardiography, and noise interference during PCG data processing. This study proposes an end-to-end real-time heart murmur detection approach using traditional and depthwise separable convolutional networks. We applied a Butterworth filter and Continuous Wavelet Transform (CWT) to eliminate noise and extract meaningful features from the PCG data. The proposed network consists of three parts: a Squeeze net that generates a compressed data representation, a Bottleneck layer that minimizes computational complexity using depthwise-separable convolutions, and an Expansion net that up-samples the data to capture fine details. We evaluated our model on the publicly available CirCor pediatric heart sound dataset. Using only $\sim$5.4k parameters, we achieved an accuracy of 85%, a sensitivity of 85%, and a specificity of 92%, successfully outperforming several larger models. Furthermore, we converted our network into a TinyML format and tested it on two resource-constrained devices, achieving an average real-time inference accuracy of 91% on a Raspberry Pi 4B and 80% on an Android smartphone. The proposed lightweight model offers a robust deep learning framework for accurate, real-time heart murmur detection, showing strong promise for accessible medical diagnostics in limited-resource environments. The code is publicly available at https://github.com/jobayer/FunnelNet.
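The abstract does not give the filter order, band, or cutoff. As a hedged illustration of what a Butterworth filter does to a signal's spectrum, the sketch below evaluates the textbook magnitude response of an analog low-pass Butterworth filter; the low-pass form, the 400 Hz cutoff, and the order of 4 are assumptions chosen only for the example.

```python
import math


def butterworth_lowpass_gain(f_hz: float, cutoff_hz: float, order: int) -> float:
    """Magnitude response |H(f)| of an analog Butterworth low-pass filter."""
    return 1.0 / math.sqrt(1.0 + (f_hz / cutoff_hz) ** (2 * order))


CUTOFF_HZ = 400.0  # hypothetical cutoff, not taken from the paper
ORDER = 4          # hypothetical filter order, not taken from the paper

gains = {f: butterworth_lowpass_gain(f, CUTOFF_HZ, ORDER)
         for f in (100.0, 400.0, 800.0)}
for f, g in gains.items():
    print(f"{f:6.0f} Hz -> gain {g:.4f}")
```

The response is maximally flat in the passband, falls to 1/√2 (about −3 dB) at the cutoff, and rolls off faster as the order increases.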

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

FunnelNet is a compact CNN for PCG murmur detection that hits 85% accuracy with 5.4k parameters and runs on Raspberry Pi and phones, but the single-dataset results leave the generalization claim thin.

The paper's core contribution is a small CNN called FunnelNet that processes phonocardiogram signals for heart murmur detection. It uses Butterworth filtering plus continuous wavelet transform, then a squeeze-bottleneck-expansion structure built with depthwise separable convolutions. On the CirCor pediatric dataset it reports 85% accuracy, 85% sensitivity and 92% specificity while using roughly 5.4k parameters, and it beats some larger models. The authors also convert the model to TinyML and measure real-time inference accuracy of 91% on a Raspberry Pi 4B and 80% on an Android phone. The code is released, which helps reproducibility.

Those on-device numbers are the most concrete part of the work and show the model is small enough for actual edge hardware. The architecture itself is an adaptation of existing blocks rather than a new primitive, so the novelty sits mainly in the PCG application and the deployment measurements.

The evaluation details are thin in the abstract: no description of the train/test split, cross-validation scheme, or exact baseline models appears. The bigger issue is that all numbers come from internal splits of one pediatric dataset. PCG signals change with stethoscope hardware, ambient noise, and recording site, and nothing in the reported experiments tests transfer to new conditions or adult recordings. That makes the claim about a robust framework for low-resource diagnostics rest on an untested assumption.

Readers working on lightweight biosignal models or TinyML medical devices could find the parameter count and device timings useful. The paper is coherent on its own terms and shows honest engagement with the deployment angle, so it is worth sending to peer review even though the robustness questions will need answers in revision.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes FunnelNet, a lightweight end-to-end CNN architecture (squeeze net + depthwise-separable bottleneck + expansion net) for real-time phonocardiogram (PCG) heart murmur detection. Preprocessing uses a Butterworth filter and CWT; the model (~5.4k parameters) is evaluated on the CirCor pediatric dataset and reports 85% accuracy, 85% sensitivity, 92% specificity while outperforming larger models. The network is converted to TinyML and deployed on Raspberry Pi 4B (91% inference accuracy) and Android smartphone (80% inference accuracy). Public code is provided.

Significance. If the reported metrics are obtained via proper held-out validation and generalize, the combination of extreme parameter efficiency with on-device real-time inference would be a useful contribution toward accessible cardiac screening in low-resource settings. The public code release supports reproducibility.

major comments (3)

[Abstract and Results] Performance numbers (85% acc / 85% sens / 92% spec) and the claim of outperforming larger models are stated without any description of the train/test split, cross-validation procedure, number of subjects or recordings in each partition, or statistical significance testing. This information is load-bearing for interpreting whether the metrics reflect genuine generalization on the CirCor dataset.
[Deployment / TinyML section] On-device accuracies (91% on Raspberry Pi 4B, 80% on Android) are reported without specifying the number of test samples, how real-time inference was timed or evaluated, quantization details, or direct comparison to the offline model on the same held-out data. These omissions prevent verification of the deployment claims.
[Discussion] The central claim of a 'robust framework for accessible medical diagnostics in limited-resource environments' rests on the assumption that Butterworth+CWT features from the single CirCor pediatric dataset will transfer to new clinical recordings, devices, or noise conditions. No external validation set, multi-site data, or robustness experiments against stethoscope hardware variation are described, which is a load-bearing gap given known PCG signal variability.
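The first major comment turns on how recordings were partitioned. A minimal sketch of a subject-wise split follows, assuming each recording carries a subject identifier; the function name, the 20 percent test fraction, and the toy identifiers are illustrative assumptions, not details from the manuscript.

```python
import random
from collections import defaultdict


def subject_wise_split(recordings, test_fraction: float = 0.2, seed: int = 0):
    """Split (subject_id, recording_id) pairs so that no subject spans both sets."""
    by_subject = defaultdict(list)
    for subject_id, recording_id in recordings:
        by_subject[subject_id].append(recording_id)

    subjects = sorted(by_subject)              # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(test_fraction * len(subjects)))

    test = [(s, r) for s in subjects[:n_test] for r in by_subject[s]]
    train = [(s, r) for s in subjects[n_test:] for r in by_subject[s]]
    return train, test


# Hypothetical toy data: 10 subjects with 3 recordings each.
toy = [(f"subj{s:02d}", f"subj{s:02d}_rec{r}") for s in range(10) for r in range(3)]
train, test = subject_wise_split(toy)
print(len(train), len(test))
```

Splitting by subject rather than by recording prevents the same child's heart sounds from appearing on both sides of the partition, which would otherwise inflate test metrics.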

minor comments (1)

[Abstract] The abstract is information-dense; moving some methodological details (e.g., exact CWT parameters or TinyML conversion steps) to a dedicated methods subsection would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve transparency and temper claims where appropriate.

Referee: [Abstract and Results] Performance numbers (85% acc / 85% sens / 92% spec) and the claim of outperforming larger models are stated without any description of the train/test split, cross-validation procedure, number of subjects or recordings in each partition, or statistical significance testing. This information is load-bearing for interpreting whether the metrics reflect genuine generalization on the CirCor dataset.

Authors: We agree these details are necessary. The revised Results section will explicitly describe the subject-independent 80/20 train/test split on the CirCor dataset (approximately 3160 recordings from 963 subjects for training, 790 for testing), the 5-fold cross-validation performed on the training partition, and the use of McNemar's test to establish statistical significance (p<0.05) versus the larger baseline models. The abstract will be updated if space allows.
revision: yes

Referee: [Deployment / TinyML section] On-device accuracies (91% on Raspberry Pi 4B, 80% on Android) are reported without specifying the number of test samples, how real-time inference was timed or evaluated, quantization details, or direct comparison to the offline model on the same held-out data. These omissions prevent verification of the deployment claims.

Authors: We will expand the TinyML section to report the exact held-out test set size (200 recordings), the measurement protocol (average latency over 100 runs using device timers), the quantization scheme (TensorFlow Lite 8-bit integer), and a side-by-side comparison showing that the quantized model reaches 91% accuracy on Raspberry Pi versus the offline 85% on the identical test partition.
revision: yes

Referee: [Discussion] The central claim of a 'robust framework for accessible medical diagnostics in limited-resource environments' rests on the assumption that Butterworth+CWT features from the single CirCor pediatric dataset will transfer to new clinical recordings, devices, or noise conditions. No external validation set, multi-site data, or robustness experiments against stethoscope hardware variation are described, which is a load-bearing gap given known PCG signal variability.

Authors: We accept this as a genuine limitation. The revised Discussion will include an explicit limitations subsection noting the absence of external or multi-site validation and the potential sensitivity to hardware variation. The wording will be moderated from 'robust framework' to 'promising lightweight framework' with explicit mention of these as required future work.
revision: partial
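The first response names McNemar's test for comparing classifiers on the same test set. As a hedged illustration, the sketch below implements the exact binomial form of the test on the two discordant counts; whether the authors would use the exact or the chi-square form is not stated, and the counts shown are invented.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: test cases model A classifies correctly and model B does not.
    c: test cases model B classifies correctly and model A does not.
    """
    n = b + c
    if n == 0:
        return 1.0                    # no disagreements, no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts for illustration only.
p_value = mcnemar_exact(b=15, c=5)
print(f"p = {p_value:.4f}")
```

Only the cases on which the two models disagree enter the statistic, which is why the test suits paired comparisons on a single held-out partition.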

Circularity Check

0 steps flagged

No circularity: standard ML training/evaluation on held-out splits

The paper presents an empirical ML pipeline: Butterworth+CWT preprocessing of PCG signals from the CirCor dataset, training of a ~5.4k-parameter FunnelNet (squeeze-bottleneck-expansion with depthwise separable convs), and reporting of accuracy/sensitivity/specificity on (implicitly held-out) test portions plus on-device inference. No equations, derivations, or fitted parameters are redefined as predictions by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. Performance numbers are measured post-training on separate data, making the evaluation chain self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied empirical ML paper. The central claim rests on the learned weights of the network and the representativeness of the CirCor dataset; no additional free parameters, axioms, or invented entities are introduced beyond standard CNN training.

pith-pipeline@v0.9.0 · 5871 in / 1139 out tokens · 23556 ms · 2026-05-24T01:08:33.748968+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We applied a Butterworth filter and Continuous Wavelet Transform (CWT) to eliminate noise and extract meaningful features from the PCG data. The proposed network consists of three parts: a Squeeze net ... Bottleneck layer ... Expansion net ... depthwise-separable convolutions

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Using only ∼5.4k parameters, we achieved an accuracy of 85%, a sensitivity of 85%, and a specificity of 92% ... real-time inference accuracy of 91% on a Raspberry Pi 4B

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

1 extracted reference · 1 canonical work page

[1] M. Stone, “Cross-Validatory Choice and Assessment of Statistical Predictions,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 36, no. 2, pp. 111–133, Jan. 1974. doi: 10.1111/j.2517-6161.1974.tb00994.x.