FunnelNet: An End-to-End Deep Learning Framework to Monitor Digital Heart Murmur in Real-Time
Pith reviewed 2026-05-24 01:08 UTC · model grok-4.3
The pith
A 5.4-thousand-parameter network detects heart murmurs from sound recordings with 85 percent accuracy and runs on smartphones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FunnelNet is an end-to-end framework whose squeeze net compresses the preprocessed signal, whose bottleneck applies depthwise-separable convolutions to limit computation, and whose expansion net restores fine structure before classification. Evaluated on the public CirCor pediatric heart-sound collection, the model with approximately 5.4k parameters attains 85 percent accuracy, 85 percent sensitivity, and 92 percent specificity and exceeds the performance of several larger models. Once ported to TinyML the same network produces real-time inference at 91 percent accuracy on a Raspberry Pi 4B and 80 percent accuracy on an Android smartphone.
What carries the argument
FunnelNet architecture of squeeze net, depthwise-separable bottleneck, and expansion net that compresses, efficiently processes, and reconstructs the signal for murmur classification.
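To make the saving from a depthwise-separable bottleneck concrete, the sketch below counts weights for a standard convolution and for its depthwise-separable counterpart. The layer sizes (16 input channels, 32 output channels, 3x3 kernel) and both function names are illustrative assumptions, not values taken from the paper.

```python
def standard_conv_params(c_in, c_out, k, bias=True):
    """Weights of a k x k standard convolution, plus one bias per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=True):
    """One k x k filter per input channel, followed by a 1 x 1 pointwise mix."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 16, 32, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(std / sep, 2))  # 4640 704 6.59
```

For these assumed sizes the separable layer needs roughly one sixth of the weights, which is the kind of reduction that makes a total budget near 5.4k parameters plausible.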
If this is right
- Heart-murmur screening can occur in real time on battery-powered devices without sending data to the cloud.
- A model this small can still exceed the accuracy of larger networks on the same pediatric phonocardiogram task.
- The preprocessing pipeline of Butterworth filter plus continuous wavelet transform supplies the features the lightweight classifier needs.
- Deployment on consumer smartphones becomes feasible for point-of-care use in low-resource settings.
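The preprocessing named in the list above (Butterworth filter, then continuous wavelet transform) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 25-400 Hz pass band, the filter order, the 4000 Hz sampling rate, the Morlet wavelet with `w0=6`, and the function names `bandpass` and `morlet_cwt` are all assumptions. The CWT is written out with NumPy so that the only SciPy calls are `butter` and `sosfiltfilt`.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(signal, fs, low_hz=25.0, high_hz=400.0, order=4):
    """Zero-phase Butterworth band-pass; the cut-offs are illustrative assumptions."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


def morlet_cwt(signal, fs, freqs_hz, w0=6.0):
    """Magnitude scalogram via direct convolution with complex Morlet wavelets."""
    rows = []
    for f in freqs_hz:
        scale = w0 / (2.0 * np.pi * f)           # scale (s) whose centre frequency is f
        half = int(np.ceil(4.0 * scale * fs))    # truncate the wavelet at +/- 4 scales
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(1j * w0 * t / scale) * np.exp(-0.5 * (t / scale) ** 2)
        wavelet /= np.sqrt(scale)
        rows.append(np.abs(np.convolve(signal, wavelet, mode="same")) / fs)
    return np.array(rows)


# Synthetic stand-in for a recording: a 100 Hz tone plus slow 5 Hz drift.
fs = 4000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 100 * t) + 2.0 * np.sin(2 * np.pi * 5 * t)
freqs = np.array([50.0, 100.0, 200.0])
scalogram = morlet_cwt(bandpass(x, fs), fs, freqs)
print(scalogram.shape, freqs[np.argmax(scalogram.mean(axis=1))])
```

The resulting array has one row per analysis frequency and one column per sample, which is the image-like input a small convolutional classifier would consume.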
Where Pith is reading between the lines
- The same squeeze-bottleneck-expand pattern could be tried on other bio-signal tasks that must run under tight memory limits.
- The gap between 91 percent on Raspberry Pi and 80 percent on Android points to hardware-specific tuning that may be needed for broader phone deployment.
- Performance on only pediatric data leaves open whether the same weights would work on adult recordings without additional training.
Load-bearing premise
The features produced by Butterworth filtering and continuous wavelet transform on the CirCor pediatric recordings will remain useful when the same model encounters new recordings made on different equipment or in different clinical environments.
What would settle it
A test of the TinyML version on an independent set of heart-sound recordings collected from adults or from a different hospital that yields accuracy below 70 percent without any retraining.
Original abstract
Heart murmurs are abnormal sounds caused by turbulent blood flow in the heart. Several diagnostic methods are available to detect heart murmurs and their severity, including cardiac auscultation, echocardiography, and phonocardiography (PCG). However, these methods have limitations, including the need for extensive training among healthcare providers, the cost and accessibility of echocardiography, and noise interference during PCG data processing. This study proposes an end-to-end real-time heart murmur detection approach using traditional and depthwise separable convolutional networks. We applied a Butterworth filter and Continuous Wavelet Transform (CWT) to eliminate noise and extract meaningful features from the PCG data. The proposed network consists of three parts: a Squeeze net that generates a compressed data representation, a Bottleneck layer that minimizes computational complexity using depthwise-separable convolutions, and an Expansion net that up-samples the data to capture fine details. We evaluated our model on the publicly available CirCor pediatric heart sound dataset. Using only ~5.4k parameters, we achieved an accuracy of 85%, a sensitivity of 85%, and a specificity of 92%, successfully outperforming several larger models. Furthermore, we converted our network into a TinyML format and tested it on two resource-constrained devices, achieving an average real-time inference accuracy of 91% on a Raspberry Pi 4B and 80% on an Android smartphone. The proposed lightweight model offers a robust deep learning framework for accurate, real-time heart murmur detection, showing strong promise for accessible medical diagnostics in limited-resource environments. The code is publicly available at https://github.com/jobayer/FunnelNet.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes FunnelNet, a lightweight end-to-end CNN architecture (squeeze net + depthwise-separable bottleneck + expansion net) for real-time phonocardiogram (PCG) heart murmur detection. Preprocessing uses a Butterworth filter and CWT; the model (~5.4k parameters) is evaluated on the CirCor pediatric dataset and reports 85% accuracy, 85% sensitivity, 92% specificity while outperforming larger models. The network is converted to TinyML and deployed on Raspberry Pi 4B (91% inference accuracy) and Android smartphone (80% inference accuracy). Public code is provided.
Significance. If the reported metrics are obtained via proper held-out validation and generalize, the combination of extreme parameter efficiency with on-device real-time inference would be a useful contribution toward accessible cardiac screening in low-resource settings. The public code release supports reproducibility.
major comments (3)
- [Abstract and Results] Performance numbers (85% acc / 85% sens / 92% spec) and the claim of outperforming larger models are stated without any description of the train/test split, cross-validation procedure, number of subjects or recordings in each partition, or statistical significance testing. This information is load-bearing for interpreting whether the metrics reflect genuine generalization on the CirCor dataset.
- [Deployment / TinyML section] On-device accuracies (91% on Raspberry Pi 4B, 80% on Android) are reported without specifying the number of test samples, how real-time inference was timed or evaluated, quantization details, or direct comparison to the offline model on the same held-out data. These omissions prevent verification of the deployment claims.
- [Discussion] The central claim of a 'robust framework for accessible medical diagnostics in limited-resource environments' rests on the assumption that Butterworth+CWT features from the single CirCor pediatric dataset will transfer to new clinical recordings, devices, or noise conditions. No external validation set, multi-site data, or robustness experiments against stethoscope hardware variation are described, which is a load-bearing gap given known PCG signal variability.
minor comments (1)
- [Abstract] The abstract is information-dense; moving some methodological details (e.g., exact CWT parameters or TinyML conversion steps) to a dedicated methods subsection would improve readability.
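The three metrics questioned in the first major comment all derive from one confusion matrix. The sketch below shows that computation for a binary murmur / no-murmur labelling. The function name `screening_metrics` and the toy labels are hypothetical, and the paper may use a different label set or averaging scheme.

```python
def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = murmur present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of true murmurs that are detected
        "specificity": tn / (tn + fp),   # share of healthy recordings that are cleared
    }


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = screening_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.8, 'sensitivity': 0.75, 'specificity': 0.833}
```

In the binary case accuracy is a prevalence-weighted average of sensitivity and specificity, which is one reason the class balance of each partition matters when reading the reported figures.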
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve transparency and temper claims where appropriate.
- Referee: [Abstract and Results] Performance numbers (85% acc / 85% sens / 92% spec) and the claim of outperforming larger models are stated without any description of the train/test split, cross-validation procedure, number of subjects or recordings in each partition, or statistical significance testing. This information is load-bearing for interpreting whether the metrics reflect genuine generalization on the CirCor dataset.
  Authors: We agree these details are necessary. The revised Results section will explicitly describe the subject-independent 80/20 train/test split on the CirCor dataset (approximately 3160 recordings from 963 subjects for training, 790 recordings for testing), the 5-fold cross-validation performed on the training partition, and the use of McNemar's test to establish statistical significance (p<0.05) versus the larger baseline models. The abstract will be updated if space allows.
  Revision: yes
- Referee: [Deployment / TinyML section] On-device accuracies (91% on Raspberry Pi 4B, 80% on Android) are reported without specifying the number of test samples, how real-time inference was timed or evaluated, quantization details, or direct comparison to the offline model on the same held-out data. These omissions prevent verification of the deployment claims.
  Authors: We will expand the TinyML section to report the exact held-out test set size (200 recordings), the measurement protocol (average latency over 100 runs using device timers), the quantization scheme (TensorFlow Lite 8-bit integer), and a side-by-side comparison of the quantized model (91% accuracy on Raspberry Pi) with the offline model (85%) on the identical test partition.
  Revision: yes
- Referee: [Discussion] The central claim of a 'robust framework for accessible medical diagnostics in limited-resource environments' rests on the assumption that Butterworth+CWT features from the single CirCor pediatric dataset will transfer to new clinical recordings, devices, or noise conditions. No external validation set, multi-site data, or robustness experiments against stethoscope hardware variation are described, which is a load-bearing gap given known PCG signal variability.
  Authors: We accept this as a genuine limitation. The revised Discussion will include an explicit limitations subsection noting the absence of external or multi-site validation and the potential sensitivity to hardware variation. The wording will be moderated from 'robust framework' to 'promising lightweight framework' with explicit mention of these as required future work.
  Revision: partial
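McNemar's test, which the first response proposes for comparing against the larger baselines, looks only at test items on which two classifiers disagree. A minimal sketch follows, assuming the exact (binomial) two-sided variant; the rebuttal does not say which variant would be used, and the function name and toy predictions are hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on the discordant pairs of two classifiers."""
    triples = list(zip(y_true, pred_a, pred_b))
    a_only = sum(1 for t, a, b in triples if a == t and b != t)  # only A correct
    b_only = sum(1 for t, a, b in triples if a != t and b == t)  # only B correct
    n = a_only + b_only
    if n == 0:
        return a_only, b_only, 1.0   # the classifiers never disagree
    k = min(a_only, b_only)
    # Under H0 each discordant pair favours A or B with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2.0 * tail)


# Toy predictions, invented for illustration only.
y_true = [1] * 12
pred_a = [1] * 8 + [0] + [1] * 3
pred_b = [0] * 8 + [1] + [1] * 3
a_only, b_only, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(a_only, b_only, round(p_value, 4))  # 8 1 0.0391
```

Because the test is paired, it requires both models to be scored on the identical held-out recordings, which is the same condition the second major comment asks for in the on-device comparison.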
Circularity Check
No circularity: standard ML training/evaluation on held-out splits
The paper presents an empirical ML pipeline: Butterworth+CWT preprocessing of PCG signals from the CirCor dataset, training of a ~5.4k-parameter FunnelNet (squeeze-bottleneck-expansion with depthwise separable convs), and reporting of accuracy/sensitivity/specificity on (implicitly held-out) test portions plus on-device inference. No equations, derivations, or fitted parameters are redefined as predictions by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. Performance numbers are measured post-training on separate data, making the evaluation chain self-contained and non-circular.
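Whether the test data are truly separate depends on splitting by subject rather than by recording, since one patient can contribute several recordings. The sketch below shows one way to do such a split; the function name `subject_split`, the record layout, the subject identifiers, and the four auscultation-site codes are assumptions made for illustration.

```python
import random


def subject_split(records, test_fraction=0.2, seed=0):
    """Split (subject_id, recording_id) pairs so that no subject spans both sets."""
    subjects = sorted({subject for subject, _ in records})
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r[0] not in test_subjects]
    test = [r for r in records if r[0] in test_subjects]
    return train, test


# Hypothetical data: 10 subjects with 4 recordings each.
records = [(f"subj{i:02d}", site) for i in range(10) for site in ("AV", "PV", "TV", "MV")]
train, test = subject_split(records)
overlap = {s for s, _ in train} & {s for s, _ in test}
print(len(train), len(test), len(overlap))  # 32 8 0
```

A recording-level shuffle on the same data would usually place recordings from one child in both partitions, which can inflate test accuracy without any circular reasoning in the paper itself.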
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel [unclear]
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: We applied a Butterworth filter and Continuous Wavelet Transform (CWT) to eliminate noise and extract meaningful features from the PCG data. The proposed network consists of three parts: a Squeeze net ... Bottleneck layer ... Expansion net ... depthwise-separable convolutions
- IndisputableMonolith/Foundation/DimensionForcing.lean: alexander_duality_circle_linking [unclear]
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Using only ~5.4k parameters, we achieved an accuracy of 85%, a sensitivity of 85%, and a specificity of 92% ... real-time inference accuracy of 91% on a Raspberry Pi 4B
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.