arxiv: 2403.02522 · v1 · pith:PZNKKFUJnew · submitted 2024-03-04 · 💻 cs.LG · cs.AI

HeAR -- Health Acoustic Representations

keywords: health · acoustic · hear · learning · acoustics · audio · deep · tasks

Health acoustic sounds such as coughs and breaths are known to contain useful health signals with significant potential for monitoring health and disease, yet are underexplored in the medical machine learning community. The existing deep learning systems for health acoustics are often narrowly trained and evaluated on a single task, which is limited by data and may hinder generalization to other tasks. To mitigate these gaps, we develop HeAR, a scalable self-supervised learning-based deep learning system using masked autoencoders trained on a large dataset of 313 million two-second long audio clips. Through linear probes, we establish HeAR as a state-of-the-art health audio embedding model on a benchmark of 33 health acoustic tasks across 6 datasets. By introducing this work, we hope to enable and accelerate further health acoustics research.
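
The abstract evaluates HeAR "through linear probes", that is, by training only a linear classifier on top of frozen embeddings. The following is a minimal sketch of that general idea, not the authors' evaluation protocol: the embeddings are synthetic Gaussian clusters standing in for encoder outputs, and every name here (`make_synthetic_embeddings`, `train_linear_probe`, `accuracy`) is hypothetical. It uses only the Python standard library to stay self-contained; in practice a library classifier would likely be used instead.

```python
import math
import random


def make_synthetic_embeddings(n, dim, seed=0):
    """Stand-in for frozen encoder outputs (assumption: NOT real HeAR embeddings).

    Returns a list of (vector, label) pairs drawn from two Gaussian clusters.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 1.0 if label else -1.0
        vec = [rng.gauss(center, 1.0) for _ in range(dim)]
        data.append((vec, label))
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """Probability of class 1 under the linear probe."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_linear_probe(data, dim, lr=0.1, epochs=50):
    """Fit logistic regression with plain SGD; the embeddings are never updated."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            grad = predict_proba(x, w, b) - y  # d(log-loss)/d(logit)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def accuracy(data, w, b):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum((predict_proba(x, w, b) >= 0.5) == bool(y) for x, y in data)
    return correct / len(data)


train_set = make_synthetic_embeddings(200, dim=8, seed=0)
test_set = make_synthetic_embeddings(100, dim=8, seed=1)
w, b = train_linear_probe(train_set, dim=8)
probe_accuracy = accuracy(test_set, w, b)
print(f"linear-probe accuracy on held-out synthetic data: {probe_accuracy:.2f}")
```

Because only `w` and `b` are trained, the score reflects how linearly separable the frozen representation already is, which is why linear probes are a common way to compare embedding models.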

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Unlocking In-Context Learning in Audio-Language Models from Decentralized Medical Audio

    cs.LG · 2026-06 · unverdicted · novelty 6.0

    FSC uses unsupervised clustering for pseudo-label episodes and a three-stage federated pipeline to achieve 71.6% accuracy in 2-way 2-shot in-context diagnosis of respiratory and cardiac audio conditions.

  2. CoughPhase-CLR: Designing an acoustics-informed foundation model for coughing sound classification

    cs.SD · 2026-06 · unverdicted · novelty 5.0

    CoughPhase-CLR uses cough physiological phases to build contrastive positive pairs, outperforming random cropping on downstream tasks including COVID-19 detection and COPD classification.

  3. WEQA: Wearable hEalth Question Answering with Query-Adaptive Agentic Reasoning

    cs.AI · 2026-06 · unverdicted · novelty 5.0

    WEQA proposes a query-adaptive agent framework combining LLMs with wearable data tools, achieving 24% higher accuracy than baselines on a benchmark from four open datasets, with gains in expert-rated usefulness.

  4. RespiraMFM: A Multimodal Foundation Model with Contrastive Audio-Language Alignment for Respiratory Disease Identification

    cs.SD · 2026-06 · unverdicted · novelty 5.0

    RespiraMFM reports 9.15% AUROC gain in supervised fine-tuning and 20.98% in zero-shot settings over baselines by aligning respiratory audio with clinical text across seven real-world datasets for five diseases.

  5. From Objectives to Applications: Aligning Architectural Biases in Audio Self-Supervised Learning

    eess.AS · 2026-07 · unverdicted · novelty 3.0

    A survey that organizes audio SSL into five objective paradigms, relates their demands to architectural biases, and interprets downstream applications as tests of generalization.