HeAR -- Health Acoustic Representations

Brian Shuma; Christina Chen; Diego Ardila; Greg S. Corrado; Jake Garrison; Louis Blankemeier; Minyoi Maimbolwa; Monde Muyoyeta; Nsala Sanjase; Sam Fishman

arxiv: 2403.02522 · v1 · pith:PZNKKFUJnew · submitted 2024-03-04 · 💻 cs.LG · cs.AI

Sebastien Baur, Zaid Nabulsi, Wei-Hung Weng, Jake Garrison, Louis Blankemeier, Sam Fishman, Christina Chen, Sujay Kakarmath, Minyoi Maimbolwa, Nsala Sanjase, Brian Shuma, Yossi Matias, Greg S. Corrado, Shwetak Patel, Shravya Shetty, Shruthi Prabhakara, Monde Muyoyeta, Diego Ardila

classification: cs.LG, cs.AI

keywords: health, acoustic, hear, learning, acoustics, audio, deep, tasks

Health acoustic sounds such as coughs and breaths are known to contain useful health signals with significant potential for monitoring health and disease, yet are underexplored in the medical machine learning community. The existing deep learning systems for health acoustics are often narrowly trained and evaluated on a single task, which is limited by data and may hinder generalization to other tasks. To mitigate these gaps, we develop HeAR, a scalable self-supervised learning-based deep learning system using masked autoencoders trained on a large dataset of 313 million two-second-long audio clips. Through linear probes, we establish HeAR as a state-of-the-art health audio embedding model on a benchmark of 33 health acoustic tasks across 6 datasets. By introducing this work, we hope to enable and accelerate further health acoustics research.
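
The abstract evaluates the embeddings "through linear probes" without giving details here. As a rough illustration of the general idea only, the sketch below fits a logistic-regression classifier on top of fixed embedding vectors. Everything in it is an assumption made for illustration: the embeddings are synthetic random vectors standing in for frozen encoder outputs, the function names (`train_linear_probe`, `predict`, `make_synthetic_embeddings`) are hypothetical, and the probe type, dimensions, and hyperparameters are not taken from the paper. No HeAR API is used.

```python
import math
import random


def train_linear_probe(embeddings, labels, lr=0.5, epochs=200):
    """Fit logistic regression on fixed embeddings; returns (weights, bias).

    The encoder is never updated: only this linear layer is trained.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return the predicted class (0 or 1) for one embedding."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0


def make_synthetic_embeddings(n, dim, seed=0):
    """Hypothetical stand-in for frozen encoder outputs (NOT real HeAR data)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        center = 1.0 if y == 1 else -1.0
        xs.append([rng.gauss(center, 0.5) for _ in range(dim)])
        ys.append(y)
    return xs, ys


embeddings, labels = make_synthetic_embeddings(n=200, dim=8)
w, b = train_linear_probe(embeddings[:150], labels[:150])
held_out = list(zip(embeddings[150:], labels[150:]))
accuracy = sum(predict(w, b, x) == y for x, y in held_out) / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because only the linear layer is trained, the probe's held-out score reflects how linearly separable a task is in the embedding space, which is why this kind of evaluation is commonly used to compare embedding models.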

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Unlocking In-Context Learning in Audio-Language Models from Decentralized Medical Audio
cs.LG 2026-06 unverdicted novelty 6.0

FSC uses unsupervised clustering for pseudo-label episodes and a three-stage federated pipeline to achieve 71.6% accuracy in 2-way 2-shot in-context diagnosis of respiratory and cardiac audio conditions.

CoughPhase-CLR: Designing an acoustics-informed foundation model for coughing sound classification
cs.SD 2026-06 unverdicted novelty 5.0

CoughPhase-CLR uses cough physiological phases to build contrastive positive pairs, outperforming random cropping on downstream tasks including COVID-19 detection and COPD classification.

WEQA: Wearable hEalth Question Answering with Query-Adaptive Agentic Reasoning
cs.AI 2026-06 unverdicted novelty 5.0

WEQA proposes a query-adaptive agent framework combining LLMs with wearable data tools, achieving 24% higher accuracy than baselines on a benchmark from four open datasets, with gains in expert-rated usefulness.

RespiraMFM: A Multimodal Foundation Model with Contrastive Audio-Language Alignment for Respiratory Disease Identification
cs.SD 2026-06 unverdicted novelty 5.0

RespiraMFM reports 9.15% AUROC gain in supervised fine-tuning and 20.98% in zero-shot settings over baselines by aligning respiratory audio with clinical text across seven real-world datasets for five diseases.

From Objectives to Applications: Aligning Architectural Biases in Audio Self-Supervised Learning
eess.AS 2026-07 unverdicted novelty 3.0

A survey that organizes audio SSL into five objective paradigms, relates their demands to architectural biases, and interprets downstream applications as tests of generalization.