Zero-Shot Parkinson's Disease Detection from Speech: Comparing Large Audio and Language Models

2026 · cs.SD · arXiv 2605.24806

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Large audio and language models have recently demonstrated zero-shot reasoning capabilities across various domains. However, it remains unclear how the form of audio input, whether handcrafted acoustic features extracted from speech or the raw audio waveform itself, affects performance for Parkinson's disease (PD) detection across different languages. In this study, we systematically compare two input modalities for zero-shot PD detection: (i) handcrafted acoustic features extracted from speech recordings analyzed by a general-purpose LLM, and (ii) direct waveform input analyzed by audio-capable models. Experiments on PD speech datasets in four languages show that performance varies across input modalities, speech tasks, and languages. Handcrafted acoustic features provide more stable performance in a low-resource language (e.g., Bengali), whereas audio input yields dataset-dependent gains. These findings highlight the impact of input modality on zero-shot PD detection from speech.
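Input modality (i) can be illustrated with a minimal sketch, assuming the handcrafted features are passed to the LLM as plain text. The function names, feature names, prompt wording, and label strings below are hypothetical and are not taken from the paper; the feature values are made up for illustration, and the call to an actual model is left out.

```python
from typing import Mapping


def build_zero_shot_prompt(features: Mapping[str, float]) -> str:
    """Format handcrafted acoustic features as a zero-shot prompt (hypothetical wording)."""
    # Sort by feature name so the same features always yield the same prompt.
    lines = [f"- {name}: {value:.3f}" for name, value in sorted(features.items())]
    return (
        "The following acoustic features were extracted from a speech recording.\n"
        + "\n".join(lines)
        + "\nBased only on these features, answer 'PD' or 'healthy'."
    )


def parse_label(reply: str) -> str:
    """Map a model reply to 'PD', 'healthy', or 'unknown' (exact match only)."""
    # Drop surrounding whitespace, then surrounding periods and quotes.
    text = reply.strip().strip(".'\"").lower()
    if text == "pd":
        return "PD"
    if text == "healthy":
        return "healthy"
    return "unknown"


# Illustrative values only; these are not results from any dataset.
example_features = {"jitter_percent": 0.8, "shimmer_percent": 3.2}
prompt = build_zero_shot_prompt(example_features)
```

The exact-match parser is deliberately conservative: any reply that is not a bare label is reported as 'unknown' rather than guessed at.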

representative citing papers

Zero-Shot Parkinson's Disease Detection from Speech: Comparing Large Audio and Language Models

cs.SD · 2026-05-24 · unverdicted · novelty 5.0

Handcrafted acoustic features give more stable zero-shot Parkinson's disease detection in low-resource languages such as Bengali than raw audio input, whose gains vary by dataset.
