Listening Between the Lines: Joint Learning of ASR Embeddings and LLM-Augmented Linguistics for Dementia Detection

2026 · eess.AS · arXiv 2606.30675

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Early detection of dementia through speech analysis offers a non-invasive screening alternative, but capturing both acoustic and linguistic biomarkers remains challenging. We propose a multimodal framework leveraging Whisper for dual-purpose extraction: acoustic representations from encoder outputs and transcripts via automatic speech recognition (ASR). For the acoustic pathway, temporal networks with attention pooling aggregate variable-length sequences into fixed-dimensional embeddings. For the linguistic pathway, we prompt a large language model (LLM) to extract interpretable features spanning lexical diversity, syntactic complexity, semantic coherence, and discourse patterns. A gated fusion network integrates both modalities. On ADReSS and ADReSSo, our method achieves F1-scores of 89.47% and 90.14%, demonstrating effective integration of acoustic and LLM-augmented linguistic features. Ablation shows that multimodal fusion consistently outperforms either modality alone.
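The two aggregation steps named in the abstract, attention pooling over variable-length encoder frames and gated fusion of the two modality vectors, can be illustrated with a minimal pure-Python sketch. The abstract does not give the scoring function, the gate parameterization, or the dimensions, so everything below is an assumption: a dot-product score against a hypothetical learned `query` vector, a per-dimension sigmoid gate over the concatenated inputs, and linguistic features already projected to the same size as the acoustic embedding. The names `attention_pool`, `gated_fusion`, `gate_weights`, and `gate_bias` are illustrative and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse a variable-length list of frame vectors into one vector.

    Assumption: each frame is scored by its dot product with `query`;
    the paper may use a different (e.g. MLP-based) scoring function.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum over time, so the output size is independent of len(frames).
    return [sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(acoustic, linguistic, gate_weights, gate_bias):
    """Blend two same-sized vectors with a per-dimension sigmoid gate.

    Assumption: the gate for dimension d is
    sigmoid(gate_weights[d] . [acoustic; linguistic] + gate_bias[d]).
    """
    concat = list(acoustic) + list(linguistic)
    fused = []
    for d in range(len(acoustic)):
        pre = sum(w * x for w, x in zip(gate_weights[d], concat)) + gate_bias[d]
        g = sigmoid(pre)
        fused.append(g * acoustic[d] + (1.0 - g) * linguistic[d])
    return fused


# Toy usage: two recordings of different lengths pool to the same size.
short_clip = [[1.0, 2.0], [3.0, 4.0]]
long_clip = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
query = [0.0, 0.0]  # a zero query yields uniform weights (plain mean)
pooled_short = attention_pool(short_clip, query)
pooled_long = attention_pool(long_clip, query)

# Zero gate parameters give g = 0.5, i.e. an even blend of both modalities.
zero_w = [[0.0] * 4, [0.0] * 4]
fused = gated_fusion([1.0, 0.0], [0.0, 1.0], zero_w, [0.0, 0.0])
```

In a trained model the query and gate parameters would be learned, letting the gate weight each modality differently per input; the zero-valued parameters here only make the behavior easy to check by hand.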

representative citing papers

Listening Between the Lines: Joint Learning of ASR Embeddings and LLM-Augmented Linguistics for Dementia Detection

eess.AS · 2026-06-26 · unverdicted · novelty 4.0

A multimodal model fuses Whisper acoustic embeddings with LLM-extracted linguistic features via gated fusion to achieve F1 scores of 89.47% and 90.14% on ADReSS and ADReSSo dementia detection benchmarks.
