An encoding probe reconstructs transformer representations from acoustic, phonetic, syntactic, lexical and speaker features, showing independent syntactic/lexical contributions and training-dependent speaker effects.
5 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (5)
Verdicts: UNVERDICTED (5)
Representative citing papers
PRISM proposes a multi-agent system decoupling speech-to-prosody handling, LLM-based response generation, and synthesis, reporting metric improvements in empathy and prosodic fit for spoken dialogue.
citing papers explorer
- LISE: Listenable Interpretable Speaker Embeddings
LISE decomposes pretrained speaker embeddings into components that preserve ASV performance with negligible EER degradation and enable listeners to distinguish speakers at 83.9% accuracy.
- Beyond the Mouth: Upper-Face Affective Cues in Audiovisual Sentence Recognition under Acoustic Uncertainty
Upper-face affective features improve model calibration in noisy audiovisual sentence recognition but add only small accuracy gains compared to mouth features.
- What Does a Pathological Speech Assessment Model Know about Acoustic Features? A Case Study on Oral and Oropharyngeal Cancer Patients
Wav2Vec 2.0 embeddings for pathological speech correlate most with spectral (0.77) and prosodic (0.71) eGeMAPS features, especially the first MFCC coefficient across layers.