openSMILE: The Munich Versatile and Fast Open-Source Audio Feature Extractor
6 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (6). Verdicts: unverdicted (6).
citing papers explorer
- Audio-Based Understanding of Audiobook Narration Appeal
  Acoustic features from narration show a robust association with audiobook appeal independent of title effects, based on analysis of LibriVox data and proprietary metrics.
- Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe
  An encoding probe reconstructs transformer representations from acoustic, phonetic, syntactic, lexical and speaker features, showing independent syntactic/lexical contributions and training-dependent speaker effects.
- LISE: Listenable Interpretable Speaker Embeddings
  LISE decomposes pretrained speaker embeddings into components that preserve ASV performance with negligible EER degradation and enable listeners to distinguish speakers at 83.9% accuracy.
- Beyond the Mouth: Upper-Face Affective Cues in Audiovisual Sentence Recognition under Acoustic Uncertainty
  Upper-face affective features improve model calibration in noisy audiovisual sentence recognition but add only small accuracy gains compared to mouth features.
- What Does a Pathological Speech Assessment Model Know about Acoustic Features? A Case Study on Oral and Oropharyngeal Cancer Patients
  Wav2Vec 2.0 embeddings for pathological speech correlate most with spectral (0.77) and prosodic (0.71) eGeMAPS features, especially the first MFCC coefficient across layers.
- PRISM: Prosody-Integrated Multi-Agent Reasoning Framework for Empathetic Spoken Dialogue
  PRISM proposes a multi-agent system decoupling speech-to-prosody handling, LLM-based response generation, and synthesis, reporting metric improvements in empathy and prosodic fit for spoken dialogue.