openSMILE: the Munich versatile and fast open-source audio feature extractor

· 2010 · arXiv 3951.187424

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Audio-Based Understanding of Audiobook Narration Appeal

cs.CL · 2026-07-02 · unverdicted · novelty 6.0

Acoustic features from narration show a robust association with audiobook appeal independent of title effects, based on analysis of LibriVox data and proprietary metrics.

Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

An encoding probe reconstructs transformer representations from acoustic, phonetic, syntactic, lexical and speaker features, showing independent syntactic/lexical contributions and training-dependent speaker effects.

LISE : Listenable Interpretable Speaker Embeddings

cs.SD · 2026-06-19 · unverdicted · novelty 5.0

LISE decomposes pretrained speaker embeddings into components that preserve ASV performance with negligible EER degradation and enable listeners to distinguish speakers at 83.9% accuracy.

Beyond the Mouth: Upper-Face Affective Cues in Audiovisual Sentence Recognition under Acoustic Uncertainty

cs.SD · 2026-05-30 · unverdicted · novelty 5.0

Upper-face affective features improve model calibration in noisy audiovisual sentence recognition but add only small accuracy gains compared to mouth features.

What Does a Pathological Speech Assessment Model Know about Acoustic Features? A Case Study on Oral and Oropharyngeal Cancer Patients

cs.SD · 2026-06-23 · unverdicted · novelty 4.0

Wav2Vec 2.0 embeddings for pathological speech correlate most with spectral (0.77) and prosodic (0.71) eGeMAPS features, especially the first MFCC coefficient across layers.

PRISM: Prosody-Integrated Multi-Agent Reasoning Framework for Empathetic Spoken Dialogue

cs.CL · 2026-06-11 · unverdicted · novelty 3.0

PRISM proposes a multi-agent system decoupling speech-to-prosody handling, LLM-based response generation, and synthesis, reporting metric improvements in empathy and prosodic fit for spoken dialogue.

citing papers explorer

Showing 6 of 6 citing papers after filters.

Audio-Based Understanding of Audiobook Narration Appeal
cs.CL · 2026-07-02 · unverdicted · none · ref 39
Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe
cs.CL · 2026-05-01 · unverdicted · none · ref 17
LISE : Listenable Interpretable Speaker Embeddings
cs.SD · 2026-06-19 · unverdicted · none · ref 23
Beyond the Mouth: Upper-Face Affective Cues in Audiovisual Sentence Recognition under Acoustic Uncertainty
cs.SD · 2026-05-30 · unverdicted · none · ref 8
What Does a Pathological Speech Assessment Model Know about Acoustic Features? A Case Study on Oral and Oropharyngeal Cancer Patients
cs.SD · 2026-06-23 · unverdicted · none · ref 39
PRISM: Prosody-Integrated Multi-Agent Reasoning Framework for Empathetic Spoken Dialogue
cs.CL · 2026-06-11 · unverdicted · none · ref 15
