IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4):335–359, 2008

Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N Chang, Sungbok Lee, Shrikanth S Narayanan · 2008

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

AffectCodec: Emotion-Preserving Neural Speech Codec with Block-Diagonal Residual FSQ

cs.SD · 2026-05-22 · unverdicted · novelty 7.0

AffectCodec applies block-diagonal projections in residual FSQ to explicitly allocate bits to emotion and acoustic subspaces, combined with emotion conditioning, yielding better emotion preservation at low bitrates with competitive acoustic quality.

VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

VISAFF is a tuning-free speaker-centered visual affective feature learning framework for emotion recognition in conversation that guides frozen VLMs to active speakers and uses reliability-guided complementation from textual and acoustic modalities to achieve competitive performance.

Controlla: Learning Controllability via Graph-Constrained Latent Geometry

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

Controlla learns identity and attribute factors from multimodal inputs and aligns them with graph priors using graph-constrained optimal transport to enforce consistent attribute trajectories while preserving reference identity.

Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM

cs.CL · 2026-05-07 · unverdicted · novelty 5.0 · 2 refs

TextPro-SLM reduces the speech-text modality gap by feeding an LLM backbone with synchronized text tokens and prosody embeddings from WhisperPro, achieving lowest gap scores at 3B/7B scales with roughly 1,000 hours of audio.

citing papers explorer

AffectCodec: Emotion-Preserving Neural Speech Codec with Block-Diagonal Residual FSQ · cs.SD · 2026-05-22 · unverdicted · none · ref 2

VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation · cs.AI · 2026-05-18 · unverdicted · none · ref 4

Controlla: Learning Controllability via Graph-Constrained Latent Geometry · cs.CV · 2026-05-15 · unverdicted · none · ref 5

Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM · cs.CL · 2026-05-07 · unverdicted · none · ref 48 · 2 links
