Comparative Reasoning: Making an Audio Language Model Better at Comparing Emotions

2026 · eess.AS · arXiv 2606.24082

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Large audio-language models (LALMs) can reason about audio, yet it remains unclear whether they can perform comparative judgments between two speech signals along emotional, environmental, linguistic, prosodic, and interpersonal dimensions. We study this question in the context of speech emotion recognition (SER), where the model determines which utterance exhibits higher arousal, valence, or dominance. We introduce a reasoning-guided ordinal SER framework that conditions an LALM on paired speech inputs. The model is trained using reasoning traces generated from both semantic audio descriptions and acoustic evidence derived from GeMAPS features, enabling interpretable comparative decisions. Beyond direct supervision, we also employ direct preference optimization to encourage stronger separation for emotional differences. Experiments show that the proposed framework improves preference prediction while requiring only 5% of the training data used by conventional ordinal SER systems.
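The abstract mentions direct preference optimization (DPO) without giving details. As a rough illustration only, the standard DPO loss for a single preference pair can be sketched as below. This is the generic textbook formulation, not the paper's implementation; the function name `dpo_loss`, its parameter names, and the value `beta=0.1` are assumptions made for this sketch. Here "chosen" would be the response naming the utterance with the higher arousal, valence, or dominance, and "rejected" the response naming the other one.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All inputs are summed log-probabilities of a full response under
    either the trainable policy or the frozen reference model.
    `beta` (hypothetical default) scales how strongly the policy is
    pushed away from the reference.
    """
    # How much more the policy favours each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When the policy equals the reference, the loss is log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# When the policy favours the chosen response more, the loss is lower.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

Minimizing this quantity widens the margin between the preferred and dispreferred comparative judgment, which is one plausible reading of "stronger separation for emotional differences" in the abstract.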

representative citing papers

Comparative Reasoning: Making an Audio Language Model Better at Comparing Emotions

eess.AS · 2026-06-23 · unverdicted · novelty 6.0

A reasoning-guided ordinal SER framework conditions LALMs on paired speech, trains on semantic and GeMAPS-derived reasoning traces, and applies direct preference optimization to improve comparative emotion prediction with only 5% of conventional training data.

citing papers explorer

Showing 1 of 1 citing paper.

Comparative Reasoning: Making an Audio Language Model Better at Comparing Emotions

eess.AS · 2026-06-23 · unverdicted · none · ref 1 · internal anchor

A reasoning-guided ordinal SER framework conditions LALMs on paired speech, trains on semantic and GeMAPS-derived reasoning traces, and applies direct preference optimization to improve comparative emotion prediction with only 5% of conventional training data.
