WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: dataset (1)
polarities: use dataset (1)
representative citing papers
TW-Sound580K dataset plus Tai-LALM model with dynamic Dual-ASR arbitration lifts localized Taiwanese audio-language accuracy to 49.1% on the TAU benchmark.
A survey of Large Audio Language Models that establishes a taxonomy of trustworthiness vulnerabilities and proposes a Defense-in-Depth roadmap for audio intelligence.
citing papers explorer
- Comparative Reasoning: Making an Audio Language Model Better at Comparing Emotions
  A reasoning-guided ordinal SER framework conditions LALMs on paired speech, trains on semantic and GeMAPS-derived reasoning traces, and applies direct preference optimization to improve comparative emotion prediction with only 5% of conventional training data.
- TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling
  TW-Sound580K dataset plus Tai-LALM model with dynamic Dual-ASR arbitration lifts localized Taiwanese audio-language accuracy to 49.1% on the TAU benchmark.
- A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
  A survey of Large Audio Language Models that establishes a taxonomy of trustworthiness vulnerabilities and proposes a Defense-in-Depth roadmap for audio intelligence.