Audio LLMs fail to use paralinguistic audio information and default to transcript content; a new adversarial benchmark plus PCLM and DPO training raise accuracy on VoxParadox from 17.4% to 65.2%.
emnlp-main.974/
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.SD 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox
Audio LLMs fail to use paralinguistic audio information and default to transcript content; a new adversarial benchmark plus PCLM and DPO training raise accuracy on VoxParadox from 17.4% to 65.2%.