Interleaved speech language models (SLMs) implicitly transcribe spoken words into text tokens in their middle layers (the top candidate for 77% of the data) before predicting in text space and then returning to speech.
Do Audio LLMs Really Listen, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance
3 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary:
Years: 2026 (3)
Verdicts: unverdicted (3)
Roles: background (1)
Polarities: background (1)

Representative citing papers:
A survey of Large Audio Language Models that establishes a taxonomy of trustworthiness vulnerabilities and proposes a Defense-in-Depth roadmap for audio intelligence.
Beyond Semantic Dominance: Cognitive Affective Reasoning and Empathetic Response Alignment in Audio Language Models
CogAudio-LLM introduces the LIME-440K dataset, EIPS chain-of-thought reasoning, and DR-SAPO optimization to address semantic dominance and improve affective responses in audio language models.