Jamendo-MT-QA is a new dataset and benchmark for multi-track comparative music question answering, constructed via an LLM-assisted pipeline from Creative Commons Jamendo tracks and used to evaluate audio-language models.
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Four papers cite this work.
Representative citing papers
HyPeR is a hybrid perception-reasoning framework that uses a new hierarchical PAQA dataset and PAUSE tokens to improve large audio language models' handling of multi-speaker and ambiguous audio.
Temporal Contrastive Decoding mitigates temporal smoothing bias in unified large audio-language models by contrasting logits from original and blurred audio inputs during decoding, yielding consistent gains on MMAU and AIR-Bench.
Kimi-Audio Technical Report: Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million hours of speech, sound, and music data.