Jamendo-MT-QA is a new dataset and benchmark for multi-track comparative music question answering, constructed via an LLM-assisted pipeline from Creative Commons Jamendo tracks and used to evaluate audio-language models.
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Four papers cite this work.
Representative citing papers
HyPeR is a hybrid perception-reasoning framework that uses a new hierarchical PAQA dataset and PAUSE tokens to improve large audio language models' handling of multi-speaker and ambiguous audio.
Temporal Contrastive Decoding mitigates temporal smoothing bias in unified large audio-language models by contrasting logits from original and blurred audio inputs during decoding, yielding consistent gains on MMAU and AIR-Bench.
Kimi-Audio Technical Report: Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million hours of speech, sound, and music data.