GaMMA unifies global and temporal music understanding in a single LMM via MoE audio encoders and progressive training, achieving new state-of-the-art accuracies on music benchmarks including 79.1% on MuchoMusic.
optimal solution
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.SD 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
GaMMA: Towards Joint Global-Temporal Music Understanding in Large Multimodal Models
GaMMA unifies global and temporal music understanding in a single LMM via MoE audio encoders and progressive training, achieving new state-of-the-art accuracies on music benchmarks including 79.1% on MuchoMusic.