JamendoMaxCaps: A large scale music-caption dataset with imputed metadata
2 Pith papers cite this work.
fields: cs.SD (2)
years: 2026 (2)
citing papers explorer
- FIGMA: Towards FIne-Grained Music retrievAl
  FIGMA proposes a multi-view contrastive architecture plus the FGMCaps dataset to retrieve music from fine-grained textual descriptions of musical attributes, reporting up to 73.3% relative gains over CLAP baselines.
- Instrumental Text-to-Music Generation with Auxiliary Conditioning Branches
  Auxiliary lyric and timbre branches improve instrumental text-to-music generation quality in a controlled DiT setting even with degenerate inputs, outperforming parameter-reallocated depth variants and external baselines in objective and MOS evaluations.