Leveraging LLM embeddings for cross-dataset label alignment and zero-shot music emotion prediction
2 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.SD (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Toward Open-Set Speaker Attribute Prediction with Keyword-Appended LLM Embeddings
  Proposes keyword-appended LLM embeddings with a top-k negative loss for open-set speaker attribute prediction; the method outperforms closed-set baselines on LibriTTS-P and generalizes to unseen synonyms.
- MERIT: Learning Disentangled Music Representations for Audio Similarity
  MERIT trains disentangled heads for melody, rhythm, and timbre using conditional audio generation and stem separation. Evaluations on synthetic and real audio show that each head responds strongly to its target dimension and performs near chance on the others.
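The open-set idea in the first citing paper, scoring inputs against text embeddings of labels so that new or synonymous labels can be added at inference time, can be illustrated with a small sketch. Everything here is an assumption rather than the paper's method: the toy 3-dimensional vectors stand in for LLM embeddings of keyword-appended label prompts (the embedding step itself is not shown), "top-k negative loss" is interpreted as an InfoNCE-style loss that keeps only the k most similar (hardest) negatives, and `predict_open_set` and `top_k_negative_loss` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def predict_open_set(query, label_embeddings):
    """Hypothetical: return the label whose text embedding is closest to the query.

    Because labels are just entries in a dict of embeddings, new labels
    (e.g. unseen synonyms) can be added at inference time without retraining.
    """
    return max(label_embeddings, key=lambda name: cosine(query, label_embeddings[name]))


def top_k_negative_loss(query, positive, negatives, k=2, temperature=0.1):
    """Hypothetical InfoNCE-style loss that keeps only the k hardest negatives.

    Assumption: "top-k negative" means the k negatives most similar to the query.
    """
    pos = cosine(query, positive) / temperature
    # Sort negative logits from most to least similar and keep the top k.
    neg = sorted((cosine(query, n) / temperature for n in negatives), reverse=True)[:k]
    logits = [pos] + neg
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - pos  # equals -log softmax(pos) over the kept logits


# Toy label embeddings (stand-ins for LLM embeddings of label prompts).
labels = {
    "low-pitched": [1.0, 0.0, 0.0],
    "high-pitched": [0.0, 1.0, 0.0],
    "breathy": [0.0, 0.0, 1.0],
}
speaker = [0.0, 1.0, 0.1]  # toy speaker embedding
print(predict_open_set(speaker, labels))
print(top_k_negative_loss(speaker, labels["high-pitched"],
                          [labels["low-pitched"], labels["breathy"]], k=1))
```

Restricting the loss to the hardest negatives focuses training on confusable labels; with `k` equal to the number of negatives it reduces to ordinary InfoNCE over all of them.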