Proposes keyword-appended LLM embeddings and a top-k negative loss for open-set speaker attribute prediction; the method outperforms closed-set baselines on LibriTTS-P and generalizes to unseen synonyms.
Toward Open-Set Speaker Attribute Prediction with Keyword-Appended LLM Embeddings
abstract
Understanding speaker attributes is crucial for voice-related applications, yet conventional approaches rely on fixed categorical labels, lacking semantic richness and zero-shot generalizability. We propose a novel framework for open-set speaker attribute prediction leveraging Large Language Model (LLM) embeddings to represent attributes in a continuous semantic space. To bridge the cross-modal gap, we introduce a keyword-appending strategy that structures broad semantic representations into a compact, discriminative manifold. Furthermore, we employ a top-k negative loss to establish robust decision boundaries in crowded semantic regions. Experimental results on LibriTTS-P demonstrate that our method outperforms closed-set baselines and generalizes effectively to unseen synonyms. Geometric analysis suggests that our strategies regularize the embedding manifold, balancing semantic cohesion with predictive clarity.
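The abstract does not give the keyword template or the exact loss formula, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) keyword appending means concatenating short attribute keywords onto a free-form description before LLM embedding, using a hypothetical "Keywords:" template; (2) the top-k negative loss is a temperature-scaled softmax cross-entropy over the positive attribute embedding plus only the k negatives most similar to the speech embedding; and (3) open-set prediction picks the candidate text embedding with the highest cosine similarity. All function names, the `temperature` parameter, and the 2-D toy vectors are hypothetical stand-ins for real speech and LLM embeddings.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def append_keywords(description: str, keywords: Sequence[str]) -> str:
    """Hypothetical keyword-appending template: attach short attribute
    keywords to a free-form description before it is embedded by an LLM."""
    return f"{description} Keywords: {', '.join(keywords)}."


def top_k_negative_loss(speech: Vector, positive: Vector,
                        negatives: List[Vector], k: int,
                        temperature: float = 0.1) -> float:
    """Assumed form: softmax cross-entropy over the positive attribute
    embedding and only the k negatives most similar to the speech embedding."""
    pos_logit = cosine(speech, positive) / temperature
    # Keep only the k hardest (highest-similarity) negatives.
    neg_logits = sorted((cosine(speech, n) / temperature for n in negatives),
                        reverse=True)[:k]
    logits = [pos_logit] + neg_logits
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - pos_logit  # equals -log softmax(logits)[0]


def predict_open_set(speech: Vector, candidates: Dict[str, Vector]) -> str:
    """Open-set prediction: return the candidate text whose embedding is
    closest to the speech embedding; candidates may include unseen synonyms."""
    return max(candidates, key=lambda name: cosine(speech, candidates[name]))


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real speech and LLM text embeddings.
    speech_emb = [1.0, 0.0]
    candidate_embs = {
        "low-pitched": [0.9, 0.1],
        "high-pitched": [-1.0, 0.2],
        "fast": [0.0, 1.0],
    }
    print(append_keywords("A calm male voice.", ["low-pitched", "slow"]))
    print(predict_open_set(speech_emb, candidate_embs))  # prints "low-pitched"
    loss = top_k_negative_loss(
        speech_emb,
        candidate_embs["low-pitched"],
        [candidate_embs["high-pitched"], candidate_embs["fast"]],
        k=1,
    )
    print(loss)
```

Restricting the denominator to the k most similar negatives concentrates the training signal on attribute descriptions that sit close to the positive in the embedding space. This is one plausible reading of "robust decision boundaries in crowded semantic regions"; with k equal to the number of negatives, the sketch reduces to an ordinary full-softmax contrastive loss.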