PromptTTS 2: Describing and Generating Voices with Text Prompt

Leng et al. · 2023 · arXiv 2309.02285

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

CapTalk: Unified Voice Design for Single-Utterance and Dialogue Speech Generation

cs.SD · 2026-04-09 · unverdicted · novelty 7.0

CapTalk unifies single-utterance and dialogue voice design via utterance- and speaker-level captions plus a hierarchical variational module for stable timbre with adaptive expression.

Emo-LiPO: Listwise Preference Optimization for Fine-Grained Emotion Intensity Control in LLM-based Text-to-Speech

cs.SD · 2026-06-11 · unverdicted · novelty 6.0

Emo-LiPO applies listwise preference optimization to model global emotion intensity ordering in LLM TTS, yielding better accuracy and controllability than supervised or DPO baselines on a new multi-speaker dataset.

SpeakerCard-1M: An Evidence-Grounded Corpus for In-the-Wild Speaker Verification

eess.AS · 2026-06-02 · unverdicted · novelty 6.0

SpeakerCard-1M supplies 56.7k evidence-grounded speaker cards, 1.78M captions, and new cross-modal protocols showing audio LMs lag a dual-encoder baseline on attribute-conditioned verification while joint training barely hurts standard EER.

Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey

cs.CV · 2026-04-13

citing papers explorer

Showing 4 of 4 citing papers.

CapTalk: Unified Voice Design for Single-Utterance and Dialogue Speech Generation
cs.SD · 2026-04-09 · unverdicted · none · ref 23

Emo-LiPO: Listwise Preference Optimization for Fine-Grained Emotion Intensity Control in LLM-based Text-to-Speech
cs.SD · 2026-06-11 · unverdicted · none · ref 11

SpeakerCard-1M: An Evidence-Grounded Corpus for In-the-Wild Speaker Verification
eess.AS · 2026-06-02 · unverdicted · none · ref 11

Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey
cs.CV · 2026-04-13 · unreviewed · ref 71
