UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
1 Pith paper cites this work. Polarity classification is still indexing.
Citation-role summary: dataset (1)
Fields: cs.SD (1)
Years: 2024 (1)
Verdicts: UNVERDICTED (1)
Roles: dataset (1)
Polarities: use dataset (1)

Representative citing papers:
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
CosyVoice 2 delivers human-parity naturalness and near-lossless streaming speech synthesis by combining finite-scalar quantization, a streamlined pre-trained LLM, and chunk-aware causal flow matching on large multilingual data.