Single-stage TTS with masked audio token modeling and semantic knowledge distillation

Gerard I Gállego, Roy Fejgin, Chunghsin Yeh, Xiaoyu Liu, Gautam Bhattacharya · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models

cs.CL · 2026-04-01 · unverdicted · novelty 6.0

OmniVoice introduces a diffusion language model-style non-autoregressive TTS system that directly maps text to multi-codebook acoustic tokens, scaling zero-shot synthesis to over 600 languages with SOTA results on multilingual benchmarks using 581k hours of open data.

citing papers explorer

Showing 1 of 1 citing paper.

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models cs.CL · 2026-04-01 · unverdicted · none · ref 21
