LongCat-AudioDiT: High-fidelity diffusion text-to-speech in the waveform latent space

3 Pith papers cite this work. Polarity classification is still being indexed.

years: 2026 (3)

verdicts: unverdicted (3)

representative citing papers

dots.tts Technical Report

cs.SD · 2026-06-05 · unverdicted · novelty 6.0

dots.tts reports SOTA benchmark results on Seed-TTS-Eval and other tests via continuous latent-space autoregressive modeling with three listed innovations and code release.

VoxCPM2 Technical Report

cs.SD · 2026-06-05 · unverdicted · novelty 5.0

VoxCPM2 scales hierarchical continuous-latent speech modeling to 2B parameters and over 2M hours of multilingual data, unifying voice cloning, style control, and continuation in one backbone with open release.

citing papers explorer

Showing 3 of 3 citing papers.

  • WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

    eess.AS · 2026-06-02 · unverdicted · none · ref 92

    WavTTS is the first raw-waveform diffusion TTS model using DiT flow matching and multi-scale mel supervision that approaches SOTA latent zero-shot performance while beating prior end-to-end models.

  • dots.tts Technical Report

    cs.SD · 2026-06-05 · unverdicted · none · ref 2

    dots.tts reports SOTA benchmark results on Seed-TTS-Eval and other tests via continuous latent-space autoregressive modeling with three listed innovations and code release.

  • VoxCPM2 Technical Report

    cs.SD · 2026-06-05 · unverdicted · none · ref 36

    VoxCPM2 scales hierarchical continuous-latent speech modeling to 2B parameters and over 2M hours of multilingual data, unifying voice cloning, style control, and continuation in one backbone with open release.