pith. machine review for the scientific record.

Sailor, and Qiongqiong Wang

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

X-VC: Zero-shot Streaming Voice Conversion in Codec Space

eess.AS · 2026-04-14 · unverdicted · novelty 7.0

X-VC achieves zero-shot streaming voice conversion via one-step codec-space conversion with a dual-conditioning acoustic converter and role-assignment training on generated paired data.

Voxtral TTS

cs.AI · 2026-03-26 · unverdicted · novelty 5.0

Voxtral TTS produces expressive multilingual speech from 3-second reference audio with a hybrid autoregressive-plus-flow-matching architecture and a new VQ-FSQ tokenizer, achieving a 68.4% win rate over ElevenLabs in human evaluations.

AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan

cs.SD · 2026-04-09 · unverdicted · novelty 3.0

AT-ADD introduces standardized tracks and datasets for evaluating audio deepfake detectors on speech under real-world conditions and on diverse unknown audio types to promote generalization beyond speech-centric methods.

citing papers explorer

Showing 3 of 3 citing papers.

  • X-VC: Zero-shot Streaming Voice Conversion in Codec Space · eess.AS · 2026-04-14 · unverdicted · polarity: none · ref 10

  • Voxtral TTS · cs.AI · 2026-03-26 · unverdicted · polarity: none · ref 2

  • AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan · cs.SD · 2026-04-09 · unverdicted · polarity: none · ref 15