Investigating the effects of large-scale pseudo-stereo data and different speech foundation model on dialogue generative spoken language model

2024 · arXiv 2407.01911

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Tight Boundary Prediction in Speaker Diarization Using Causal-Anticausal Consistency

eess.AS · 2026-06-10 · unverdicted · novelty 6.0

Causal-anticausal consistency co-training recovers about 70% of the boundary-tightening effect possible with ideal tight labels in speaker diarization.

ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching

eess.AS · 2025-07-12 · conditional · novelty 6.0

ZipVoice-Dialog is a flow-matching non-autoregressive model for zero-shot spoken dialogue generation that uses curriculum learning and speaker-turn embeddings, paired with a new 6.8k-hour OpenDialog dataset, and reports better speed and quality than autoregressive baselines.

citing papers explorer


Tight Boundary Prediction in Speaker Diarization Using Causal-Anticausal Consistency

eess.AS · 2026-06-10 · unverdicted · none · ref 31

ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching

eess.AS · 2025-07-12 · conditional · none · ref 15
