Mind the Gap: Impact of Synthetic Conversational Data on Multi-Talker ASR and Speaker Diarization

2026 · eess.AS · arXiv 2605.15442

1 Pith paper cites this work. Polarity classification is still being indexed.


abstract

Recent breakthroughs in multi-talker ASR (MT-ASR) and speaker diarization (SD) rely on synthetic data to mitigate the scarcity of large-scale conversational recordings, yet the impact of specific simulation choices remains poorly understood. To mind the gap between simulated mixtures and real-world interactions, we present a study of synthetic data generation for leading MT-ASR (DiCoW) and SD (Sortformer) systems. By introducing FastMSS, a highly efficient open-source simulator, we analyze turn-taking dynamics, source domain, acoustic augmentation, and data mixing strategies. Our findings reveal that optimal simulation recipes are highly task-dependent: increasing speech overlap benefits ASR but degrades diarization. Furthermore, broad source diversity consistently outperforms exact domain matching. Ultimately, synthetic-only training approaches real-data baselines, and combining simulated data with real recordings yields substantial gains over real-only training across both tasks.
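As a rough illustration of what "turn-taking dynamics" and "speech overlap" mean in a simulated conversation, the sketch below places utterances on a timeline and measures how much of the speech is overlapped. It is not FastMSS and does not reproduce the paper's recipe: the function names, the parameters (`overlap_prob`, `max_overlap`, `max_gap`), and the uniform sampling are all assumptions made for this example.

```python
import random


def simulate_timeline(turns, overlap_prob=0.3, max_overlap=1.0, max_gap=0.5, seed=0):
    """Place (speaker, duration) turns on a timeline (hypothetical helper).

    Each turn starts either slightly before the current end of speech
    (an overlap) or after a short silent gap. All values are in seconds.
    """
    rng = random.Random(seed)
    segments = []      # list of (speaker, start, end)
    cursor = 0.0       # latest end time of any placed segment
    last_end = {}      # latest end time per speaker
    for speaker, duration in turns:
        if segments and rng.random() < overlap_prob:
            offset = -rng.uniform(0.0, max_overlap)   # start before cursor
        else:
            offset = rng.uniform(0.0, max_gap)        # start after a pause
        start = max(0.0, cursor + offset)
        # A speaker never overlaps their own previous utterance.
        start = max(start, last_end.get(speaker, 0.0))
        end = start + duration
        segments.append((speaker, start, end))
        last_end[speaker] = end
        cursor = max(cursor, end)
    return segments


def overlap_ratio(segments):
    """Fraction of speech time during which two or more speakers are active."""
    events = []
    for _, start, end in segments:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # at equal times, ends (-1) are processed before starts (+1)
    active, prev, speech, overlap = 0, 0.0, 0.0, 0.0
    for time, delta in events:
        if active >= 1:
            speech += time - prev
        if active >= 2:
            overlap += time - prev
        active += delta
        prev = time
    return overlap / speech if speech > 0 else 0.0


if __name__ == "__main__":
    turns = [("A", 2.0), ("B", 2.0)] * 5
    for p in (0.0, 0.5, 1.0):
        segs = simulate_timeline(turns, overlap_prob=p)
        print(f"overlap_prob={p:.1f} -> overlap ratio {overlap_ratio(segs):.3f}")
```

Raising `overlap_prob` in a sketch like this is one simple way to produce the kind of higher-overlap mixtures that the abstract reports as helping ASR but hurting diarization. A real simulator would also mix audio and apply acoustic augmentation, which is out of scope here.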

representative citing papers

Mind the Gap: Impact of Synthetic Conversational Data on Multi-Talker ASR and Speaker Diarization

eess.AS · 2026-05-14 · unverdicted · novelty 5.0

Task-dependent simulation strategies for synthetic conversational data allow synthetic-only training to approach real-data baselines for multi-talker ASR and diarization, with mixing yielding further gains.

citing papers explorer

Showing 1 of 1 citing paper.

Mind the Gap: Impact of Synthetic Conversational Data on Multi-Talker ASR and Speaker Diarization

eess.AS · 2026-05-14 · unverdicted · none · ref 2 · internal anchor

Task-dependent simulation strategies for synthetic conversational data allow synthetic-only training to approach real-data baselines for multi-talker ASR and diarization, with mixing yielding further gains.
