CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit
5 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (5)
verdicts: UNVERDICTED (5)
citing papers explorer
- Ethical and Technical Limits of Deepfake Speech Datasets
  Audit of 39 deepfake speech datasets shows most lack demographic metadata making fairness checks infeasible and reveals substantial overlap in bona fide sources that undermines cross-dataset generalization claims.
- Acoustic Interference: A New Paradigm Weaponizing Acoustic Latent Semantic for Universal Jailbreak against Large Audio Language Models
  AIA generates universal interference audio infused with Acoustic Latent Semantics to bypass LALM safety alignment, achieving SOTA attack success rates on 10 models across five datasets.
- Streaming T5-based Text-to-Speech Synthesis with Limited Lookahead
  S5-TTS introduces a streaming T5-TTS variant with lookahead-causal masking and interleaved multi-source distillation that achieves comparable quality to full-context models while cutting end-to-end latency.
- Single frequency filtering based multi-speaker direction of arrival estimation from stereo recordings
  An SFF-based DoA estimator using PHAT-weighted GCC on envelopes performs comparably or better than GCC methods on real reverberant multi-speaker recordings.
- Robust Soft-Constrained Spatially Selective Active Noise Control for Hearables Under Secondary Path Variations
  A robust soft-constrained optimization framework for spatially selective active noise control that minimizes average cost over a set of secondary path estimates from human measurements to reduce performance variation under mismatch.