RIVET enforces an idempotency objective during training of voice attribute editing models to improve robustness to noisy labels, outperforming standard training on controlled noise and the GLOBE dataset.
RIVET: Robust Idempotent Voice Attribute Editing
abstract
Voice attribute editing models modify characteristics such as age and gender while preserving speaker identity. In large-scale speech datasets, however, attribute annotations are often noisy or inconsistent, which can cause conditional generative models to produce unstable edits. In this work, we show that idempotency provides an effective mechanism for improving robustness to noisy labels. An idempotent operator is one for which repeated application does not change the result, i.e., f(f(x)) = f(x). Enforcing this property acts as an implicit regularizer that reduces sensitivity to mislabeled examples. We introduce RIVET, a training framework that incorporates an idempotency objective to improve robustness to label noise. We evaluate RIVET under controlled label noise and on the GLOBE dataset with naturally noisy annotations. RIVET improves editing success and better preserves speaker identity than standard training, showing that idempotency improves robustness in voice editing models.
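The idempotency condition f(f(x)) = f(x) can be checked numerically. The following is a minimal sketch, not the RIVET training objective (which the abstract does not specify): it measures how far a toy "edit" operator is from idempotent as the mean squared difference between one and two applications. The names `idempotency_gap`, `clamp_edit`, `halve_edit`, and `regularized_loss`, as well as the weight parameter, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def idempotency_gap(f: Callable[[Sequence[float]], Vector], x: Sequence[float]) -> float:
    """Mean squared difference between f(f(x)) and f(x).

    The value is 0.0 exactly when applying f a second time leaves f(x) unchanged.
    """
    once = f(x)
    twice = f(once)
    return sum((a - b) ** 2 for a, b in zip(twice, once)) / len(once)


def clamp_edit(x: Sequence[float]) -> Vector:
    # Toy idempotent operator: clipping to [-1, 1] twice equals clipping once.
    return [max(-1.0, min(1.0, v)) for v in x]


def halve_edit(x: Sequence[float]) -> Vector:
    # Toy non-idempotent operator: every application shrinks the input again.
    return [0.5 * v for v in x]


def regularized_loss(task_loss: float, gap: float, weight: float = 1.0) -> float:
    # Hypothetical combination: a task loss plus a weighted idempotency penalty.
    return task_loss + weight * gap


features = [2.0, -0.5, 0.25, -3.0]  # stand-in for an acoustic feature vector
gap_clamp = idempotency_gap(clamp_edit, features)
gap_halve = idempotency_gap(halve_edit, features)
```

In this sketch `gap_clamp` is zero because clipping is idempotent, while `gap_halve` is positive. Adding such a gap to a task loss, as `regularized_loss` does, penalizes operators whose output keeps changing under repeated application, which is the sense in which the abstract describes idempotency as an implicit regularizer.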
fields: cs.SD (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)