REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling

Andreas Stolcke; Ariya Rastrow; Gokce Keskin; Harish Arsikere; Hu Hu; Jinxi Guo; Roland Maas; Xuesong Yang; Zeynab Raeesy

arxiv: 2012.07353 · v2 · pith:4CKRT44Enew · submitted 2020-12-14 · 📡 eess.AS · cs.AI · cs.SD


Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas


classification 📡 eess.AS cs.AI cs.SD

keywords accents · domain · redat · accent-invariant · adversarial · data · end-to-end · english


Accent mismatch is a critical problem for end-to-end ASR. This paper addresses it by building an accent-robust RNN-T system with domain adversarial training (DAT). We unveil the magic behind DAT and provide, for the first time, a theoretical guarantee that DAT learns accent-invariant representations. We also prove that performing the gradient reversal in DAT is equivalent to minimizing the Jensen-Shannon divergence between domain output distributions. Motivated by the proof of equivalence, we introduce reDAT, a novel technique based on DAT, which relabels data using either unsupervised clustering or soft labels. Experiments on 23K hours of multi-accent data show that DAT achieves competitive results against accent-specific baselines on both native and non-native English accents, and up to 13% relative WER reduction on unseen accents; reDAT yields further relative improvements over DAT of 3% and 8% on non-native accents of American and British English, respectively.
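The abstract refers to the Jensen-Shannon divergence between domain output distributions. As a reminder of what that quantity measures, here is a minimal sketch in standard-library Python for two discrete distributions. It is not the paper's implementation: the function names and the two example distributions (`accent_a`, `accent_b`) are hypothetical and chosen only for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical output distributions for two accent domains (illustrative values only).
accent_a = [0.7, 0.2, 0.1]
accent_b = [0.1, 0.2, 0.7]

identical = js_divergence(accent_a, accent_a)  # zero when the distributions match
different = js_divergence(accent_a, accent_b)  # positive, at most log(2) in nats
```

The divergence is symmetric, is zero only when the two distributions coincide, and is bounded above by log 2 (in nats), so driving it toward zero corresponds to making the distributions indistinguishable across domains.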
