Towards End-to-End Code-Switching Speech Recognition

Caixia Gong; Dongwei Jiang; Ne Luo; Shuaijiang Zhao; Wei Zou; Xiangang Li

arxiv: 1810.13091 · v2 · pith:FVEGRLBY · submitted 2018-10-31 · 💻 cs.CL · eess.AS

Ne Luo, Dongwei Jiang, Shuaijiang Zhao, Caixia Gong, Wei Zou, Xiangang Li

classification 💻 cs.CL eess.AS

keywords code-switching, recognition, speech, end-to-end, ctc-attention, different, expert, hybrid

Abstract

Code-switching speech recognition has attracted increasing interest recently, but the need for expert linguistic knowledge has always been a major issue. End-to-end automatic speech recognition (ASR) simplifies the building of ASR systems considerably by predicting graphemes or characters directly from acoustic input. At the same time, it eliminates the need for expert linguistic knowledge, which makes it an attractive choice for code-switching ASR. This paper presents a hybrid CTC-Attention based end-to-end Mandarin-English code-switching (CS) speech recognition system and studies the effects of hybrid CTC-Attention based models, different modeling units, the inclusion of language identification, and different decoding strategies on code-switching ASR. On the SEAME corpus, our system achieves a mixed error rate (MER) of 34.24%.
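The abstract reports results as a mixed error rate (MER) but does not define it here. A common convention for Mandarin-English code-switching, assumed in the sketch below and not taken from the paper, is to score Mandarin at the character level and English at the word level, then divide the token-level edit distance by the number of reference tokens. The tokenizer (`tokenize_mixed`), its regular expression (which only covers the basic CJK Unified Ideographs block), and the lowercasing step are illustrative assumptions.

```python
import re
from typing import List

# Assumption: each CJK character is one token; each run of Latin letters is one English word.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize_mixed(text: str) -> List[str]:
    """Split Mandarin-English text into Mandarin characters and lowercased English words."""
    return [tok.lower() for tok in _TOKEN_RE.findall(text)]


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def mixed_error_rate(ref_text: str, hyp_text: str) -> float:
    """MER = token edit distance / number of reference tokens (characters + words)."""
    ref = tokenize_mixed(ref_text)
    hyp = tokenize_mixed(hyp_text)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)
```

For example, the reference `我们 meeting 明天` yields five tokens (four characters and one English word). Against the hypothesis `我 meeting 明天见`, one deletion and one insertion give an MER of 2/5 = 0.4. Scoring Mandarin by characters avoids depending on a word segmenter, which is one reason this mixed unit is popular for Mandarin-English evaluation.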

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Joint Language Identification of Code-Switching Speech using Attention based E2E Network
cs.CL 2019-07 unverdicted novelty 5.0

Attention-based E2E network outperforms CTC-based E2E for LID on Hindi-English code-switching corpus and uses attention weights to locate switch boundaries.

End-to-End ASR for Code-switched Hindi-English Speech
eess.AS 2019-06 unverdicted novelty 4.0

End-to-end ASR for code-switched Hindi-English with <50 hours of data shows gains from multi-task learning and corpus balancing but underperforms cascaded baselines.