Joint Language Identification of Code-Switching Speech using Attention based E2E Network

Kumar Priyadarshi; Kunal Dhawan; Rohit Sinha; Sreeram Ganji

arXiv: 1907.06342 · v1 · pith:5L24MNGW · submitted 2019-07-15 · cs.CL · cs.SD · eess.AS


Sreeram Ganji, Kunal Dhawan, Kumar Priyadarshi, Rohit Sinha

Pith reviewed 2026-05-24 21:50 UTC · model grok-4.3

classification: cs.CL · cs.SD · eess.AS

keywords: language identification, code-switching, attention mechanism, end-to-end network, Hindi-English corpus, joint modeling, speech processing


The pith

An attention-based end-to-end network jointly identifies languages in code-switched speech and locates switch points via attention weights.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes modeling the languages in a code-switched utterance jointly, within a single attention-based end-to-end network, rather than building a separate model for each language. The system is developed and tested on a Hindi-English code-switching corpus and compared against a connectionist temporal classification (CTC) end-to-end network. The attention-based approach yields higher language identification accuracy, and plots of the attention weights show where language switches occur within an utterance.

Core claim

An attention-based end-to-end network that jointly models the languages present in code-switching speech achieves better language identification accuracy than a connectionist temporal classification end-to-end network on a Hindi-English corpus, and the attention weights of the network mark the locations of language switches inside utterances.
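The switch-localization claim can be illustrated with a small sketch. It assumes, hypothetically, that the decoder emits one language tag per output step ("H" for Hindi, "E" for English) together with that step's attention distribution over encoder frames, and that each encoder frame spans a known duration (`frame_shift_s`, which would have to account for any downsampling in the encoder). Whenever the emitted tag changes, the switch is placed midway between the frames that the two adjacent steps attend to most. The function names and the midpoint rule are illustrative choices, not the paper's procedure.

```python
from typing import List, Sequence, Tuple


def peak_frame(weights: Sequence[float]) -> int:
    """Index of the encoder frame that receives the most attention."""
    return max(range(len(weights)), key=lambda t: weights[t])


def locate_switches(
    tags: Sequence[str],
    attention: Sequence[Sequence[float]],
    frame_shift_s: float,
) -> List[Tuple[float, str, str]]:
    """Estimate language-switch times from per-step attention weights (hypothetical sketch).

    tags[i] is the language tag emitted at output step i and attention[i] is that
    step's attention distribution over encoder frames. When the tag changes between
    steps i-1 and i, the switch time is placed midway between their peak frames.
    """
    peaks = [peak_frame(w) for w in attention]
    switches = []
    for i in range(1, len(tags)):
        if tags[i] != tags[i - 1]:
            boundary_frame = (peaks[i - 1] + peaks[i]) / 2
            switches.append((boundary_frame * frame_shift_s, tags[i - 1], tags[i]))
    return switches


if __name__ == "__main__":
    # Toy example: two Hindi steps followed by two English steps.
    tags = ["H", "H", "E", "E"]
    attention = [
        [0.7, 0.2, 0.1, 0.0, 0.0, 0.0],
        [0.1, 0.6, 0.2, 0.1, 0.0, 0.0],
        [0.0, 0.0, 0.1, 0.2, 0.6, 0.1],
        [0.0, 0.0, 0.0, 0.1, 0.2, 0.7],
    ]
    print(locate_switches(tags, attention, frame_shift_s=0.08))
```

In the toy input the Hindi steps peak at frames 0 and 1 and the English steps at frames 4 and 5, so a single H-to-E switch is reported around frame 2.5.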

What carries the argument

An attention-based end-to-end network that models both languages jointly within a single network.
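To make the mechanism concrete, here is a minimal sketch of one attention step as used in attention-based encoder-decoder models such as LAS: the current decoder state is scored against every encoder frame, the scores are normalized with a softmax, and the context vector is the weighted sum of the encoder frames. Plain dot-product scoring and the names `attend`, `decoder_state`, and `encoder_frames` are assumptions for illustration; the paper's network may apply learned projections or a different scoring function.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state: Vector, encoder_frames: List[Vector]) -> Tuple[Vector, Vector]:
    """One dot-product attention step (illustrative sketch).

    Returns (weights, context): weights[t] is how strongly the current output
    step attends to encoder frame t, and context is the weighted sum of frames.
    """
    # Score every encoder frame against the current decoder state.
    scores = [sum(d * h for d, h in zip(decoder_state, frame)) for frame in encoder_frames]
    # Normalize the scores into a probability distribution over frames.
    weights = softmax(scores)
    # Context vector: attention-weighted average of the encoder frames.
    dim = len(encoder_frames[0])
    context = [sum(w * frame[k] for w, frame in zip(weights, encoder_frames)) for k in range(dim)]
    return weights, context


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    weights, context = attend([0.0, 2.0], frames)
    print([round(w, 3) for w in weights])
```

Because the decoder state in the usage example points along the second feature, the last two frames receive equal weights that are larger than the first frame's; it is this per-step weight vector that attention plots visualize.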

If this is right

Joint language modeling inside one network is feasible for code-switching speech.
Attention weights inside the end-to-end network mark language boundaries inside utterances.
The attention approach outperforms the CTC-based end-to-end baseline on the Hindi-English corpus.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same joint network could feed directly into a downstream code-switched speech recognizer without an explicit language detector.
Attention plots could serve as a diagnostic tool for spotting language transition patterns in new corpora.
The joint-modeling idea might extend to three or more languages inside one utterance if the network capacity scales.

Load-bearing premise

Joint modelling of the underlying languages inside a single attention-based E2E network is feasible and superior to separate modelling of each language.

What would settle it

On a held-out code-switching test set, the attention-based system shows equal or lower accuracy than the CTC-based system, or the attention weights fail to align with actual language switch points.
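Both falsification conditions can be phrased as simple measurements. The sketch below assumes, for illustration, that predicted and reference language tags are aligned position by position and that switch points are given as times in seconds; the function names, the tolerance, and the toy inputs are hypothetical, and the paper's own definition of LID accuracy is not stated in this summary.

```python
from typing import Sequence


def lid_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of positions whose predicted language tag matches the reference.

    Assumes the two tag sequences are already aligned position by position.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need equal-length, non-empty tag sequences")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def boundary_hit_rate(
    predicted_s: Sequence[float], reference_s: Sequence[float], tolerance_s: float
) -> float:
    """Fraction of reference switch times matched by a predicted switch within tolerance."""
    if not reference_s:
        raise ValueError("need at least one reference switch time")
    hits = sum(any(abs(p - r) <= tolerance_s for p in predicted_s) for r in reference_s)
    return hits / len(reference_s)


if __name__ == "__main__":
    # Made-up tag strings for illustration only; these are not results from the paper.
    reference = list("HHHHEEEE")
    attention_system = list("HHHHEEEH")
    ctc_system = list("HHHEEEHH")
    print(lid_accuracy(attention_system, reference), lid_accuracy(ctc_system, reference))
    print(boundary_hit_rate([1.02, 2.5], [1.0, 3.0], tolerance_s=0.1))
```

The claim would be undermined if, on real held-out data, the first accuracy were not higher than the second, or if the boundary hit rate for attention-derived switch times stayed low at any reasonable tolerance.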

Figures

Figures reproduced from arXiv: 1907.06342 by Kumar Priyadarshi, Kunal Dhawan, Rohit Sinha, Sreeram Ganji.

**Figure 2.** Architecture of LAS network. It consists of three modules, namely: listener (encoder), attender …

**Figure 3.** Creation of character-level LID tags for the training data towards conditioning the E2E networks

**Figure 4.** Visualization of attention mechanism for LID task. For a given Hindi-English code-switching …
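Figure 3 concerns turning training transcriptions into character-level LID tags. A minimal sketch follows, under the loud assumption that Hindi words are written in Devanagari and English words in Latin script, so that each character's Unicode script can stand in for its language. The tag symbols ("H", "E", "?") and the handling of spaces are hypothetical choices; the paper's actual tagging procedure may differ.

```python
import unicodedata
from typing import List


def char_script_tag(ch: str) -> str:
    """Tag one character by its Unicode script (a stand-in for its language)."""
    if ch.isspace():
        return " "  # keep word boundaries visible in the tag sequence
    name = unicodedata.name(ch, "")
    if name.startswith("DEVANAGARI"):
        return "H"  # assumed: Hindi words are written in Devanagari
    if name.startswith("LATIN"):
        return "E"  # assumed: English words are written in Latin script
    return "?"  # digits, punctuation and other scripts are left unresolved


def char_level_lid_tags(transcript: str) -> List[str]:
    """Produce one language tag per character of a code-switched transcript."""
    return [char_script_tag(ch) for ch in transcript]


if __name__ == "__main__":
    # "मैं office जा" written with escapes: Hindi, then English, then Hindi.
    text = "\u092e\u0948\u0902 office \u091c\u093e"
    print("".join(char_level_lid_tags(text)))
```

Devanagari vowel signs and the anusvara have names starting with "DEVANAGARI", so they are tagged "H" along with the consonants they attach to.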


Language identification (LID) has relevance in many speech processing applications. For the automatic recognition of code-switching speech, the conventional approaches often employ an LID system for detecting the languages present within an utterance. In the existing works, the LID on code-switching speech involves modelling of the underlying languages separately. In this work, we propose a joint modelling based LID system for code-switching speech. To achieve this, an attention-based end-to-end (E2E) network has been explored. For the development and evaluation of the proposed approach, a recently created Hindi-English code-switching corpus has been used. For contrast, an LID system employing a connectionist temporal classification (CTC)-based E2E network is also developed. On comparing both LID systems, the attention-based approach is noted to result in better LID accuracy. The effective location of code-switching boundaries within the utterance by the proposed approach has been demonstrated by plotting the attention weights of the E2E network.
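For readers less familiar with the CTC contrast system, the sketch below shows standard CTC best-path (greedy) decoding applied to LID: take the most probable symbol in each frame, merge consecutive repeats, then drop blanks. The symbol inventory (blank, "H", "E"), the blank name `<b>`, and the use of greedy rather than beam-search decoding are assumptions; the abstract does not say how the paper's CTC system decodes.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = "<b>"  # name of the CTC blank symbol, chosen here for illustration


def argmax_labels(posteriors: Sequence[Sequence[float]], symbols: Sequence[str]) -> List[str]:
    """Pick the most probable symbol in each frame's posterior vector."""
    return [symbols[max(range(len(p)), key=p.__getitem__)] for p in posteriors]


def ctc_greedy_decode(frame_labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Collapse per-frame labels the CTC way: merge repeats first, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


if __name__ == "__main__":
    symbols = [BLANK, "H", "E"]
    # Hypothetical per-frame posteriors over (blank, Hindi, English).
    posteriors = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.2, 0.7],
        [0.6, 0.1, 0.3],
    ]
    print(ctc_greedy_decode(argmax_labels(posteriors, symbols)))
```

Merging repeats before removing blanks matters: a blank between two identical tags keeps them as two separate outputs, which is how CTC represents repeated labels.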

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

Attention E2E beats CTC on code-switch LID, but the joint-vs-separate test the authors motivate is missing.


The paper shows an attention-based E2E network giving better LID accuracy than a CTC E2E network on Hindi-English code-switched speech, with attention weights that appear to mark the language boundaries. That is the concrete result. The work applies the attention E2E architecture to joint modelling of the two languages inside one network on a recent Hindi-English corpus, which is a new application relative to the separate-modelling baselines mentioned in the abstract. The boundary visualization is a useful detail that makes the output more interpretable than a plain accuracy number. The implementation is standard and the comparison is direct, so the execution looks clean on its own terms.

The main gap is that the stated motivation is never tested. The abstract criticizes conventional separate modelling of each language and presents joint modelling as the improvement, yet the only head-to-head is between two joint E2E systems. Without a separate-modelling baseline, it is not possible to know whether the joint approach itself is feasible or better. The abstract also gives no accuracy figures, dataset sizes, or error bars, which leaves the size of the gain unclear.

This is narrow speech-processing work aimed at people already building LID systems for code-switched audio, especially in Indian languages. A reader focused on E2E methods might find the attention comparison and plots worth seeing, but anyone expecting a decisive test of joint versus separate modelling will be disappointed. The thinking is straightforward and the architecture choice is reasonable, but the evidence does not match the problem the authors set up. I would send it to peer review so the authors can add the missing baseline and the quantitative details.

Referee Report

1 major / 1 minor

Summary. The paper proposes a joint language identification (LID) system for code-switching speech using an attention-based end-to-end (E2E) network, motivated by the observation that prior work models the underlying languages separately. It develops and evaluates the approach on a Hindi-English code-switching corpus, contrasts it with a CTC-based E2E network (also joint), reports superior LID accuracy for the attention model, and demonstrates boundary localization by visualizing attention weights.

Significance. If the empirical comparison holds up once the missing baseline is addressed, the work would provide evidence that attention-based joint E2E modelling can outperform CTC-based joint modelling for LID while offering interpretable boundary detection. This could inform future code-switching speech systems, although without a direct test against separate-modelling baselines the joint-modelling motivation remains unsubstantiated.

major comments (1)

Abstract (and §1): The central motivation states that 'existing works... involve modelling of the underlying languages separately' and positions the contribution as 'joint modelling based LID system,' yet the only reported comparison is between the proposed attention E2E and a CTC-based E2E network; both are joint models, so the experiment does not test whether joint modelling inside one network is feasible or superior to the conventional separate approach that motivates the paper.

minor comments (1)

Abstract: No quantitative accuracy figures, error bars, dataset statistics, or baseline details are supplied, which weakens the ability to evaluate the 'better LID accuracy' claim without consulting the results section.
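The missing error bars could be supplied in several ways. One common option, sketched here as an illustration rather than anything the paper or this report prescribes, is a paired bootstrap over test utterances: resample utterances with replacement, recompute the accuracy difference between the two systems each time, and read a 95% interval off the sorted differences. It assumes per-utterance correctness is available for both systems on the same test set; the function name, defaults, and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def paired_bootstrap_delta(
    correct_a: Sequence[bool],
    correct_b: Sequence[bool],
    n_resamples: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Accuracy difference (A minus B) with a 95% percentile bootstrap interval.

    correct_a[i] and correct_b[i] record whether systems A and B got test item i
    right. Items are resampled jointly, so the comparison stays paired.
    """
    if len(correct_a) != len(correct_b) or not correct_a or n_resamples < 1:
        raise ValueError("need two equal-length, non-empty result lists and n_resamples >= 1")
    n = len(correct_a)
    rng = random.Random(seed)  # seeded so the interval is reproducible
    observed = (sum(correct_a) - sum(correct_b)) / n
    deltas: List[float] = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        deltas.append(sum(int(correct_a[i]) - int(correct_b[i]) for i in idx) / n)
    deltas.sort()
    lower = deltas[int(0.025 * n_resamples)]
    upper = deltas[int(0.975 * n_resamples) - 1]
    return observed, lower, upper


if __name__ == "__main__":
    # Hypothetical per-utterance outcomes for two systems on the same ten items.
    a = [True] * 8 + [False] * 2
    b = [True] * 6 + [False] * 4
    print(paired_bootstrap_delta(a, b))
```

An interval that excludes zero would indicate that the accuracy gain is unlikely to be an artifact of which utterances happened to be in the test set.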

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the major comment point-by-point below and propose targeted revisions to improve clarity without altering the core experiments.


Referee: Abstract (and §1): The central motivation states that 'existing works... involve modelling of the underlying languages separately' and positions the contribution as 'joint modelling based LID system,' yet the only reported comparison is between the proposed attention E2E and a CTC-based E2E network; both are joint models, so the experiment does not test whether joint modelling inside one network is feasible or superior to the conventional separate approach that motivates the paper.

Authors: We acknowledge the observation. The manuscript's motivation correctly notes that prior LID work on code-switching typically models languages separately, and our contribution is a joint E2E architecture. The reported experiments compare two joint implementations (attention vs. CTC) to isolate the effect of the attention mechanism on LID accuracy and boundary localization. This design demonstrates that joint modelling is feasible and effective within a single network, but does not include a head-to-head evaluation against separate-modelling baselines. In revision we will (i) rephrase the abstract and §1 to state that the work shows joint modelling is viable rather than claiming superiority over separate approaches, and (ii) add a limitations paragraph noting the absence of such a baseline comparison as future work. No new experiments will be added. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical architecture comparison on external corpus


The paper reports an empirical comparison of two joint E2E LID architectures (attention vs. CTC) on a Hindi-English code-switching corpus. No mathematical derivations, fitted parameters renamed as predictions, self-definitional quantities, or load-bearing self-citations appear in the provided text. The central result is an accuracy delta between two independently trained networks, which does not reduce to its own inputs by construction. The noted mismatch between stated motivation (joint vs. separate modelling) and actual experiment is a design limitation, not circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The paper rests on standard assumptions of deep learning for sequence tasks and the representativeness of the cited Hindi-English corpus; no new entities are introduced and hyperparameters are not enumerated in the abstract.

free parameters (1)

E2E network hyperparameters
Typical training choices such as learning rate and attention configuration are required for any E2E model but are not specified.

axioms (1)

domain assumption: Joint modelling inside one network is feasible and preferable to separate per-language modelling for code-switching LID
This premise is invoked to motivate the proposed attention-based system over conventional approaches.

pith-pipeline@v0.9.0 · 5711 in / 1175 out tokens · 26292 ms · 2026-05-24T21:50:37.772948+00:00 · methodology

