Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Adam Coates, Andrew Ng, Awni Hannun, Billy Jun, Bo Xiao, Bryan Catanzaro, Carl Case, Chong Wang, Christopher Fougner, Dani Yogatama, Dario Amodei, David Seetapun, Eric Battenberg, Erich Elsen, Greg Diamos, Jared Casper, Jesse Engel, Jingdong Chen, Jonathan Raiman, Jun Zhan, Libby Lin, Linxi Fan, Mike Chrzanowski, Patrick LeGresley, Rishita Anubhai, Ryan Prenger, Sanjeev Satheesh, Sharan Narang, Sherjil Ozair, Shubho Sengupta, Tony Han, Yi Wang, Zhenyao Zhu, Zhiqian Wang

Classification: cs.CL

Abstract

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech, including noisy environments, accents, and different languages. Key to our approach is our application of high-performance computing (HPC) techniques, which yields a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days, which lets us iterate more quickly to identify superior architectures and algorithms. As a result, in several cases our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
cs.LG 2017-01 accept novelty 8.0

A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, outperforming the state of the art on language modeling and machine translation.

Concrete Problems in AI Safety
cs.AI 2016-06 accept novelty 7.0

The paper categorizes five concrete AI safety problems arising from flawed objectives, costly evaluation, and learning dynamics.