First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs

Andrew L. Maas; Andrew Y. Ng; Awni Y. Hannun; Daniel Jurafsky

arxiv: 1408.2873 · v2 · pith:PGNWK44Znew · submitted 2014-08-12 · 💻 cs.CL · cs.LG · cs.NE

Awni Y. Hannun, Andrew L. Maas, Daniel Jurafsky, Andrew Y. Ng

classification 💻 cs.CL cs.LG cs.NE

keywords network, recognition, speech, first-pass, neural, systems, approach, bi-directional

We present a method to perform first-pass large vocabulary continuous speech recognition using only a neural network and language model. Deep neural network acoustic models are now commonplace in HMM-based speech recognition systems, but building such systems is a complex, domain-specific task. Recent work demonstrated the feasibility of discarding the HMM sequence modeling framework by directly predicting transcript text from audio. This paper extends this approach in two ways. First, we demonstrate that a straightforward recurrent neural network architecture can achieve a high level of accuracy. Second, we propose and evaluate a modified prefix-search decoding algorithm. This approach to decoding enables first-pass speech recognition with a language model, completely unaided by the cumbersome infrastructure of HMM-based systems. Experiments on the Wall Street Journal corpus demonstrate fairly competitive word error rates, and the importance of bi-directional network recurrence.
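
The abstract names prefix-search decoding but gives no details, so the following is only a minimal sketch of a generic prefix beam search over per-frame character probabilities, not the paper's modified algorithm. It assumes a CTC-style output layer with a blank symbol at index 0. The names `prefix_beam_search`, `beam_width`, `lm`, and `alpha` are hypothetical, and the `lm` hook is a placeholder for any function that scores the next character given a prefix.

```python
from collections import defaultdict


def prefix_beam_search(probs, alphabet, beam_width=8, lm=None, alpha=1.0):
    """Return the most probable collapsed string under a beam approximation.

    probs:    list of per-frame distributions; index 0 is the blank symbol
              (an assumption), index i > 0 corresponds to alphabet[i - 1].
    lm:       optional hypothetical callable lm(prefix, ch) -> probability.
    alpha:    exponent weighting the language model score.
    """
    # For each prefix keep (p_blank, p_non_blank): the probability mass of
    # alignments that end in a blank vs. end in a non-blank symbol.
    beams = {"": (1.0, 0.0)}
    for frame in probs:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            # Emitting a blank leaves the prefix unchanged.
            nxt[prefix][0] += (p_b + p_nb) * frame[0]
            for i, ch in enumerate(alphabet, start=1):
                p = frame[i]
                if prefix and prefix[-1] == ch:
                    # A repeated symbol with no blank in between collapses
                    # into the same prefix.
                    nxt[prefix][1] += p_nb * p
                    # The prefix only grows if a blank separated the repeat.
                    ext = p_b * p
                else:
                    ext = (p_b + p_nb) * p
                if lm is not None:
                    ext *= lm(prefix, ch) ** alpha
                nxt[prefix + ch][1] += ext
        # Prune to the beam_width most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1],
                        reverse=True)
        beams = {k: (v[0], v[1]) for k, v in ranked[:beam_width]}
    return max(beams.items(), key=lambda kv: kv[1][0] + kv[1][1])[0]


# Two frames, one character: blank is the most likely symbol in each frame,
# yet the alignments that collapse to "a" carry more total probability.
toy = [[0.6, 0.4], [0.6, 0.4]]
best = prefix_beam_search(toy, "a")
```

In the toy input, a frame-by-frame argmax would pick blank twice and return the empty string (probability 0.36), while the three alignments that collapse to "a" sum to 0.64. Summing over alignments per prefix is what lets a language model score be folded in during the first pass rather than applied afterwards.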

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CANDLE: Character-level Arabic Noise Deduplication using Lightweight Encoder
cs.CL · 2026-06 · unverdicted · novelty 6.0

CANDLE uses CTC on lightweight character encoders for Arabic noise deduplication, reporting 5.37% SER on benchmarks and up to 12.8% tokenizer fertility reduction.