Towards end-to-end spoken language understanding

Anuj Kumar; Baiyang Liu; Christian Fuegen; Dmitriy Serdyuk; Yongqiang Wang; Yoshua Bengio

arxiv: 1802.08395 · v1 · pith:5JMJIFOJnew · submitted 2018-02-23 · 💻 cs.CL

keywords: language, system, understanding, audio, spoken, components, directly, end-to-end

Spoken language understanding systems are traditionally designed as a pipeline of several components. First, the audio signal is processed by an automatic speech recognizer to produce a transcription or n-best hypotheses. Given the recognition results, a natural language understanding system classifies the text into structured data such as domain, intent, and slots for downstream consumers, such as dialog systems and hands-free applications. These components are usually developed and optimized independently. In this paper, we present our study of an end-to-end learning system for spoken language understanding. With this unified approach, we can infer the semantic meaning directly from audio features without an intermediate text representation. The study shows that the trained model can achieve reasonably good results and demonstrates that the model can capture semantic attention directly from the audio features.
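
The contrast the abstract draws between the two designs can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in introduced for illustration and not taken from the paper: the `Semantics` record, the functions `pipeline_slu` and `end_to_end_slu`, and the toy `toy_asr`, `toy_nlu`, and `toy_audio_model` components, which return fixed outputs rather than running any trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# Assumption: audio is a sequence of frames, each a vector of feature values.
AudioFeatures = Sequence[Sequence[float]]


@dataclass
class Semantics:
    """Structured output named in the abstract: domain, intent, and slots."""
    domain: str
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def pipeline_slu(
    audio: AudioFeatures,
    asr: Callable[[AudioFeatures], List[str]],
    nlu: Callable[[str], Semantics],
) -> Semantics:
    """Traditional design: audio -> n-best text hypotheses -> semantics."""
    hypotheses = asr(audio)
    # Simplification: only the top hypothesis is passed to the NLU component.
    return nlu(hypotheses[0])


def end_to_end_slu(
    audio: AudioFeatures,
    model: Callable[[AudioFeatures], Semantics],
) -> Semantics:
    """Unified design: audio -> semantics, with no intermediate text."""
    return model(audio)


# Hypothetical toy components with hard-coded behavior.
def toy_asr(audio: AudioFeatures) -> List[str]:
    return ["play some jazz", "play sam jazz"]


def toy_nlu(text: str) -> Semantics:
    if text.startswith("play"):
        return Semantics("music", "play_music", {"genre": text.split()[-1]})
    return Semantics("unknown", "unknown")


def toy_audio_model(audio: AudioFeatures) -> Semantics:
    return Semantics("music", "play_music", {"genre": "jazz"})


# 100 frames of 40-dimensional placeholder features.
audio = [[0.0] * 40 for _ in range(100)]
pipeline_result = pipeline_slu(audio, toy_asr, toy_nlu)
e2e_result = end_to_end_slu(audio, toy_audio_model)
```

In the pipeline variant, the two components are separate callables that could be trained and tuned independently. In the end-to-end variant, a single callable maps audio features to the structured result, which is the property the abstract highlights.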

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

End-to-End Voice Intent Recognition for Spontaneous Human-Drone Interaction with Naive Users
eess.AS 2026-06 unverdicted novelty 6.0

An end-to-end SLU architecture with frozen SSL acoustic encoder, LSTM classification head, and cross-modal distillation achieves 93% accuracy on simple commands and 82% on spontaneous speech at 7 ms latency on the new...