
arXiv: 1701.02477 · v1 · pith:4F442CAZnew · submitted 2017-01-10 · 💻 cs.CL · cs.AI · cs.CV · cs.LG

Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic Speech Recognition

classification 💻 cs.CL cs.AI cs.CV cs.LG
keywords model, visual, audio-visual, automatic, av-asr, base-line, compared, features

Multi-task learning (MTL) involves the simultaneous training of two or more related tasks over shared representations. In this work, we apply MTL to audio-visual automatic speech recognition (AV-ASR). Our primary task is to learn a mapping between audio-visual fused features and frame labels obtained from an acoustic GMM/HMM model. This is combined with an auxiliary task that maps visual features to frame labels obtained from a separate visual GMM/HMM model. The MTL model is tested at various levels of babble noise, and the results are compared with a baseline hybrid DNN-HMM AV-ASR model. Our results indicate that MTL is especially useful at higher levels of noise. Compared to the baseline, up to 7% relative improvement in WER is reported at -3 dB SNR.
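The two-task setup described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single shared hidden layer over the fused audio-visual input, two softmax output heads (one per GMM/HMM label set), and a weighted sum of the two cross-entropy losses. The abstract does not specify the layer sizes, the weight `alpha`, the parameter names, or how the two tasks share the network, so all of these are hypothetical. The helper `relative_wer_improvement` shows how a relative WER gain is computed; the numbers passed to it in the usage lines are made up for illustration.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the correct frame label.
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def mtl_loss(x_av, y_primary, y_aux, params, alpha=0.3):
    # Shared representation (assumption: one ReLU layer over fused features).
    h = np.maximum(0.0, x_av @ params["W_shared"] + params["b_shared"])
    # Primary head: frame labels from the acoustic GMM/HMM alignment.
    p_primary = softmax(h @ params["W_primary"] + params["b_primary"])
    # Auxiliary head: frame labels from the visual GMM/HMM alignment.
    p_aux = softmax(h @ params["W_aux"] + params["b_aux"])
    loss_primary = cross_entropy(p_primary, y_primary)
    loss_aux = cross_entropy(p_aux, y_aux)
    # alpha is a hypothetical weight on the auxiliary task.
    return loss_primary + alpha * loss_aux, loss_primary, loss_aux

def relative_wer_improvement(wer_baseline, wer_model):
    # Relative improvement in percent: (baseline - model) / baseline * 100.
    return 100.0 * (wer_baseline - wer_model) / wer_baseline

# Toy data with made-up dimensions, only to exercise the functions.
rng = np.random.default_rng(0)
n_frames, d_in, d_hid, n_audio_states, n_visual_states = 8, 20, 16, 5, 4
params = {
    "W_shared": rng.normal(scale=0.1, size=(d_in, d_hid)),
    "b_shared": np.zeros(d_hid),
    "W_primary": rng.normal(scale=0.1, size=(d_hid, n_audio_states)),
    "b_primary": np.zeros(n_audio_states),
    "W_aux": rng.normal(scale=0.1, size=(d_hid, n_visual_states)),
    "b_aux": np.zeros(n_visual_states),
}
x_av = rng.normal(size=(n_frames, d_in))
y_primary = rng.integers(0, n_audio_states, size=n_frames)
y_aux = rng.integers(0, n_visual_states, size=n_frames)

total, loss_primary, loss_aux = mtl_loss(x_av, y_primary, y_aux, params, alpha=0.3)
example_gain = relative_wer_improvement(20.0, 18.6)  # hypothetical WER values
```

In a setup like this, only the primary head would be used for decoding at test time. The auxiliary head exists to shape the shared layer during training, which is one common way to realize "shared representations" in MTL.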

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Multi-task Learning Balanced Attention Convolutional Neural Network Model for Few-shot Underwater Acoustic Target Recognition

    cs.SD 2025-04 unverdicted novelty 4.0

    MT-BCA-CNN achieves 97% accuracy and 95% F1-score on 27-class few-shot underwater acoustic target recognition by combining channel attention and multi-task learning on the Watkins Marine Life Dataset.

  2. LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

    cs.CV 2019-06 unverdicted novelty 4.0

    3D-2D-CNN-BLSTM with word-CTC reaches 1.3% WER on GRID seen-speaker lipreading (55% relative gain over LCANet) and 8.6% on unseen speakers (24.5% gain over LipNet).