Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, and Abdelrahman Mohamed

Transformer-based video front-ends for audiovisual speech recognition for single, multi-person video · 2022 · arXiv 2201.10439

1 Pith paper cites this work. Polarity classification is still indexing.



representative citing papers

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

DLLM-VSR applies diffusion LLMs to VSR via masked denoising, two-stage training, and length-guided candidate decoding to reach 19.5% WER on LRS3.


Diffusion Large Language Models for Visual Speech Recognition cs.AI · 2026-05-27 · unverdicted · none · ref 1