Diffusion LLM with native variable generation lengths: Let [EOS] lead the way. arXiv preprint arXiv:2510.24605, 2025.
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Diffusion Large Language Models for Visual Speech Recognition
  DLLM-VSR applies diffusion LLMs to VSR via masked denoising, two-stage training, and length-guided candidate decoding to reach 19.5% WER on LRS3.
- Improved Large Language Diffusion Models
  iLLaDA is an 8B masked diffusion LM trained from scratch with bidirectional attention, reporting gains of 14-21 points on BBH, ARC, MATH and HumanEval over prior diffusion models while remaining competitive with Qwen2.5-7B.
- VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination
  VoidPadding decouples padding from termination in MDLMs via a new [VOID] token, delivering +17.84 average benchmark points and 55.7% fewer decoding steps on Dream-7B-Instruct.