Diffusion LLM with native variable generation lengths: Let [EOS] lead the way. arXiv preprint arXiv:2510.24605, 2025

Yicun Yang, Cong Wang, Shaobo Wang, Zichen Wen, Biqing Qi, Hanlin Xu, Linfeng Zhang · 2025 · arXiv 2510.24605

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Diffusion Large Language Models for Visual Speech Recognition

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

DLLM-VSR applies diffusion LLMs to VSR via masked denoising, two-stage training, and length-guided candidate decoding to reach 19.5% WER on LRS3.

Improved Large Language Diffusion Models

cs.CL · 2026-06-24 · unverdicted · novelty 6.0

iLLaDA is an 8B masked diffusion LM trained from scratch with bidirectional attention, reporting gains of 14-21 points on BBH, ARC, MATH and HumanEval over prior diffusion models while remaining competitive with Qwen2.5-7B.

VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

VoidPadding decouples padding from termination in MDLMs via a new [VOID] token, delivering +17.84 average benchmark points and 55.7% fewer decoding steps on Dream-7B-Instruct.

citing papers explorer

Showing 3 of 3 citing papers.

Diffusion Large Language Models for Visual Speech Recognition · cs.AI · 2026-05-27 · unverdicted · none · ref 2
Improved Large Language Diffusion Models · cs.CL · 2026-06-24 · unverdicted · none · ref 30
VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination · cs.CL · 2026-06-16 · unverdicted · none · ref 1
