Learning to parallel: Accelerating diffusion large language models via learnable parallel decoding

Wenrui Bao, Zhiben Chen, Dan Xu, Yuzhang Shang · 2025 · arXiv 2509.25188

2 Pith papers cite this work.

representative citing papers

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.

STDec: Spatio-Temporal Stability Guided Decoding for dLLMs

cs.CL · 2026-04-07 · unverdicted · novelty 6.0

STDec raises dLLM decoding speed by up to 14x on benchmarks like MBPP by using observed spatio-temporal stability to create dynamic, token-specific confidence thresholds while preserving task performance.

citing papers explorer

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM cs.CL · 2026-05-10 · unverdicted · none · ref 52
STDec: Spatio-Temporal Stability Guided Decoding for dLLMs cs.CL · 2026-04-07 · unverdicted · none · ref 1
