DAPD: Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs
3 Pith papers cite this work.
abstract
Parallel decoding for Diffusion LLMs (dLLMs) is difficult because each denoising step provides only token-wise marginal distributions, while unmasking multiple tokens simultaneously requires accounting for inter-token dependencies. We propose Dependency-Aware Parallel Decoding (DAPD), a simple, training-free decoding method that uses self-attention to induce a conditional dependency graph over masked tokens. At each iteration, edges in this graph capture strong token interactions, while non-edges indicate weak dependence. Parallel decoding is then reduced to selecting an independent set on the graph and unmasking the selected tokens in parallel. This avoids co-updating strongly coupled tokens without auxiliary models or retraining. Experiments on LLaDA and Dream show that DAPD improves the accuracy-steps trade-off over existing methods and enables more globally distributed parallel updates that better exploit the any-order generation capability of dLLMs. The project is available at https://ai-isl.github.io/dapd
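The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold `tau`, the max-based symmetrization of attention scores, and the confidence-ordered greedy pass are all assumptions made here for the example, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def build_dependency_graph(attn: Sequence[Sequence[float]],
                           masked: List[int],
                           tau: float) -> Dict[int, Set[int]]:
    """Link two masked positions when their attention score exceeds tau.

    Assumption (not from the paper): scores are symmetrized with max()
    and compared against a fixed threshold tau.
    """
    graph: Dict[int, Set[int]] = {i: set() for i in masked}
    for a, i in enumerate(masked):
        for j in masked[a + 1:]:
            if max(attn[i][j], attn[j][i]) > tau:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def select_independent_set(graph: Dict[int, Set[int]],
                           confidence: Dict[int, float]) -> List[int]:
    """Greedy independent set over the dependency graph.

    Assumption (not from the paper): nodes are visited in order of
    descending confidence, and a node is skipped if it neighbours one
    that was already chosen.
    """
    chosen: List[int] = []
    blocked: Set[int] = set()
    for i in sorted(graph, key=lambda k: confidence[k], reverse=True):
        if i not in blocked:
            chosen.append(i)
            blocked |= graph[i]
    return sorted(chosen)


# Toy example: four masked positions with a hypothetical attention matrix.
attn = [
    [0.0, 0.6, 0.1, 0.0],
    [0.5, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.4],
    [0.0, 0.2, 0.2, 0.0],
]
confidence = {0: 0.9, 1: 0.8, 2: 0.3, 3: 0.7}

graph = build_dependency_graph(attn, [0, 1, 2, 3], tau=0.3)
to_unmask = select_independent_set(graph, confidence)
print(to_unmask)  # [0, 3]
```

In this toy case positions 0 and 1 are strongly coupled, as are 2 and 3, so only one token from each pair is unmasked in the same step.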
years
2026 (3)
verdicts
UNVERDICTED (3)
representative citing papers
AXON is a training-free module that selects supportive anchor tokens using attention, uncertainty, and confidence to improve the quality-latency trade-off in parallel decoding for diffusion language models.
VRCD prioritizes visually complementary positions during parallel decoding in dMLLMs by measuring attention overlap with the new Visual Redundancy Index, yielding accuracy gains over confidence-based baselines on M^3CoT and MMBench.
citing papers explorer
- Attention-Discounted Adaptive Sampler for Masked Diffusion Language Models
  ADAS improves low-NFE performance by 9-10 percentage points on math and code tasks by greedily discounting attention-strong candidates during subset construction in masked diffusion decoding.
- Supportive Token Revealing for Fast Diffusion Language Model Decoding
  AXON is a training-free module that selects supportive anchor tokens using attention, uncertainty, and confidence to improve the quality-latency trade-off in parallel decoding for diffusion language models.