A parallel-in-time τ-leaping sampler for absorbing discrete diffusion models is introduced, with an exponential-factorial convergence proof and empirical speedups of 7-9× on synthetic tasks and 1.45-1.86× on image/text tasks while using 50% fewer NFE.
Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
Discrete diffusion language models (dLLMs) accelerate text generation by unmasking multiple tokens in parallel. However, parallel decoding introduces a distributional mismatch: it approximates the joint conditional using a fully factorized product of per-token marginals, which degrades output quality when selected tokens are strongly dependent. We propose DEMASK (DEpendency-guided unMASKing), a lightweight dependency predictor that attaches to the final hidden states of a dLLM. In a single forward pass, it estimates pairwise conditional influences between masked positions. Using these predictions, a greedy selection algorithm identifies positions with bounded cumulative dependency for simultaneous unmasking. Under a sub-additivity assumption, we prove this bounds the total variation distance between our parallel sampling and the model's joint. Empirically, DEMASK achieves 1.7-2.2$\times$ speedup on Dream-7B while matching or improving accuracy compared to confidence-based and KL-based baselines.
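The greedy selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_select`, the confidence-based visiting order, the symmetric cost `dep[i][j] + dep[j][i]`, and the scalar `budget` are all assumptions made for illustration. The abstract states only that positions with bounded cumulative dependency are chosen greedily from predicted pairwise influences.

```python
from typing import List, Sequence


def greedy_select(
    dep: Sequence[Sequence[float]],
    confidence: Sequence[float],
    budget: float,
) -> List[int]:
    """Pick masked positions to unmask together (hypothetical sketch).

    dep[i][j]  : assumed predicted influence of position i on position j
    confidence : assumed per-position score used only to order candidates
    budget     : assumed cap on the cumulative pairwise dependency
    """
    # Visit candidates from most to least confident (an assumption).
    order = sorted(range(len(confidence)), key=lambda i: -confidence[i])
    selected: List[int] = []
    total = 0.0
    for i in order:
        # Extra dependency incurred by adding i to the current set.
        added = sum(dep[i][j] + dep[j][i] for j in selected)
        # Always accept the first candidate so each step makes progress.
        if not selected or total + added <= budget:
            selected.append(i)
            total += added
    return selected


# Toy example: positions 0 and 1 depend strongly on each other,
# position 2 is nearly independent of both.
dep = [
    [0.0, 0.9, 0.05],
    [0.9, 0.0, 0.05],
    [0.05, 0.05, 0.0],
]
confidence = [0.9, 0.8, 0.7]
chosen = greedy_select(dep, confidence, budget=0.3)
```

In this toy case, position 1 is skipped because pairing it with position 0 would exceed the budget, while position 2 is accepted. A larger budget admits more positions per step at the cost of a looser dependency bound.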
years
2026: 3
verdicts
UNVERDICTED: 3
representative citing papers
AXON is a training-free module that selects supportive anchor tokens using attention, uncertainty, and confidence to improve the quality-latency trade-off in parallel decoding for diffusion language models.
VRCD prioritizes visually complementary positions during parallel decoding in dMLLMs by measuring attention overlap with the new Visual Redundancy Index, yielding accuracy gains over confidence-based baselines on M^3CoT and MMBench.
citing papers explorer
Supportive Token Revealing for Fast Diffusion Language Model Decoding