Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

2026 · cs.CL · arXiv 2604.02560

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Discrete diffusion language models (dLLMs) accelerate text generation by unmasking multiple tokens in parallel. However, parallel decoding introduces a distributional mismatch: it approximates the joint conditional distribution with a fully factorized product of per-token marginals, which degrades output quality when the selected tokens are strongly dependent. We propose DEMASK (DEpendency-guided unMASKing), a lightweight dependency predictor that attaches to the final hidden states of a dLLM. In a single forward pass, it estimates pairwise conditional influences between masked positions. Using these predictions, a greedy selection algorithm identifies positions with bounded cumulative dependency for simultaneous unmasking. Under a sub-additivity assumption, we prove that this bounds the total variation distance between the parallel sampling distribution and the model's joint distribution. Empirically, DEMASK achieves a 1.7-2.2× speedup on Dream-7B while matching or improving accuracy compared to confidence-based and KL-based baselines.
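
The greedy selection step described in the abstract can be illustrated with a small sketch. The abstract does not give the algorithm's details, so everything here is an assumption made for illustration: the function name `greedy_select`, the use of a per-position confidence score to order candidates, the symmetric sum of pairwise scores as the "cumulative dependency", and the `budget` parameter. It is not the authors' implementation.

```python
from typing import List, Sequence


def greedy_select(
    confidence: Sequence[float],
    dependency: Sequence[Sequence[float]],
    budget: float,
) -> List[int]:
    """Pick masked positions to unmask together (illustrative only).

    confidence[i]    -- hypothetical per-position score (higher is tried first)
    dependency[i][j] -- hypothetical predicted influence of position j on i
    budget           -- hypothetical cap on the summed pairwise dependency
    """
    # Visit candidates from most to least confident (an assumed ordering).
    order = sorted(range(len(confidence)), key=lambda i: -confidence[i])
    selected: List[int] = []
    total = 0.0
    for i in order:
        # Extra dependency incurred by adding i to the already-selected set.
        added = sum(dependency[i][j] + dependency[j][i] for j in selected)
        if total + added <= budget:
            selected.append(i)
            total += added
    return selected


# Toy example: positions 0 and 1 are strongly coupled, the rest only weakly.
confidence = [0.9, 0.8, 0.7, 0.6]
dependency = [
    [0.0, 0.5, 0.0, 0.1],
    [0.5, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
chosen = greedy_select(confidence, dependency, budget=0.3)
print(chosen)
```

In this toy input, position 1 is skipped because unmasking it together with position 0 would exceed the budget, while positions 2 and 3 add little dependency and are accepted. With any non-negative budget the first candidate is always accepted, so each decoding step unmasks at least one token.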

representative citing papers

Accelerating Discrete Diffusion Models with Parallel-In-Time Sampling

cs.LG · 2026-07-01 · unverdicted · novelty 6.0

A parallel-in-time τ-leaping sampler for absorbing discrete diffusion models is introduced, with an exponential-factorial convergence proof and empirical speedups of 7-9× on synthetic tasks and 1.45-1.86× on image/text tasks while using 50% fewer NFE.

Supportive Token Revealing for Fast Diffusion Language Model Decoding

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

AXON is a training-free module that selects supportive anchor tokens using attention, uncertainty, and confidence to improve the quality-latency trade-off in parallel decoding for diffusion language models.

Visual-Redundancy-Controlled Parallel Decoding for Diffusion-Based Multimodal Large Language Models

cs.LG · 2026-05-25 · unverdicted · novelty 6.0

VRCD prioritizes visually complementary positions during parallel decoding in dMLLMs by measuring attention overlap with the new Visual Redundancy Index, yielding accuracy gains over confidence-based baselines on M^3CoT and MMBench.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Accelerating Discrete Diffusion Models with Parallel-In-Time Sampling

cs.LG · 2026-07-01 · unverdicted · none · ref 12 · internal anchor

A parallel-in-time τ-leaping sampler for absorbing discrete diffusion models is introduced, with an exponential-factorial convergence proof and empirical speedups of 7-9× on synthetic tasks and 1.45-1.86× on image/text tasks while using 50% fewer NFE.

Visual-Redundancy-Controlled Parallel Decoding for Diffusion-Based Multimodal Large Language Models

cs.LG · 2026-05-25 · unverdicted · none · ref 19 · internal anchor

VRCD prioritizes visually complementary positions during parallel decoding in dMLLMs by measuring attention overlap with the new Visual Redundancy Index, yielding accuracy gains over confidence-based baselines on M^3CoT and MMBench.
