
Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Discrete diffusion language models (dLLMs) accelerate text generation by unmasking multiple tokens in parallel. However, parallel decoding introduces a distributional mismatch: it approximates the joint conditional using a fully factorized product of per-token marginals, which degrades output quality when selected tokens are strongly dependent. We propose DEMASK (DEpendency-guided unMASKing), a lightweight dependency predictor that attaches to the final hidden states of a dLLM. In a single forward pass, it estimates pairwise conditional influences between masked positions. Using these predictions, a greedy selection algorithm identifies positions with bounded cumulative dependency for simultaneous unmasking. Under a sub-additivity assumption, we prove that this selection bounds the total variation distance between the parallel sampling distribution and the model's joint distribution. Empirically, DEMASK achieves a 1.7-2.2× speedup on Dream-7B while matching or improving accuracy compared to confidence-based and KL-based baselines.
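
The greedy selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact selection rule, so the function name `greedy_select`, the confidence-ordered visiting scheme, the symmetric sum of pairwise scores, and the `budget` parameter are all assumptions made for illustration. The sketch only shows the general idea of admitting positions while their cumulative pairwise dependency stays under a bound.

```python
from typing import List, Sequence


def greedy_select(
    confidence: Sequence[float],
    dependency: Sequence[Sequence[float]],
    budget: float,
) -> List[int]:
    """Pick masked positions to unmask together (hypothetical rule).

    confidence[i]    : assumed per-position confidence score.
    dependency[i][j] : assumed estimate of how strongly position i influences j.
    budget           : assumed cap on the cumulative pairwise dependency.
    """
    # Visit positions from most to least confident (an assumption, not from the source).
    order = sorted(range(len(confidence)), key=lambda i: -confidence[i])
    selected: List[int] = []
    total = 0.0
    for i in order:
        # Extra dependency incurred by adding i to the already selected set.
        added = sum(dependency[i][j] + dependency[j][i] for j in selected)
        if total + added <= budget:
            selected.append(i)
            total += added
    return selected


# Toy example: positions 0 and 1 depend strongly on each other.
confidence = [0.9, 0.8, 0.7, 0.6]
dependency = [
    [0.0, 0.5, 0.0, 0.1],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.1, 0.0],
]
print(greedy_select(confidence, dependency, budget=0.3))  # [0, 2]
```

In this toy input, position 1 is skipped because it is strongly coupled to the already selected position 0, and position 3 is skipped because its combined coupling to positions 0 and 2 would exceed the budget. Raising the budget admits more positions per step, trading fidelity to the joint distribution for speed.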

fields

cs.LG 2 cs.CL 1

years

2026 3

verdicts

UNVERDICTED 3


representative citing papers

Accelerating Discrete Diffusion Models with Parallel-In-Time Sampling

cs.LG · 2026-07-01 · unverdicted · novelty 6.0

A parallel-in-time τ-leaping sampler for absorbing discrete diffusion models is introduced, with an exponential-factorial convergence proof and empirical speedups of 7-9× on synthetic tasks and 1.45-1.86× on image/text tasks while using 50% fewer NFE.

citing papers explorer

Showing 2 of 2 citing papers after filters.