Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Discrete diffusion language models (dLLMs) accelerate text generation by unmasking multiple tokens in parallel. However, parallel decoding introduces a distributional mismatch: it approximates the joint conditional using a fully factorized product of per-token marginals, which degrades output quality when the selected tokens are strongly dependent. We propose DEMASK (DEpendency-guided unMASKing), a lightweight dependency predictor that attaches to the final hidden states of a dLLM. In a single forward pass, it estimates pairwise conditional influences between masked positions. Using these predictions, a greedy selection algorithm identifies positions with bounded cumulative dependency for simultaneous unmasking. Under a sub-additivity assumption, we prove that this bounds the total variation distance between our parallel sampling distribution and the model's joint distribution. Empirically, DEMASK achieves a 1.7-2.2× speedup on Dream-7B while matching or improving accuracy compared to confidence-based and KL-based baselines.
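
The greedy selection step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `greedy_select`, the symmetric way pairwise scores are summed, the confidence-based visiting order, and the `budget` parameter are all assumptions made for illustration. The only idea taken from the abstract is that positions are added to the parallel-unmasking set while their cumulative pairwise dependency stays bounded.

```python
from typing import List, Sequence


def greedy_select(dep: Sequence[Sequence[float]],
                  confidence: Sequence[float],
                  budget: float) -> List[int]:
    """Choose masked positions to unmask together (illustrative only).

    dep[i][j]  -- hypothetical estimate of how strongly position j
                  influences position i (as a dependency predictor might output).
    confidence -- hypothetical per-position confidence, used only to
                  decide the order in which candidates are visited.
    budget     -- assumed upper bound on the cumulative pairwise
                  dependency inside the selected set.
    """
    # Visit candidates from most to least confident.
    order = sorted(range(len(confidence)), key=lambda i: -confidence[i])
    selected: List[int] = []
    total = 0.0
    for i in order:
        # Extra dependency incurred by adding i to the current set.
        extra = sum(dep[i][j] + dep[j][i] for j in selected)
        # Always accept the first candidate so each step makes progress.
        if not selected or total + extra <= budget:
            selected.append(i)
            total += extra
    return selected


# Toy example: positions 0 and 1 depend strongly on each other,
# position 2 is nearly independent of both.
dep = [
    [0.0, 0.9, 0.05],
    [0.9, 0.0, 0.1],
    [0.05, 0.1, 0.0],
]
confidence = [0.95, 0.9, 0.6]
print(greedy_select(dep, confidence, budget=0.3))
```

With a tight budget the sketch keeps positions 0 and 2 and defers position 1 to a later step, since unmasking 0 and 1 together would exceed the assumed dependency bound. A larger budget admits all three positions at once, trading accuracy of the factorized approximation for fewer decoding steps.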

fields

cs.LG 2 cs.CL 1

years

2026 3

verdicts

UNVERDICTED 3


representative citing papers

Accelerating Discrete Diffusion Models with Parallel-In-Time Sampling

cs.LG · 2026-07-01 · unverdicted · novelty 6.0

A parallel-in-time τ-leaping sampler for absorbing discrete diffusion models is introduced, with an exponential-factorial convergence proof and empirical speedups of 7-9× on synthetic tasks and 1.45-1.86× on image/text tasks while using 50% fewer NFE.

citing papers explorer


  • Supportive Token Revealing for Fast Diffusion Language Model Decoding cs.CL · 2026-06-02 · unverdicted · none · ref 6 · internal anchor

    AXON is a training-free module that selects supportive anchor tokens using attention, uncertainty, and confidence to improve the quality-latency trade-off in parallel decoding for diffusion language models.