WeDLM: Reconciling diffusion language models with standard causal attention for fast inference

Liu, A · 2025 · arXiv 2512.22737

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · conditional · novelty 7.0 · 2 refs

DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.

Improving Sampling for Masked Diffusion Models via Information Gain

cs.CL · 2026-02-20 · unverdicted · novelty 7.0

Info-Gain Sampler improves MDM decoding by using bidirectional information gain to reduce cumulative uncertainty, outperforming greedy samplers on reasoning accuracy and creative writing tasks.

TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration

cs.CL · 2026-02-09 · unverdicted · novelty 7.0

TEAM accelerates MoE dLLMs up to 2.2x by exploiting temporal-spatial consistency in expert routing to accept more tokens with fewer activations.

Tweedie's Formulae and Diffusion Generative Models Beyond Gaussian

stat.ML · 2026-05-19 · unverdicted · novelty 5.0

Extends Tweedie's formulae to GBM, BESQ, and CIR processes to enable non-Gaussian diffusion generative models and empirical Bayes applications.

citing papers explorer

DMax: Aggressive Parallel Decoding for dLLMs
cs.LG · 2026-04-09 · conditional · none · ref 45 · 2 links

Improving Sampling for Masked Diffusion Models via Information Gain
cs.CL · 2026-02-20 · unverdicted · none · ref 10

TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration
cs.CL · 2026-02-09 · unverdicted · none · ref 17

Tweedie's Formulae and Diffusion Generative Models Beyond Gaussian
stat.ML · 2026-05-19 · unverdicted · none · ref 42
