Seed diffusion: A large-scale diffusion language model with high-speed inference
16 Pith papers cite this work. Polarity classification is still indexing.
16 representative citing papers (2026)
- Infinite Mask Diffusion for Few-Step Distillation
Infinite Mask Diffusion Models use stochastic infinite-state masks to overcome the factorization error lower bound in standard masked diffusion, achieving superior few-step performance on language tasks via distillation.
- BadDLM: Backdooring Diffusion Language Models with Diverse Targets
BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
- LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection
LEAP detects early-converging tokens in dLLMs via future context filtering and multi-sequence superposition, reducing average denoising steps by about 30% while maintaining accuracy (see the sketch after this list).
- Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation
Energy-navigated trajectory shaping during training produces 8-step discrete flow matching students that achieve 32% lower perplexity than 1024-step teachers on 170M language models with unchanged inference cost.
- Cascaded Code Editing: Large-Small Model Collaboration for Effective and Efficient Code Editing
A cascaded large-small model system generates edit sketches with the large model and applies them with the small model to make code editing both accurate and token-efficient.
- NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization
NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.
- Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models
Diffusion LLMs hallucinate more than autoregressive models and display distinct failure modes including premature termination, incomplete denoising, and context intrusion.
- Understanding and Accelerating the Training of Masked Diffusion Language Models
Bell-shaped time sampling accelerates masked diffusion language model training by roughly 4x on LM1B by countering locality bias in language data (see the sketch after this list).
- Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models
TABOM models inference unmasking preferences as a Boltzmann distribution over predictive entropies and derives a ranking loss to align DLM training with observed trajectories, yielding gains in new domains and reduced catastrophic forgetting versus standard SFT (see the sketch after this list).
- ELF: Embedded Language Flows
ELF is a continuous embedding-space flow matching model for language that stays continuous until the last step and outperforms prior discrete and continuous diffusion language models with fewer sampling steps.
- Edit-Based Refinement for Parallel Masked Diffusion Language Models
ME-DLM augments parallel masked diffusion models with edit-distance-supervised refinements to raise quality on coding and math benchmarks while using far fewer diffusion steps.
- ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving
ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.
- Stability-Weighted Decoding for Diffusion Language Models
Stability-Weighted Decoding improves diffusion LLM accuracy by modulating token scores with temporal stability from KL divergence between prediction steps (see the sketch after this list).
- Differences in Text Generated by Diffusion and Autoregressive Language Models
DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARMs, primarily due to bidirectional context and remasking decoding strategies.
- DMax: Aggressive Parallel Decoding for dLLMs
DMax enables faster parallel decoding in diffusion language models by using on-policy training to recover from errors and soft embedding interpolations for iterative revision, boosting tokens per forward pass roughly 2-3x on benchmarks while preserving accuracy.
- On the Quantization Robustness of Diffusion Language Models in Coding Benchmarks
Diffusion coding model CoDA shows smaller accuracy drops than Qwen3-1.7B under 2-4 bit quantization on HumanEval and MBPP.
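A few of the mechanisms above are concrete enough to sketch; the snippets that follow are minimal illustrations under stated assumptions, not the cited papers' actual implementations. First, LEAP-style early-convergence detection: this sketch only tracks how long each masked position's argmax prediction has stayed unchanged and flags positions whose streak exceeds a patience threshold. LEAP's future context filtering and multi-sequence superposition are not modeled, and `patience` is a hypothetical parameter.

```python
import torch

def update_convergence(argmax_now: torch.Tensor,
                       argmax_prev: torch.Tensor,
                       streak: torch.Tensor,
                       patience: int = 3):
    """Track per-position argmax stability across consecutive denoising steps.

    A masked position whose top prediction has not changed for `patience`
    steps in a row is flagged as converged and can be committed early,
    skipping further denoising for that position. (Simplified stand-in for
    LEAP's detector; future-context filtering and multi-sequence
    superposition are omitted.)
    """
    streak = torch.where(argmax_now == argmax_prev,
                         streak + 1,
                         torch.zeros_like(streak))
    converged = streak >= patience
    return streak, converged

# Usage inside a decoding loop (argmax_prev and streak carried across steps):
# argmax_now = logits.argmax(dim=-1)
# streak, converged = update_convergence(argmax_now, argmax_prev, streak)
# tokens[converged & still_masked] = argmax_now[converged & still_masked]
```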
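Bell-shaped time sampling (from "Understanding and Accelerating the Training of Masked Diffusion Language Models") replaces the uniform draw of the diffusion time t during training. The sketch below assumes the bell shape is a symmetric Beta(alpha, alpha) with alpha > 1, which concentrates training on intermediate mask ratios; the paper's exact density and parameters may differ.

```python
import torch

def sample_timesteps(batch_size: int, alpha: float = 3.0) -> torch.Tensor:
    """Bell-shaped diffusion-time sampler (illustrative).

    Standard masked-diffusion training draws t ~ Uniform(0, 1); here t comes
    from Beta(alpha, alpha) with alpha > 1, which puts most mass near t = 0.5
    (intermediate mask ratios) rather than the extremes.
    """
    dist = torch.distributions.Beta(alpha, alpha)
    return dist.sample((batch_size,))

def mask_inputs(tokens: torch.Tensor, t: torch.Tensor, mask_id: int) -> torch.Tensor:
    """Mask each token independently with its sequence's probability t."""
    keep = torch.rand_like(tokens, dtype=torch.float) >= t.unsqueeze(-1)
    return torch.where(keep, tokens, torch.full_like(tokens, mask_id))

# Usage: corruption step for one training batch (hypothetical vocab of 32000).
tokens = torch.randint(0, 32000, (8, 128))
t = sample_timesteps(tokens.size(0))   # bell-shaped instead of uniform
noisy = mask_inputs(tokens, t, mask_id=32000)
```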
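TABOM's Boltzmann model of unmasking preferences can be written down directly from the summary: each masked position's predictive entropy H_i induces a preference p(i) proportional to exp(-H_i / tau), so low-entropy positions are unmasked first. The temperature tau and the exact ranking loss trained against these preferences are assumptions here.

```python
import torch
import torch.nn.functional as F

def unmask_preference(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Boltzmann-style unmasking preference over masked positions (illustrative).

    Each position's predictive entropy H_i is mapped to a preference
    p(i) ∝ exp(-H_i / tau): confident (low-entropy) positions are preferred
    for unmasking. TABOM then aligns training with the trajectories such
    preferences induce via a ranking loss, which is not reproduced here.
    logits: [num_masked, vocab] predictions at the currently masked positions.
    """
    logp = F.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(-1)      # H_i per masked position
    return F.softmax(-entropy / tau, dim=-1)    # Boltzmann over positions
```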
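Stability-Weighted Decoding modulates per-token scores with temporal stability measured by the KL divergence between consecutive prediction steps. The sketch assumes a simple additive penalty (confidence minus lam * KL); the paper's precise weighting scheme may differ, and lam is a hypothetical hyperparameter.

```python
import torch
import torch.nn.functional as F

def stability_weighted_scores(curr_logits: torch.Tensor,
                              prev_logits: torch.Tensor,
                              lam: float = 1.0) -> torch.Tensor:
    """Illustrative stability-weighted token scores for one diffusion LM step.

    Confidence (max log-probability under the current step's prediction) is
    penalized by the KL divergence between the current and previous step's
    per-position distributions, so tokens whose predictions are still
    drifting are deferred to later steps.
    curr_logits, prev_logits: [seq_len, vocab] logits from consecutive steps.
    """
    curr_logp = F.log_softmax(curr_logits, dim=-1)
    prev_logp = F.log_softmax(prev_logits, dim=-1)
    confidence = curr_logp.max(dim=-1).values                  # [seq_len]
    # KL(curr || prev) per position as a proxy for temporal instability.
    kl = (curr_logp.exp() * (curr_logp - prev_logp)).sum(-1)   # [seq_len]
    return confidence - lam * kl

# Usage: commit the k highest-scoring masked positions this step.
# scores = stability_weighted_scores(logits_t, logits_t_minus_1)
# commit = scores.topk(k).indices
```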