hub

Seed diffusion: A large-scale diffusion language model with high-speed inference

Yuxuan Song, Zheng Zhang, Cheng Luo, Pengyang Gao, Fan Xia, Hao Luo, Zheng Li, Yuehang Yang, Hongli Yu, Xingwei Qu, et al · 2025 · arXiv 2508.02193

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

read on arXiv browse 16 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

representative citing papers

Infinite Mask Diffusion for Few-Step Distillation

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

Infinite Mask Diffusion Models use stochastic infinite-state masks to overcome the factorization error lower bound in standard masked diffusion, achieving superior few-step performance on language tasks via distillation.

BadDLM: Backdooring Diffusion Language Models with Diverse Targets

cs.CR · 2026-05-10 · unverdicted · novelty 7.0

BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.

LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

LEAP detects early-converging tokens in dLLMs via future context filtering and multi-sequence superposition, reducing average denoising steps by about 30% while maintaining accuracy.

Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Energy-navigated trajectory shaping during training produces 8-step discrete flow matching students that achieve 32% lower perplexity than 1024-step teachers on 170M language models with unchanged inference cost.

Cascaded Code Editing: Large-Small Model Collaboration for Effective and Efficient Code Editing

cs.SE · 2026-04-21 · unverdicted · novelty 7.0

A cascaded large-small model system generates edit sketches with the large model and applies them with the small model to make code editing both accurate and token-efficient.

NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization

cs.LG · 2026-04-20 · unverdicted · novelty 7.0

NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.

Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

Diffusion LLMs hallucinate more than autoregressive models and display distinct failure modes including premature termination, incomplete denoising, and context intrusion.

Understanding and Accelerating the Training of Masked Diffusion Language Models

cs.LG · 2026-05-13 · conditional · novelty 6.0

Bell-shaped time sampling accelerates masked diffusion language model training by roughly 4x on LM1B by countering locality bias in language data.

Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

TABOM models inference unmasking preferences as a Boltzmann distribution over predictive entropies and derives a ranking loss to align DLM training with observed trajectories, yielding gains in new domains and reduced catastrophic forgetting versus standard SFT.

ELF: Embedded Language Flows

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

ELF is a continuous embedding-space flow matching model for language that stays continuous until the last step and outperforms prior discrete and continuous diffusion language models with fewer sampling steps.

Edit-Based Refinement for Parallel Masked Diffusion Language Models

cs.CL · 2026-05-10 · unverdicted · novelty 6.0

ME-DLM augments parallel masked diffusion models with edit-distance-supervised refinements to raise quality on coding and math benchmarks while using far fewer diffusion steps.

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

cs.RO · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.

Stability-Weighted Decoding for Diffusion Language Models

cs.CL · 2026-04-18 · unverdicted · novelty 6.0

Stability-Weighted Decoding improves diffusion LLM accuracy by modulating token scores with temporal stability from KL divergence between prediction steps.

Differences in Text Generated by Diffusion and Autoregressive Language Models

cs.CL · 2026-04-04 · unverdicted · novelty 6.0

DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARMs, primarily due to bidirectional context and remasking decoding strategies.

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · unverdicted · novelty 5.0

DMax enables faster parallel decoding in diffusion language models by using on-policy training to recover from errors and soft embedding interpolations for iterative revision, boosting tokens per forward pass roughly 2-3x on benchmarks while preserving accuracy.

On the Quantization Robustness of Diffusion Language Models in Coding Benchmarks

cs.LG · 2026-04-22 · unverdicted · novelty 4.0

Diffusion coding model CoDA shows smaller accuracy drops than Qwen3-1.7B under 2-4 bit quantization on HumanEval and MBPP.

citing papers explorer

Showing 16 of 16 citing papers.

Infinite Mask Diffusion for Few-Step Distillation cs.CL · 2026-05-11 · unverdicted · none · ref 9
Infinite Mask Diffusion Models use stochastic infinite-state masks to overcome the factorization error lower bound in standard masked diffusion, achieving superior few-step performance on language tasks via distillation.
BadDLM: Backdooring Diffusion Language Models with Diverse Targets cs.CR · 2026-05-10 · unverdicted · none · ref 46
BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection cs.LG · 2026-05-09 · unverdicted · none · ref 13
LEAP detects early-converging tokens in dLLMs via future context filtering and multi-sequence superposition, reducing average denoising steps by about 30% while maintaining accuracy.
Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation cs.LG · 2026-05-08 · unverdicted · none · ref 77
Energy-navigated trajectory shaping during training produces 8-step discrete flow matching students that achieve 32% lower perplexity than 1024-step teachers on 170M language models with unchanged inference cost.
Cascaded Code Editing: Large-Small Model Collaboration for Effective and Efficient Code Editing cs.SE · 2026-04-21 · unverdicted · none · ref 50
A cascaded large-small model system generates edit sketches with the large model and applies them with the small model to make code editing both accurate and token-efficient.
NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization cs.LG · 2026-04-20 · unverdicted · none · ref 9
NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.
Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models cs.CL · 2026-04-12 · unverdicted · none · ref 4
Diffusion LLMs hallucinate more than autoregressive models and display distinct failure modes including premature termination, incomplete denoising, and context intrusion.
Understanding and Accelerating the Training of Masked Diffusion Language Models cs.LG · 2026-05-13 · conditional · none · ref 64
Bell-shaped time sampling accelerates masked diffusion language model training by roughly 4x on LM1B by countering locality bias in language data.
Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models cs.CL · 2026-05-12 · unverdicted · none · ref 13
TABOM models inference unmasking preferences as a Boltzmann distribution over predictive entropies and derives a ranking loss to align DLM training with observed trajectories, yielding gains in new domains and reduced catastrophic forgetting versus standard SFT.
ELF: Embedded Language Flows cs.CL · 2026-05-11 · unverdicted · none · ref 65
ELF is a continuous embedding-space flow matching model for language that stays continuous until the last step and outperforms prior discrete and continuous diffusion language models with fewer sampling steps.
Edit-Based Refinement for Parallel Masked Diffusion Language Models cs.CL · 2026-05-10 · unverdicted · none · ref 26
ME-DLM augments parallel masked diffusion models with edit-distance-supervised refinements to raise quality on coding and math benchmarks while using far fewer diffusion steps.
ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving cs.RO · 2026-05-06 · unverdicted · none · ref 120 · 2 links
ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.
Stability-Weighted Decoding for Diffusion Language Models cs.CL · 2026-04-18 · unverdicted · none · ref 14
Stability-Weighted Decoding improves diffusion LLM accuracy by modulating token scores with temporal stability from KL divergence between prediction steps.
Differences in Text Generated by Diffusion and Autoregressive Language Models cs.CL · 2026-04-04 · unverdicted · none · ref 30
DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARMs, primarily due to bidirectional context and remasking decoding strategies.
DMax: Aggressive Parallel Decoding for dLLMs cs.LG · 2026-04-09 · unverdicted · none · ref 73
DMax enables faster parallel decoding in diffusion language models by using on-policy training to recover from errors and soft embedding interpolations for iterative revision, boosting tokens per forward pass roughly 2-3x on benchmarks while preserving accuracy.
On the Quantization Robustness of Diffusion Language Models in Coding Benchmarks cs.LG · 2026-04-22 · unverdicted · none · ref 22
Diffusion coding model CoDA shows smaller accuracy drops than Qwen3-1.7B under 2-4 bit quantization on HumanEval and MBPP.

Seed diffusion: A large-scale diffusion language model with high-speed inference

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer