Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Yuxuan Song, Zheng Zhang, Cheng Luo, Pengyang Gao, Fan Xia, Hao Luo · 2025 · cs.CL · arXiv 2508.02193

Canonical reference. 80% of citing Pith papers cite this work as background.

38 Pith papers citing it

Background: 80% of classified citations

abstract

We present Seed Diffusion Preview, a large-scale language model based on discrete-state diffusion, offering remarkably fast inference speed. Thanks to non-sequential, parallel generation, discrete diffusion models provide a notable speedup to mitigate the inherent latency of token-by-token decoding, as demonstrated recently (e.g., Mercury Coder, Gemini Diffusion). Seed Diffusion Preview achieves an inference speed of 2,146 tokens/s on H20 GPUs while maintaining competitive performance across a sweep of standard code evaluation benchmarks. This is significantly faster than the contemporary Mercury and Gemini Diffusion models and establishes a new state of the art on the speed-quality Pareto frontier for code models.

citation-role summary

background: 4 · method: 1

citation-polarity summary

background: 4 · use method: 1

representative citing papers

Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages

cs.LG · 2026-03-13 · unverdicted · novelty 8.0

Derives an exact unbiased policy gradient for RL post-training of diffusion LLMs via entropy-guided step selection and one-step denoising rewards, achieving state-of-the-art results on coding and logical reasoning benchmarks.

Read the Trace, Steer the Path: Trajectory-Aware Reinforcement Learning for Diffusion Language Models

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

CAPR is a new dLLM-RL method that uses cached trajectory states and block-wise reward redistribution from the denoising trace to deliver tree-like supervision at 0.75x flat and 0.6x tree rollout compute, achieving SOTA on Sudoku, Countdown, GSM8K and Math500.

MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models

cs.CR · 2026-06-01 · unverdicted · novelty 7.0

MaskForge reaches 79.3% average attack success rate on five dLLMs by adaptively searching and accumulating structural attack patterns with a UCB bandit, improving 17.6% over baselines and transferring to 88.2% on AdvBench.

Extracting Training Data from Diffusion Language Models via Infilling

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

Infilling extraction on diffusion language models extracts up to three times more verbatim sequences than prefix methods and achieves higher recall on redacted emails than autoregressive models.

From Table to Cell: Attention for Better Reasoning with TABALIGN

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

TABALIGN pairs a diffusion language model planner emitting binary cell masks with a trained attention verifier, raising average accuracy 15.76 points over strong baselines on eight table benchmarks while speeding execution 44.64%.

Support Before Frequency in Discrete Diffusion

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Discrete diffusion models learn data support before frequencies because the exact reverse process decomposes edits into a dominant validity scale and a finer probability coefficient.

Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

cs.CL · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

TABOM is a trajectory-aligned Boltzmann modeling framework that turns self-distilled inference paths into a pairwise ranking loss to close the training-inference gap in diffusion language models and expand their effective capabilities.

Infinite Mask Diffusion for Few-Step Distillation

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

Infinite Mask Diffusion Models use stochastic infinite-state masks to overcome the factorization error lower bound in standard masked diffusion, achieving superior few-step performance on language tasks via distillation.

BadDLM: Backdooring Diffusion Language Models with Diverse Targets

cs.CR · 2026-05-10 · unverdicted · novelty 7.0

BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.

LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

LEAP detects early-converging tokens in dLLMs via future context filtering and multi-sequence superposition, reducing average denoising steps by about 30% while maintaining accuracy.

Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Energy-navigated trajectory shaping during training produces 8-step discrete flow matching students that achieve 32% lower perplexity than 1024-step teachers on 170M language models with unchanged inference cost.

Cascaded Code Editing: Large-Small Model Collaboration for Effective and Efficient Code Editing

cs.SE · 2026-04-21 · unverdicted · novelty 7.0

A cascaded large-small model system generates edit sketches with the large model and applies them with the small model to make code editing both accurate and token-efficient.

NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization

cs.LG · 2026-04-20 · unverdicted · novelty 7.0

NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.

Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

Diffusion LLMs hallucinate more than autoregressive models and display distinct failure modes including premature termination, incomplete denoising, and context intrusion.

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · conditional · novelty 7.0 · 2 refs

DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.

Locally Coherent Parallel Decoding in Diffusion Language Models

cs.CL · 2026-03-03 · unverdicted · novelty 7.0

CoDiLA adds a compact auxiliary AR model on diffusion latents to enforce local sequential validity during parallel token sampling in discrete diffusion language models.

Nemotron-Labs-Diffusion-Image: Advancing Masked Discrete Diffusion for High-Resolution Image Synthesis

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

A masked discrete diffusion model adds token editing at inference and grouped cross-entropy training to reach 0.90 GenEval, 86.9 DPG, and 10.76 HPSv3 scores.

DiLaServe: High SLO Attainment Serving for Diffusion Language Models

cs.LG · 2026-06-27 · unverdicted · novelty 6.0

DiLaServe improves SLO attainment for diffusion language models by up to 56.6 percentage points and reduces latency by up to 46% with less than 1% accuracy drop via deadline-aware scheduling and dynamic reconfiguration.

NAVIRA: Decoupled Stochastic Remasking for Masked Diffusion Language Models

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

NAVIRA decouples quality scoring from regeneration via stochastic remasking in masked diffusion LMs, improving fluency and LLM-judge scores on a 170M model.

BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers

cs.LG · 2026-06-01 · unverdicted · novelty 6.0

BlockGen enables flexible blockwise diffusion modeling with mixed block sizes and ARPC sampling, finding uniform diffusion outperforms masked under ancestral sampling in few-step regimes while the gap reverses with ARPC at high NFE.

AMix-2: Establishing Protein as a Native Modality in Large Language Models

q-bio.BM · 2026-05-29 · unverdicted · novelty 6.0

AMix-2 unifies protein sequences and text in one LLM via shared tokens and block-wise diffusion modeling, introduces the ProteinArena benchmark, and reports competitive performance against task-specific protein models and frontier LLMs.

dMoE: dLLMs with Learnable Block Experts

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

dMoE aggregates token expert distributions to block level in dLLMs, cutting unique experts from 69.5 to 14.6, memory by 76-80%, and latency by 1.14-1.66x while retaining 99.11% performance.

Looped Diffusion Language Models

cs.LG · 2026-05-25 · conditional · novelty 6.0

LoopMDM loops early-middle layers in masked diffusion models to match same-size MDM performance with up to 3.3x fewer training FLOPs and outperform on reasoning tasks by up to 8.5 points on GSM8K.

Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

cs.CL · 2026-05-18 · conditional · novelty 6.0

RePlaid achieves a 20x compute gap to autoregressive models, new SOTA PPL of 22.1 among continuous DLMs on OpenWebText, and competitive scaling laws by aligning architecture with modern discrete DLMs.

citing papers explorer

From Table to Cell: Attention for Better Reasoning with TABALIGN

cs.AI · 2026-05-14 · unverdicted · none · ref 52 · internal anchor

TABALIGN pairs a diffusion language model planner emitting binary cell masks with a trained attention verifier, raising average accuracy 15.76 points over strong baselines on eight table benchmarks while speeding execution 44.64%.
