d3LLM: Ultra-fast diffusion LLM using pseudo-trajectory distillation. arXiv preprint arXiv:2601.07568

Yu-Yang Qian, Junda Su, Lanxiang Hu, Peiyuan Zhang, Zhijie Deng, Peng Zhao, Hao Zhang · 2026 · arXiv 2601.07568

10 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

baseline: 2 · background: 1

citation-polarity summary

baseline: 2 · background: 1

representative citing papers

Learning from the Self-future: On-policy Self-distillation for dLLMs

cs.CL · 2026-06-16 · unverdicted · novelty 7.0

d-OPSD reframes on-policy self-distillation for dLLMs via suffix conditioning from self-generated answers and step-level supervision, outperforming RLVR and SFT on reasoning benchmarks with ~10% of the optimization steps.

AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

AsyncLane decouples refinement from advancement in DLM decoding via lane forking at delimiters plus efficiency optimizations, yielding up to 3x throughput gains on math and code benchmarks without retraining.

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · conditional · novelty 7.0 · 2 refs

DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.

TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration

cs.CL · 2026-02-09 · unverdicted · novelty 7.0

TEAM accelerates MoE dLLMs up to 2.2x by exploiting temporal-spatial consistency in expert routing to accept more tokens with fewer activations.

Multi-Block Diffusion Language Models

cs.LG · 2026-06-28 · unverdicted · novelty 6.0 · 2 refs

MBD-LMs raise average tokens per forward pass from 3.47 to 6.19 (and to 9.34 with DMax) via multi-block teacher forcing and optimized parallel decoding while holding or slightly improving accuracy on math and code tasks.

PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models

cs.CV · 2026-06-17 · unverdicted · novelty 6.0

PerceptionDLM enables parallel region captioning in multimodal diffusion language models via prompting and attention masking, introduces ParaDLC-Bench, and claims to be the first to achieve parallel region perception with DLMs.

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

JetSpec trains a causal draft head to produce branch-consistent trees aligned with target autoregressive scores, achieving up to 9.64x speedup on MATH-500 and outperforming prior SD baselines on Qwen3 models.

Supportive Token Revealing for Fast Diffusion Language Model Decoding

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

AXON is a training-free module that selects supportive anchor tokens using attention, uncertainty, and confidence to improve the quality-latency trade-off in parallel decoding for diffusion language models.

ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion

cs.LG · 2026-04-10 · unverdicted · novelty 5.0 · 2 refs

ECHO introduces one-step block diffusion via Direct Conditional Distillation and Response-Asymmetric Diffusion to generate chest X-ray reports faster than autoregressive models while improving clinical metrics.

citing papers explorer

Showing 6 of 6 citing papers after filters.

Learning from the Self-future: On-policy Self-distillation for dLLMs

cs.CL · 2026-06-16 · unverdicted · none · ref 20

AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding

cs.CL · 2026-06-07 · unverdicted · none · ref 15

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

cs.CL · 2026-05-10 · unverdicted · none · ref 9

TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration

cs.CL · 2026-02-09 · unverdicted · none · ref 18

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

cs.CL · 2026-06-16 · unverdicted · none · ref 35

Supportive Token Revealing for Fast Diffusion Language Model Decoding

cs.CL · 2026-06-02 · unverdicted · none · ref 5
