Learning to parallel: Accelerating diffusion large language models via learnable parallel decoding
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (3)
years: 2026 (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM
TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.
- Fast-dLLM++: Fréchet Profile Decoding for Faster Diffusion LLM Inference
Fast-dLLM++ generalizes Fast-dLLM decoding to heterogeneous confidence profiles via Fréchet profile selection, delivering up to 37% throughput gains on GSM8K, MATH, HumanEval, and MBPP with LLaDA-8B.
- STDec: Spatio-Temporal Stability Guided Decoding for dLLMs
STDec raises dLLM decoding speed by up to 14x on benchmarks like MBPP by using observed spatio-temporal stability to create dynamic, token-specific confidence thresholds while preserving task performance.