TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.
hub
Fast-dllm: Training-free acceleration of diffusion LLM by enabling KV cache and parallel decoding.CoRR, abs/2505.22618
16 Pith papers cite this work. Polarity classification is still indexing.
hub tools
years
2026 16representative citing papers
LEAP detects early-converging tokens in dLLMs via future context filtering and multi-sequence superposition, reducing average denoising steps by about 30% while maintaining accuracy.
BLT-D, BLT-S, and BLT-DV use block-wise diffusion training and speculative verification to enable parallel byte generation in byte-level LMs, cutting memory-bandwidth cost by over 50%.
GPO-V jailbreaks dVLMs by globally optimizing probabilities in the denoising process to bypass refusal patterns, achieving stealthy and transferable attacks.
FoCore uses self-contrast on early-converging high-density tokens to boost diffusion LLM quality on reasoning benchmarks while cutting decoding steps by over 2x.
DARE reuses up to 87% of attention activations in diffusion LLMs through KV caching and output reuse, delivering 1.2x per-layer latency gains with average performance drops of 1.2-2.0%.
Prefill-Time Intervention (PTI) reduces hallucinations in large vision-language models by applying a one-time modality-aware steering correction to the initial KV cache at the prefill stage rather than during autoregressive decoding.
NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.
ECHO is a one-step block diffusion VLM for chest X-ray reports that improves RaTE and SemScore by over 60% while delivering 8x faster inference than autoregressive baselines.
TABOM models inference unmasking preferences as a Boltzmann distribution over predictive entropies and derives a ranking loss to align DLM training with observed trajectories, yielding gains in new domains and reduced catastrophic forgetting versus standard SFT.
Joint training of the latent space with the diffusion process produces a competitive latent diffusion language model that is faster than existing discrete and continuous diffusion baselines.
ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.
CDLM trains denoisers to be path-invariant across stochastic posterior bridges in discrete diffusion, unifying prior methods and achieving new SOTA few-step text generation performance.
Position and step penalty plus visual reasoning guidance fix premature answering and weak visual grounding in diffusion MLLMs, delivering up to 7.5% accuracy gains and over 3x speedup.
DualDiffusion combines a lightweight drafter using approximations with a full verifier to reduce generation steps in masked diffusion models while keeping accuracy on MMLU and GSM8K.
DMax enables faster parallel decoding in diffusion language models by using on-policy training to recover from errors and soft embedding interpolations for iterative revision, boosting tokens per forward pass roughly 2-3x on benchmarks while preserving accuracy.
citing papers explorer
-
GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization
GPO-V jailbreaks dVLMs by globally optimizing probabilities in the denoising process to bypass refusal patterns, achieving stealthy and transferable attacks.
-
Prefill-Time Intervention for Mitigating Hallucination in Large Vision-Language Models
Prefill-Time Intervention (PTI) reduces hallucinations in large vision-language models by applying a one-time modality-aware steering correction to the initial KV cache at the prefill stage rather than during autoregressive decoding.