LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
Ssd- lm: Semi-autoregressive simplex-based diffusion language model for text generation and modular control
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Naive samplers beat published diffusion and flow models on gen-PPL with incoherent output, proving the metric unsound and motivating distributional evaluation suites.
CCDD defines a joint multimodal diffusion on continuous representation space and discrete token space to combine expressivity with explicit token supervision for diffusion language models.
R2LM combines causal attention with a reverse Mamba SSM sidecar to supply right-side context in dLLMs, claiming 2.4x-12.9x throughput gains over bidirectional dLLMs and 1.9x-2.9x over AR baselines while matching or exceeding quality.
PAPO improves reasoning performance in diffusion LLMs by converting sparse terminal rewards into dense step-wise credit and replaying real high-uncertainty trajectories, reporting gains up to 42.2% on Countdown.
Efficient-DLM converts AR models to dLMs via block-wise causal attention and position-dependent masking, yielding higher accuracy and 2.7-4.5x throughput than Dream 7B and Qwen3 4B.
LLaDA-V is a diffusion-based multimodal large language model that reaches competitive or state-of-the-art results on visual instruction tasks while using a non-autoregressive architecture.
BA-Att introduces pre-downsampled block selection with norm-sorting and diagonal covariance correction to approximate sparse attention, yielding up to 6.95x speedup at 50% sparsity across language, multimodal, and video models.
citing papers explorer
-
Large Language Diffusion Models
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
-
Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics
Naive samplers beat published diffusion and flow models on gen-PPL with incoherent output, proving the metric unsound and motivating distributional evaluation suites.
-
Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner
CCDD defines a joint multimodal diffusion on continuous representation space and discrete token space to combine expressivity with explicit token supervision for diffusion language models.
-
Bifocal Diffusion Language Models: Asymmetric Bidirectional Context for Parallel Generation
R2LM combines causal attention with a reverse Mamba SSM sidecar to supply right-side context in dLLMs, claiming 2.4x-12.9x throughput gains over bidirectional dLLMs and 1.9x-2.9x over AR baselines while matching or exceeding quality.
-
Back on Track: Aligning Rewards and States for Reasoning in Diffusion Large Language Models
PAPO improves reasoning performance in diffusion LLMs by converting sparse terminal rewards into dense step-wise credit and replaying real high-uncertainty trajectories, reporting gains up to 42.2% on Countdown.
-
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed
Efficient-DLM converts AR models to dLMs via block-wise causal attention and position-dependent masking, yielding higher accuracy and 2.7-4.5x throughput than Dream 7B and Qwen3 4B.
-
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
LLaDA-V is a diffusion-based multimodal large language model that reaches competitive or state-of-the-art results on visual instruction tasks while using a non-autoregressive architecture.
-
Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention
BA-Att introduces pre-downsampled block selection with norm-sorting and diagonal covariance correction to approximate sparse attention, yielding up to 6.95x speedup at 50% sparsity across language, multimodal, and video models.