Large Language Diffusion Models
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs such as LLaMA3 8B on standard benchmarks and surpasses GPT-4o on a reversal poem-completion task.
Chiu, Alexander Rush, and Volodymyr Kuleshov
9 Pith papers cite this work.
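For context, LLaDA-style models are trained as masked diffusion denoisers. A sketch of the standard masked-diffusion training bound in common notation (x_0 is the clean text, x_t masks each token independently with probability t, M is the mask token; this is the usual form of the objective, not necessarily the paper's exact notation):

```latex
\mathcal{L}(\theta) \;=\; -\,\mathbb{E}_{t \sim U(0,1),\; x_0,\; x_t}\!\left[
  \frac{1}{t} \sum_{i=1}^{L} \mathbf{1}\!\left[x_t^i = \mathrm{M}\right]
  \log p_\theta\!\left(x_0^i \mid x_t\right)
\right]
```

The 1/t weighting makes this a variational upper bound on negative log-likelihood, which is what allows a masked diffusion model to be compared against autoregressive LLMs on likelihood-based evaluations.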
citing papers explorer
- Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models
  Introduces the Block-R1 benchmark, the Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.
- Layer Collapse in Diffusion Language Models
  Overtraining causes diffusion language models to develop early-layer collapse around an indispensable super-outlier, resulting in higher compressibility and reversed optimal sparsity patterns relative to autoregressive models.
- Unlocking Prompt Infilling Capability for Diffusion Language Models
  Full-sequence masking in SFT unlocks prompt infilling for masked diffusion language models, producing templates that match or surpass hand-designed ones and transfer across models.
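A minimal sketch of the masking difference this summary describes, assuming a single unbatched token sequence and a hypothetical MASK_ID (the mask-token id depends on the tokenizer):

```python
import torch

MASK_ID = 0  # hypothetical mask-token id; depends on the tokenizer

def sft_noising(input_ids, prompt_len, t, full_sequence=False):
    """Forward masking for diffusion SFT at noise level t in (0, 1).

    Conventional SFT keeps the prompt fully visible and masks only response
    tokens; full-sequence masking also lets prompt tokens be masked, which
    exposes the model to prompt infilling during training.
    """
    maskable = torch.ones_like(input_ids, dtype=torch.bool)
    if not full_sequence:
        maskable[:prompt_len] = False         # prompt stays clean
    drop = (torch.rand(input_ids.shape) < t) & maskable
    noisy = torch.where(drop, torch.full_like(input_ids, MASK_ID), input_ids)
    return noisy, drop                        # drop marks supervised positions
```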
- VQ-SAD: Vector Quantized Structure Aware Diffusion For Molecule Generation
  VQ-SAD combines a pretrained VQ-VAE with diffusion models by using the VQ-VAE's codebooks as discrete tokenizers for atoms and bonds, yielding slight improvements over prior diffusion methods on QM9 and ZINC250k.
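The tokenization step itself is standard VQ-VAE quantization; a sketch with illustrative shapes (this is the generic nearest-codebook lookup, not VQ-SAD's full pipeline):

```python
import torch

def vq_tokenize(features, codebook):
    """Map continuous atom/bond features to discrete codebook indices.

    features: (n, d) continuous embeddings; codebook: (K, d) learned codes.
    Each row is assigned the index of its nearest code in Euclidean
    distance, turning molecule features into discrete tokens that a
    discrete diffusion model can operate on.
    """
    dists = torch.cdist(features, codebook)   # (n, K) pairwise distances
    return dists.argmin(dim=-1)               # one discrete token per row
```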
- Language Diffusion Models are Associative Memories Capable of Retrieving Unseen Data
  Uniform-state discrete diffusion models behave as associative memories that retrieve unseen data, with a dataset-size-driven memorization-to-generalization transition detectable via the conditional entropy of token predictions.
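The detection statistic mentioned here is straightforward to compute from a denoiser's logits; a sketch (names are illustrative):

```python
import torch
import torch.nn.functional as F

def mean_conditional_entropy(logits):
    """Average entropy of the per-token predictive distributions.

    logits: (seq_len, vocab) model outputs at some denoising step.
    Entropy near zero suggests deterministic retrieval of a memorized
    sequence; higher entropy suggests the model still spreads mass over
    many continuations, i.e. generalization.
    """
    logp = F.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)  # H[p(x_i | context)] per position
    return entropy.mean()
```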
- Simple Self-Conditioning Adaptation for Masked Diffusion Models
  SCMDM adapts trained masked diffusion models to condition denoising steps on their own prior clean predictions, cutting generative perplexity nearly in half on open-web text while improving discretized image, molecule, and genomic synthesis.
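A sketch of what such an adaptation can look like in the sampling loop; `model(x_t, t, prev_x0)` is a hypothetical interface, and `remask` is a simple illustrative re-noising rule, not SCMDM's exact procedure:

```python
import torch

MASK_ID = 0  # hypothetical mask-token id

def remask(x0_pred, t):
    """Re-noise: keep each predicted token with prob (1 - t), else re-mask."""
    keep = torch.rand(x0_pred.shape, device=x0_pred.device) > t
    return torch.where(keep, x0_pred, torch.full_like(x0_pred, MASK_ID))

def self_conditioned_sample(model, x_t, timesteps):
    """Masked diffusion sampling with self-conditioning.

    At every step the denoiser also receives its own previous clean-sequence
    prediction (None at the first step), so later steps can stay consistent
    with earlier commitments.
    """
    prev_x0, x0_pred = None, None
    for t in timesteps:                   # e.g. noise levels 1.0 -> ~0.0
        x0_pred = model(x_t, t, prev_x0)  # predicted clean token ids
        prev_x0 = x0_pred                 # fed back at the next step
        x_t = remask(x0_pred, t)
    return x0_pred
```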
- VASR: Variance-Aware Systematic Resampling for Reward-Guided Diffusion
  VASR separates continuation and residual variance in reward-guided diffusion SMC, using optimal mass allocation and systematic resampling to achieve up to 26% better FID scores and faster runtimes than prior SMC and MCTS methods.
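Systematic resampling itself is a standard low-variance SMC resampler; a minimal NumPy sketch (VASR's variance split and mass allocation are separate, paper-specific components):

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Systematic resampling over normalized particle weights.

    One uniform offset generates N evenly spaced points through the CDF,
    so a particle with weight w is copied floor(N*w) or ceil(N*w) times,
    which keeps resampling variance low.
    """
    rng = rng or np.random.default_rng()
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), positions)
```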
- Generative Frontiers: Why Evaluation Matters for Diffusion Language Models
  Generative perplexity and entropy are shown to be the two additive components of KL divergence to a reference distribution, motivating generative frontiers as a principled evaluation method for diffusion language models.
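The identity behind this, assuming generative perplexity is measured as cross-entropy under the reference model (the common convention), with q the model's distribution over generated text and p_ref the reference:

```latex
D_{\mathrm{KL}}\!\left(q \,\|\, p_{\mathrm{ref}}\right)
\;=\; \underbrace{H(q, p_{\mathrm{ref}})}_{\log \mathrm{GenPPL}}
\;-\; \underbrace{H(q)}_{\text{entropy}}
```

Equivalently, log generative perplexity equals entropy plus the KL divergence to the reference, so a model can lower generative perplexity simply by collapsing its entropy without getting any closer to the reference; plotting the two together as a frontier is what makes the comparison meaningful.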