LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
hub Canonical reference
Continuous diffusion for categorical data
Canonical reference. 75% of citing Pith papers cite this work as background.
abstract
Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Self-conditioned flow language models solve fixed-point iterations, enabling fixed-point flow maps that distill into FMLM* which outperforms SOTA in few-step generation on OpenWebText.
Low Gen-PPL in continuous diffusion LMs results from repetition caused by a 1D contractive attractor in self-conditioning feedback; ACE subtracts the direction to reduce repetition to human levels while preserving quality.
Naive samplers beat published diffusion and flow models on gen-PPL with incoherent output, proving the metric unsound and motivating distributional evaluation suites.
Introduces the Insertion Process model for variable-length non-monotonic sequence generation via a bijective permutation mapping and permutation-based variational inference.
Infinite Mask Diffusion Models use stochastic infinite-state masks to overcome the factorization error lower bound in standard masked diffusion, achieving superior few-step performance on language tasks via distillation.
FoCore uses self-contrast on early-converging high-density tokens to boost diffusion LLM quality on reasoning benchmarks while cutting decoding steps by over 2x.
LangFlow is the first continuous diffusion language model to rival discrete diffusion on perplexity and generative perplexity while exceeding autoregressive baselines on several zero-shot tasks.
Discrete Stochastic Localization lets a single trained network support an entire family of per-token SNR paths for discrete sequence generation, with masked diffusion as a special case, and improves MAUVE scores when fine-tuning pretrained checkpoints.
CCDD defines a joint multimodal diffusion on continuous representation space and discrete token space to combine expressivity with explicit token supervision for diffusion language models.
First dedicated survey organizing diffusion and flow matching models for tabular data synthesis, imputation, anomaly detection, and related tasks, covering literature from 2015 to 2026 and highlighting open problems.
GeoCoupling optimizes temporal couplings between modalities in biomolecular generative models and outperforms synchronous baselines on drug design and protein design tasks.
Adapting LLaDA-8B-Instruct via Discrete Stochastic Localization with continuous per-token Gaussian noise yields continuous denoising that achieves top ROUGE-1 on zero-shot summarization at low step budgets and adds selective noisy-state robustness.
FP-MGMs with consistency loss and three-state reuse (CoFRe) reduce parameters by up to 38.8% and improve low-budget perplexity and FID versus standard masked generative models on text and images.
DiLaDiff augments masked diffusion LMs with latent space modeling and consistency distillation to improve token correlation capture and inference speed.
RePlaid achieves a 20x compute gap to autoregressive models, new SOTA PPL of 22.1 among continuous DLMs on OpenWebText, and competitive scaling laws by aligning architecture with modern discrete DLMs.
Manta-LM approximates the HJB equation via flow matching in latent control space to realize closed-loop optimal control for language generation.
DSL provides a continuous embedding framework where one denoiser supports a family of SNR paths for discrete sequences, improving MAUVE scores on OpenWebText and allowing random-order and hybrid sampling from a fine-tuned MDLM checkpoint.
ELF applies continuous-time flow matching in embedding space for language generation and reports outperforming prior discrete and continuous diffusion language models with fewer steps.
TextLDM applies DiT-style latent diffusion with flow matching to language modeling via a REPA-aligned VAE, outperforming prior diffusion LMs and matching GPT-2 when trained from scratch on OpenWebText2.
Cola DLM proposes a hierarchical latent diffusion model that learns a text-to-latent mapping, fits a global semantic prior in continuous space with a block-causal DiT, and performs conditional decoding, establishing latent prior modeling as an alternative to token-level autoregressive language model
Dataset-level metrics in diffusion language models mask substantial sample-level non-determinism that varies with model and system factors, which a new Factor Variance Attribution metric can decompose.
Position and step penalty plus visual reasoning guidance fix premature answering and weak visual grounding in diffusion MLLMs, delivering up to 7.5% accuracy gains and over 3x speedup.
Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.
citing papers explorer
No citing papers match the current filters.