LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
Bayesian flow networks
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
A framework pretrained on authentic binary occlusion masks uses guided sampling and intersection-based partitioning to train diffusion models on incomplete physical observations without zero-query regions.
GeoCoupling optimizes temporal couplings between modalities in biomolecular generative models and outperforms synchronous baselines on drug design and protein design tasks.
RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generalization loss.
GraphBSI uses Bayesian Sample Inference as noise-controlled SDEs to generate discrete graphs in one shot, achieving state-of-the-art results on molecular benchmarks Moses and GuacaMol.
Seed Diffusion Preview is a discrete diffusion language model that reaches 2146 tokens per second inference on H20 GPUs with competitive code benchmark performance, establishing a new speed-quality Pareto frontier.
LLaDA-V is a diffusion-based multimodal large language model that reaches competitive or state-of-the-art results on visual instruction tasks while using a non-autoregressive architecture.
BA-Att introduces pre-downsampled block selection with norm-sorting and diagonal covariance correction to approximate sparse attention, yielding up to 6.95x speedup at 50% sparsity across language, multimodal, and video models.
Diffusion, score-based, and flow matching models are unified as instances of learning time-dependent vector fields inducing marginal distributions governed by continuity and Fokker-Planck equations.
citing papers explorer
-
Large Language Diffusion Models
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
-
Observation-Aligned Mask Priors for Learning Physical Dynamics from Authentic Occlusions
A framework pretrained on authentic binary occlusion masks uses guided sampling and intersection-based partitioning to train diffusion models on incomplete physical observations without zero-query regions.
-
Demystifying Multimodal Biomolecular Co-design With Intrinsic Geodesic Coupling
GeoCoupling optimizes temporal couplings between modalities in biomolecular generative models and outperforms synchronous baselines on drug design and protein design tasks.
-
A renormalization-group inspired lattice-based framework for piecewise generalized linear models
RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generalization loss.
-
Discrete Bayesian Sample Inference for Graph Generation
GraphBSI uses Bayesian Sample Inference as noise-controlled SDEs to generate discrete graphs in one shot, achieving state-of-the-art results on molecular benchmarks Moses and GuacaMol.
-
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Seed Diffusion Preview is a discrete diffusion language model that reaches 2146 tokens per second inference on H20 GPUs with competitive code benchmark performance, establishing a new speed-quality Pareto frontier.
-
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
LLaDA-V is a diffusion-based multimodal large language model that reaches competitive or state-of-the-art results on visual instruction tasks while using a non-autoregressive architecture.
-
Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention
BA-Att introduces pre-downsampled block selection with norm-sorting and diagonal covariance correction to approximate sparse attention, yielding up to 6.95x speedup at 50% sparsity across language, multimodal, and video models.
-
A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models
Diffusion, score-based, and flow matching models are unified as instances of learning time-dependent vector fields inducing marginal distributions governed by continuity and Fokker-Planck equations.