Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.
hub Mixed citations
Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg
Mixed citation behavior. Most common role is background (67%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
A framework pretrained on authentic binary occlusion masks uses guided sampling and intersection-based partitioning to train diffusion models on incomplete physical observations without zero-query regions.
DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.
Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.
Full-sequence masking in SFT unlocks prompt infilling for masked diffusion language models, producing templates that match or surpass hand-designed ones and transfer across models.
A CNN-based discrete diffusion method refines sparse contours from segmentation masks using simplified denoising steps and minimal post-processing, outperforming baselines on small medical and environmental datasets while running 3.5 times faster.
A prompt-controlled diffusion framework generates class-ratio-targeted synthetic layouts and domain-consistent images that, when mixed with real data, improve segmentation accuracy on long-tailed remote-sensing datasets especially under domain shift.
Early and late denoising steps in masked diffusion LMs are robust to smaller-model replacement, enabling 17% FLOPs reduction with modest generative quality loss.
Progressive distillation halves sampling steps repeatedly in diffusion models, reaching 4 steps with FID 3.0 on CIFAR-10 from 8192-step samplers.
A narrative survey that catalogs fifty papers on diffusion-based adversarial techniques across text, vision, and vision-language models, proposes a six-class taxonomy of diffusion roles plus a unified five-dimension evaluation framework, and releases a companion catalog.
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.
GCCM prevents shortcut collapse in consistency models for graph prediction by using contrastive negative pairs and input feature perturbation, leading to better performance than deterministic baselines.
FRIGID scales a diffusion-based model for de novo molecular structure generation from mass spectra, reaching over 18% top-1 accuracy on MassSpecGym and tripling prior bests on NPLIB1 via large unlabeled training and inference-time fragmentation refinement with log-linear compute scaling.
GraphBSI uses Bayesian Sample Inference as noise-controlled SDEs to generate discrete graphs in one shot, achieving state-of-the-art results on molecular benchmarks Moses and GuacaMol.
Saber improves both speed and accuracy of diffusion language models on code generation by dynamically adjusting unmasking steps and reverting low-confidence tokens via backtracking.
Logit-KL Flow Matching recovers the flow-matching velocity field from conditional likelihood maximization and uses iterative denoise-re-noise sampling to improve perplexity and downstream metrics over prior NAR baselines on text and code tasks.
citing papers explorer
-
Constrained Code Generation with Discrete Diffusion
Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.
-
Observation-Aligned Mask Priors for Learning Physical Dynamics from Authentic Occlusions
A framework pretrained on authentic binary occlusion masks uses guided sampling and intersection-based partitioning to train diffusion models on incomplete physical observations without zero-query regions.
-
Dynamic Chunking for Diffusion Language Models
DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.
-
Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models
Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.
-
Unlocking Prompt Infilling Capability for Diffusion Language Models
Full-sequence masking in SFT unlocks prompt infilling for masked diffusion language models, producing templates that match or surpass hand-designed ones and transfer across models.
-
Contour Refinement using Discrete Diffusion in Low Data Regime
A CNN-based discrete diffusion method refines sparse contours from segmentation masks using simplified denoising steps and minimal post-processing, outperforming baselines on small medical and environmental datasets while running 3.5 times faster.
-
Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation
A prompt-controlled diffusion framework generates class-ratio-targeted synthetic layouts and domain-consistent images that, when mixed with real data, improve segmentation accuracy on long-tailed remote-sensing datasets especially under domain shift.
-
Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models
Early and late denoising steps in masked diffusion LMs are robust to smaller-model replacement, enabling 17% FLOPs reduction with modest generative quality loss.
-
Progressive Distillation for Fast Sampling of Diffusion Models
Progressive distillation halves sampling steps repeatedly in diffusion models, reaching 4 steps with FID 3.0 on CIFAR-10 from 8192-step samplers.
-
Adversarial Diffusion Across Modalities: A Fusion Survey of Attacks, Defenses, and Evaluation for Text, Vision, and Vision-Language Models
A narrative survey that catalogs fifty papers on diffusion-based adversarial techniques across text, vision, and vision-language models, proposes a six-class taxonomy of diffusion roles plus a unified five-dimension evaluation framework, and releases a companion catalog.
-
Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
-
Coupling Models for One-Step Discrete Generation
Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.
-
GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model
GCCM prevents shortcut collapse in consistency models for graph prediction by using contrastive negative pairs and input feature perturbation, leading to better performance than deterministic baselines.
-
FRIGID: Scaling Diffusion-Based Molecular Generation from Mass Spectra at Training and Inference Time
FRIGID scales a diffusion-based model for de novo molecular structure generation from mass spectra, reaching over 18% top-1 accuracy on MassSpecGym and tripling prior bests on NPLIB1 via large unlabeled training and inference-time fragmentation refinement with log-linear compute scaling.
-
Discrete Bayesian Sample Inference for Graph Generation
GraphBSI uses Bayesian Sample Inference as noise-controlled SDEs to generate discrete graphs in one shot, achieving state-of-the-art results on molecular benchmarks Moses and GuacaMol.
-
Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model
Saber improves both speed and accuracy of diffusion language models on code generation by dynamically adjusting unmasking steps and reverting low-confidence tokens via backtracking.
-
Logit-KL Flow Matching: Non-Autoregressive Text Generation via Sampling-Hybrid Inference
Logit-KL Flow Matching recovers the flow-matching velocity field from conditional likelihood maximization and uses iterative denoise-re-noise sampling to improve perplexity and downstream metrics over prior NAR baselines on text and code tasks.
- Consistent Diffusion Language Models
- Simple Self-Conditioning Adaptation for Masked Diffusion Models