{"total":22,"items":[{"citing_arxiv_id":"2606.27732","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bifocal Diffusion Language Models: Asymmetric Bidirectional Context for Parallel Generation","primary_cat":"cs.IR","submitted_at":"2026-06-26T05:26:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"R2LM combines causal attention with a reverse Mamba SSM sidecar to supply right-side context in dLLMs, claiming 2.4x-12.9x throughput gains over bidirectional dLLMs and 1.9x-2.9x over AR baselines while matching or exceeding quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26566","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Adversarial Diffusion Across Modalities: A Fusion Survey of Attacks, Defenses, and Evaluation for Text, Vision, and Vision-Language Models","primary_cat":"cs.CR","submitted_at":"2026-06-25T03:32:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A narrative survey that catalogs fifty papers on diffusion-based adversarial techniques across text, vision, and vision-language models, proposes a six-class taxonomy of diffusion roles plus a unified five-dimension evaluation framework, and releases a companion catalog.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04535","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-06-03T07:18:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DIA is a training-free method that dynamically adjusts anchor positions in diffusion LLMs to improve format compliance and accuracy on reasoning benchmarks like GSM8K and MATH.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25820","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Visual-Redundancy-Controlled Parallel Decoding for Diffusion-Based Multimodal Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-25T13:16:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VRCD prioritizes visually complementary positions during parallel decoding in dMLLMs by measuring attention overlap with the new Visual Redundancy Index, yielding accuracy gains over confidence-based baselines on M^3CoT and MMBench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16829","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Constrained Code Generation with Discrete Diffusion","primary_cat":"cs.CL","submitted_at":"2026-05-16T06:15:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16818","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Observation-Aligned Mask Priors for Learning Physical Dynamics from Authentic Occlusions","primary_cat":"cs.CV","submitted_at":"2026-05-16T05:23:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A framework pretrained on authentic binary occlusion masks uses guided sampling and intersection-based partitioning to train diffusion models on incomplete physical observations without zero-query regions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16520","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing","primary_cat":"cs.LG","submitted_at":"2026-05-15T18:14:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"N= max{N(t0),N(t c)},(11) N(t0) = 3λ2 0Md 2E0Dτβ2 1δ ( V−1t−1 0 +V 0( β λ0 )2 +V 1( β λ0 )4t0 ) ,(12) N(tc) = 3λ2 cMd 2E0Dτβ2 1δ ( V−1t−1 c +V 0(β λc )2 +V 1(β λc )4tc ) (13) Then with probability at least1−δ, the dual-level annealing algorithm converges to ∥xM−x∗∥2≤∥x∗ F−x∗∥2 + (1−1 4κ2 F )M−M0(C2 ED2 τ+kgtM0) + 4(K2 F +σ2 F ) δ αF ( tF 2λF + 1 αF )(14) wherek g =C 2 E min{β2 λ2, 1 τ4}. Theorem 4.6 provides the first non-asymptotic global convergence result of SBO towards the global minimizer. Algorithm-wise, Theorem 4.6 unifies two annealing strategies in the literature: simulated annealing (Bertsimas & Tsitsiklis, 1993) for temperatureλand diffusion annealing (Pan et al., 2024) for noise levelt."},{"citing_arxiv_id":"2605.15676","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Dynamic Chunking for Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-15T06:56:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11726","ref_index":3,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-12T08:09:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"size selection rather than globally fixed-size decoding for all tasks. 9 References [1] Arel. Arel's sudoku generator.URL https://www.ocf.berkeley.edu/ arel/sudoku/main.html, 2025. [2] M. Arriola, A. Gokaslan, J. T. Chiu, Z. Yang, Z. Qi, J. Han, S. S. Sahoo, and V . Kuleshov. Block diffusion: Interpolating between autoregressive and diffusion language models.ICLR, 2025. [3] J. Austin, D. D. Johnson, J. Ho, D. Tarlow, and R. van den Berg. Structured Denoising Diffusion Models in Discrete State-Spaces.arXiv preprint arXiv:2107.03006, 2023. [4] J. Austin, A. Odena, M. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. Cai, M. Terry, Q. Le, and C. Sutton. Program Synthesis with Large Language Models.arXiv preprint arXiv:2108."},{"citing_arxiv_id":"2605.07193","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Coupling Models for One-Step Discrete Generation","primary_cat":"cs.LG","submitted_at":"2026-05-08T03:40:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"=E z∼qϕ(z)TV(qϕ(· |z), G θ(· |z)).(25) 20 For the second term, Gθ(· |z) is a Markov kernel from latents to sequences. Total variation cannot increase under a Markov kernel, so TV(pq θ, pgen θ )≤TV(q ϕ(z), pZ(z)).(26) Combining these two bounds proves Eq. (3). For the KL-based bound, Pinsker's inequality gives TV(qϕ(· |z), G θ(· |z))≤ r 1 2KL(qϕ(· |z)∥G θ(· |z)).(27) Taking expectation overz∼q ϕ(z)and applying Jensen's inequality, Ez∼qϕ(z)TV(qϕ(· |z), G θ(· |z))≤E z∼qϕ(z) r 1 2KL(qϕ(· |z)∥G θ(· |z))(28) ≤ r 1 2 Ez∼qϕ(z)KL(qϕ(· |z)∥G θ(· |z))(29) ≤ r εdec 2 .(30) Finally, applying Pinsker's inequality to the latent marginal gives TV(qϕ(z), pZ(z))≤ r 1 2KL(qϕ(z)∥p Z(z))≤ r εflow 2 ,(31) which proves Eq. (19). The final consistency statement follows by setting both terms in Eq."},{"citing_arxiv_id":"2605.05689","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model","primary_cat":"cs.AI","submitted_at":"2026-05-07T05:29:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GCCM prevents shortcut collapse in consistency models for graph prediction by using contrastive negative pairs and input feature perturbation, leading to better performance than deterministic baselines.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics, 2015. URL https://arxiv. org/abs/1503.03585. [17] Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces, 2023. URL https://arxiv. org/abs/2107.03006. [18] Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Learning categorical distributions, 2021. URL https:// arxiv.org/abs/2102.05379. [19] Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, and Xavier Bresson. Benchmarking graph neural networks, 2022."},{"citing_arxiv_id":"2605.00161","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Consistent Diffusion Language Models","primary_cat":"cs.LG","submitted_at":"2026-04-30T19:31:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CDLM introduces MPDC training for discrete diffusion models, recovering prior methods as limits and claiming new SOTA text generation performance especially at low sampling budgets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.26985","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Simple Self-Conditioning Adaptation for Masked Diffusion Models","primary_cat":"cs.LG","submitted_at":"2026-04-28T19:34:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SCMDM is a post-training self-conditioning adaptation for masked diffusion models that reduces generative perplexity by nearly 50% on OWT and improves performance on images, molecules, and genomics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16648","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FRIGID: Scaling Diffusion-Based Molecular Generation from Mass Spectra at Training and Inference Time","primary_cat":"cs.LG","submitted_at":"2026-04-17T19:11:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FRIGID scales a diffusion-based model for de novo molecular structure generation from mass spectra, reaching over 18% top-1 accuracy on MassSpecGym and tripling prior bests on NPLIB1 via large unlabeled training and inference-time fragmentation refinement with log-linear compute scaling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03677","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Unlocking Prompt Infilling Capability for Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-04T10:26:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Full-sequence masking in SFT unlocks prompt infilling for masked diffusion language models, producing templates that match or surpass hand-designed ones and transfer across models.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"[MASK]... #### He sprints 3*3=9 times. So he runs 9*60=540 meters #### 540 Natalia May May Nataila... ? ####Nataila... 3... Natalia ? [MASK]... ####Nataila... ? ####Nataila... [MASK] Natalia May Figure 1: Overview of the prompt infilling procedure and the change in training. (1) Gather few-shot examples with prompt templates and reference responses. (2) A diffusion LM (dLM) infills masked tokens in the prompt template, conditioned on the reference responses. (3) Validate infilled prompts by generating responses across all few-shot examples using either a dLM or an LLM. (4) The best infilled prompt is used for final inference on all inputs. • We introduce a prompt infilling procedure protocol during inference time where in-"},{"citing_arxiv_id":"2602.05880","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Contour Refinement using Discrete Diffusion in Low Data Regime","primary_cat":"cs.CV","submitted_at":"2026-02-05T16:55:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A CNN-based discrete diffusion method refines sparse contours from segmentation masks using simplified denoising steps and minimal post-processing, outperforming baselines on small medical and environmental datasets while running 3.5 times faster.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.04749","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation","primary_cat":"cs.CV","submitted_at":"2026-02-04T16:49:16+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A prompt-controlled diffusion framework generates class-ratio-targeted synthetic layouts and domain-consistent images that, when mixed with real data, improve segmentation accuracy on long-tailed remote-sensing datasets especially under domain shift.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.02340","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models","primary_cat":"cs.LG","submitted_at":"2026-02-04T13:04:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Early and late denoising steps in masked diffusion LMs are robust to smaller-model replacement, enabling 17% FLOPs reduction with modest generative quality loss.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.03015","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Discrete Bayesian Sample Inference for Graph Generation","primary_cat":"cs.LG","submitted_at":"2025-11-04T21:25:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GraphBSI uses Bayesian Sample Inference as noise-controlled SDEs to generate discrete graphs in one shot, achieving state-of-the-art results on molecular benchmarks Moses and GuacaMol.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.18165","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model","primary_cat":"cs.AI","submitted_at":"2025-10-20T23:38:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Saber improves both speed and accuracy of diffusion language models on code generation by dynamically adjusting unmasking steps and reverting low-confidence tokens via backtracking.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2411.16821","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Logit-KL Flow Matching: Non-Autoregressive Text Generation via Sampling-Hybrid Inference","primary_cat":"cs.CL","submitted_at":"2024-11-25T17:15:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Logit-KL Flow Matching recovers the flow-matching velocity field from conditional likelihood maximization and uses iterative denoise-re-noise sampling to improve perplexity and downstream metrics over prior NAR baselines on text and code tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2202.00512","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Progressive Distillation for Fast Sampling of Diffusion Models","primary_cat":"cs.LG","submitted_at":"2022-02-01T16:07:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Progressive distillation halves sampling steps repeatedly in diffusion models, reaching 4 steps with FID 3.0 on CIFAR-10 from 8192-step samplers.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"λt = log[α2 t/σ2 t ] ⊿ log-SNR Lθ =w(λt)∥˜x − ˆxθ(zt)∥2 2 ⊿ Loss θ ←θ −γ∇θLθ ⊿ Optimization end while Algorithm 2 Progressive distillation Require: Trained teacher model ˆxη(zt) Require: Data set D Require: Loss weight functionw() Require: Student sampling stepsN forK iterations do θ ←η ⊿ Init student from teacher while not converged do x ∼ D t =i/N, i ∼Cat[1, 2,...,N ] ϵ ∼N (0,I ) zt =αtx +σtϵ # 2 steps of DDIM with teacher t′ =t − 0.5/N, t′′ =t − 1/N zt′ =αt′ ˆxη(zt) + σt′ σt (zt −αt ˆxη(zt)) zt′′ =αt′′ ˆxη(zt′) + σt′′ σt′ (zt′ −αt′ ˆxη(zt′)) ˜x = zt′′−(σt′′/σt)zt αt′′−(σt′′/σt)αt ⊿ Teacher ˆx target λt = log[α2 t/σ2 t ] Lθ =w(λt)∥˜x − ˆxθ(zt)∥2 2 θ ←θ −γ∇θLθ end while η ←θ ⊿ Student becomes next teacher"}],"limit":50,"offset":0}