{"total":15,"items":[{"citing_arxiv_id":"2607.00773","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Accelerating Discrete Diffusion Models with Parallel-In-Time Sampling","primary_cat":"cs.LG","submitted_at":"2026-07-01T10:59:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A parallel-in-time τ-leaping sampler for absorbing discrete diffusion models is introduced, with an exponential-factorial convergence proof and empirical speedups of 7-9× on synthetic tasks and 1.45-1.86× on image/text tasks while using 50% fewer NFE.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08048","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Diffusion Language Model Parallel Decoding via Product-of-Experts Bridge","primary_cat":"cs.CL","submitted_at":"2026-06-06T08:21:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PoE-Bridge uses a product-of-experts bridge between diffusion and autoregressive distributions, with DLM drafting plus rejection and importance sampling, to deliver 5x speedup over standard DLM decoding while recovering at least 95% of AR performance on math and coding tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22765","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation","primary_cat":"cs.LG","submitted_at":"2026-05-21T17:27:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Uniform diffusion models rely on a leave-one-out denoiser rather than the usual denoising posterior, with exact 
conversions derived; an absorbing-state reformulation is introduced that matches or exceeds masked diffusion on language modeling while preserving the original joint distribution.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20813","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PulseCol: Periodically Refreshed Column-Sparse Attention for Accelerating Diffusion Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-20T07:06:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PulseCol introduces periodically refreshed column-sparse attention to achieve up to 1.95x speedup over FlashAttention in diffusion LLMs with maintained model quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19726","ref_index":64,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention","primary_cat":"cs.CV","submitted_at":"2026-05-19T12:01:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"BA-Att introduces pre-downsampled block selection with norm-sorting and diagonal covariance correction to approximate sparse attention, yielding up to 6.95x speedup at 50% sparsity across language, multimodal, and video models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17174","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code 
Generation","primary_cat":"cs.SE","submitted_at":"2026-05-16T22:18:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Static checking rewards and moderate AST-based hints improve diffusion RL performance for code generation, with effectiveness varying by task difficulty across HumanEval, MBPP, and LiveCodeBench.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11577","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion","primary_cat":"cs.CL","submitted_at":"2026-05-12T06:02:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BitLM replaces per-token softmax with bitwise continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07193","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Coupling Models for One-Step Discrete Generation","primary_cat":"cs.LG","submitted_at":"2026-05-08T03:40:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06548","ref_index":115,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Continuous Latent Diffusion Language 
Model","primary_cat":"cs.CL","submitted_at":"2026-05-07T16:44:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Cola DLM proposes a hierarchical latent diffusion model that learns a text-to-latent mapping, fits a global semantic prior in continuous space with a block-causal DiT, and performs conditional decoding, establishing latent prior modeling as an alternative to token-level autoregressive language model","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Continuously augmented discrete diffusion model for categorical generative modeling. arXiv preprint arXiv:2510.01329, 2025. [114] Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling. arXiv preprint arXiv:2409.02908, 2024. [115] Lin Zheng, Jianbo Yuan, Lei Yu, and Lingpeng Kong. A reparameterized discrete diffusion model for text generation. arXiv preprint arXiv:2302.05737, 2023. [116] Kun Zhou, Yifan Li, Xin Zhao, and Ji-Rong Wen. Diffusion-nat: Self-prompting discrete diffusion for non-autoregressive text generation. 
In Proceedings of the 18th Conference of the European Chapter of the Association"},{"citing_arxiv_id":"2605.04291","ref_index":118,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion","primary_cat":"cs.LG","submitted_at":"2026-05-05T20:51:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Pretrained language models are used as energy functions for Glauber dynamics in discrete text diffusion, improving generation quality over prior diffusion LMs and matching autoregressive models on benchmarks and reasoning tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03360","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A-CODE: Fully Atomic Protein Co-Design with Unified Multimodal Diffusion","primary_cat":"q-bio.QM","submitted_at":"2026-05-05T04:41:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"A-CODE presents a fully atomic one-stage multimodal diffusion model for protein co-design that claims superior unconditional generation performance over prior one- and two-stage models plus a tenfold success-rate gain on hard binder-design tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00182","ref_index":60,"ref_count":3,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Towards A Generative Protein Evolution Machine with DPLM-Evo","primary_cat":"cs.LG","submitted_at":"2026-04-30T19:59:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DPLM-Evo introduces an evolutionary discrete diffusion framework with explicit edit prediction and contextual 
noising that claims SOTA single-sequence mutation effect prediction on ProteinGym while supporting variable-length evolution simulation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05551","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FastDiSS: Few-step Match Many-step Diffusion Language Model on Sequence-to-Sequence Generation--Full Version","primary_cat":"cs.CL","submitted_at":"2026-04-07T07:52:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A training framework perturbs self-conditioning signals in diffusion language models to match few-step inference noise, enabling up to 400x faster sampling while surpassing standard continuous diffusion performance on sequence-to-sequence tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.16933","ref_index":106,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning","primary_cat":"cs.LG","submitted_at":"2025-05-22T17:23:26+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLaDA-V is a diffusion-based multimodal large language model that reaches competitive or state-of-the-art results on visual instruction tasks while using a non-autoregressive architecture.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Neubig, \"Diffuser: Discrete diffusion via edit-based reconstruction,\" 2022. [104] H. Sun, L. Yu, B. Dai, D. Schuurmans, and H. Dai, \"Score-based continuous-time discrete diffusion models,\" arXiv preprint arXiv:2211.16750, 2022. [105] O. Kitouni, N. Nolte, J. Hensman, and B. 
Mitra, \"Disk: A diffusion model for structured knowledge,\" arXiv preprint arXiv:2312.05253, 2023. [106] L. Zheng, J. Yuan, L. Yu, and L. Kong, \"A reparameterized discrete diffusion model for text generation,\" ArXiv, vol. abs/2302.05737, 2023. [107] Z. Chen, H. Yuan, Y. Li, Y. Kou, J. Zhang, and Q. Gu, \"Fast sampling via de-randomization for discrete diffusion models,\" arXiv preprint arXiv:2312.09193, 2023. [108] J. Ye, Z. Zheng, Y. Bao, L. Qian, and Q."},{"citing_arxiv_id":"2502.09992","ref_index":67,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Large Language Diffusion Models","primary_cat":"cs.CL","submitted_at":"2025-02-14T08:23:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"time discrete diffusion models. arXiv preprint arXiv:2211.16750, 2022. [65] Ouail Kitouni, Niklas Nolte, James Hensman, and Bhaskar Mitra. Disk: A diffusion model for structured knowledge. arXiv preprint arXiv:2312.05253, 2023. [66] Lin Zheng, Jianbo Yuan, Lei Yu, and Lingpeng Kong. A reparameterized discrete diffusion model for text generation. ArXiv, abs/2302.05737, 2023. [67] Zixiang Chen, Huizhuo Yuan, Yongqian Li, Yiwen Kou, Junkai Zhang, and Quanquan Gu. Fast sampling via de-randomization for discrete diffusion models. arXiv preprint arXiv:2312.09193, 2023. [68] Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, and Quanquan Gu. Diffusion language models can perform many tasks with scaling and instruction-finetuning. arXiv preprint"}],"limit":50,"offset":0}