{"total":14,"items":[{"citing_arxiv_id":"2605.31215","ref_index":63,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Fixed-Point Masked Generative Modeling","primary_cat":"cs.LG","submitted_at":"2026-05-29T12:19:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FP-MGMs with consistency loss and three-state reuse (CoFRe) reduce parameters by up to 38.8% and improve low-budget perplexity and FID versus standard masked generative models on text and images.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22765","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation","primary_cat":"cs.LG","submitted_at":"2026-05-21T17:27:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Uniform diffusion models rely on a leave-one-out denoiser rather than the usual denoising posterior, with exact conversions derived; an absorbing-state reformulation is introduced that matches or exceeds masked diffusion on language modeling while preserving the original joint distribution.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09302","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Discrete Langevin-Inspired Posterior Sampling","primary_cat":"cs.LG","submitted_at":"2026-05-10T03:59:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ΔLPS is a gradient-guided discrete posterior sampler for inverse problems that works with masked or uniform discrete diffusion priors and outperforms prior discrete methods on image restoration 
tasks.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"Masked diffusion models instantiate this framework using an absorbing mask state. For a special token [M], the corruption process can be written as q(z_t[ℓ] | z_0[ℓ]) = Cat(z_t[ℓ]; α_t e_{z_0[ℓ]} + (1 − α_t) e_{[M]}), where α_t decreases with t. MDLM-style models [31] build on this absorbing process and train the denoiser with a weighted masked-token prediction objective. A different line of work, including Duo-style uniform-state diffusion [32], instead uses uniform corruption, q(z_t[ℓ] | z_0[ℓ]) = Cat(z_t[ℓ]; α_t e_{z_0[ℓ]} + (1 − α_t) u), u = (1/K)·1, (2) so corrupted variables are replaced by uniformly random tokens rather than a single absorbing mask. Both masked and uniform-state models provide pretrained discrete priors of the form p_θ(z_0; z_t), but they induce different sampling behavior and different reverse-transition structure."},{"citing_arxiv_id":"2605.07971","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DVD: Discrete Voxel Diffusion for 3D Generation and Editing","primary_cat":"cs.CV","submitted_at":"2026-05-08T16:32:17+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Discrete diffusion is a natural candidate: sparse 3D voxel grids do not admit an obvious preferred generation order, while diffusion models denoise voxel states in parallel and are therefore well matched to long voxel sequences. However, this choice is not guaranteed to improve quality, since continuous diffusion remains dominant and often stronger in image generation [19, 20]. We therefore examine whether discrete diffusion can serve as a practical first-stage prior for sparse voxel generation. 
We find that discrete diffusion provides a practical and informative first-stage prior for sparse voxel generation. Modeling occupancy directly as categorical states improves sparse voxel quality over continuous counterparts across most evaluation metrics and exposes voxel-wise predictive"},{"citing_arxiv_id":"2605.07193","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Coupling Models for One-Step Discrete Generation","primary_cat":"cs.LG","submitted_at":"2026-05-08T03:40:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"and target sequences. We then train a parallel decoder to invert the coupling and emit all output tokens in one forward pass. At inference time, generation requires one step: draw z ∼ N(0, I) and generate. We evaluate Coupling Model across three regimes: (1) controlled binary image modeling on MNIST-Binary, (2) biological sequence design on DNA enhancers, and (3) open-ended language generation on LM1B. In the strict one-step setting, Coupling Model improves the best available one-step comparisons across all three domains. On MNIST-Binary, it reduces FID to 5.50, outperforming the strongest one-step baselines in Table 1. 
On DNA enhancer generation, it improves over the best one-step baseline, distilled Dirichlet FM [Stärk et al."},{"citing_arxiv_id":"2605.06548","ref_index":81,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Continuous Latent Diffusion Language Model","primary_cat":"cs.CL","submitted_at":"2026-05-07T16:44:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Cola DLM proposes a hierarchical latent diffusion model that learns a text-to-latent mapping, fits a global semantic prior in continuous space with a block-causal DiT, and performs conditional decoding, establishing latent prior modeling as an alternative to token-level autoregressive language model","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[79] Pierre H Richemond, Sander Dieleman, and Arnaud Doucet. Categorical SDEs with simplex diffusion. arXiv preprint arXiv:2210.14784, 2022. [80] Subham S Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, and Volodymyr Kuleshov. Simple and effective masked diffusion language models. Advances in Neural Information Processing Systems, 37:130136-130184, 2024. [81] Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, and Volodymyr Kuleshov. The diffusion duality. arXiv preprint arXiv:2506.10892, 2025. [82] Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, and Yejin Choi. Social IQa: Commonsense reasoning about social interactions. 
In Proceedings of the 2019 conference on empirical methods in natural language"},{"citing_arxiv_id":"2604.17310","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Interpolating Discrete Diffusion Models with Controllable Resampling","primary_cat":"cs.LG","submitted_at":"2026-04-19T07:55:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"IDDM interpolates diffusion transitions with a resampling mechanism to lessen dependence on intermediate latents and improve sample quality over masked and uniform discrete diffusion models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11748","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling","primary_cat":"cs.CL","submitted_at":"2026-04-13T17:21:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LangFlow is the first continuous diffusion language model to rival discrete diffusion on perplexity and generative perplexity while exceeding autoregressive baselines on several zero-shot tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08302","ref_index":67,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DMax: Aggressive Parallel Decoding for dLLMs","primary_cat":"cs.LG","submitted_at":"2026-04-09T14:35:42+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving 
accuracy.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"IEEE/CVF conference on computer vision and pattern recognition, pages 22500-22510, 2023. [66] Subham Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin Chiu, Alexander Rush, and Volodymyr Kuleshov. Simple and effective masked diffusion language models. Advances in Neural Information Processing Systems, 37:130136-130184, 2024. [67] Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, and Volodymyr Kuleshov. The diffusion duality. arXiv preprint arXiv:2506.10892, 2025. [68] Yair Schiff, Subham Sekhar Sahoo, Hao Phung, Guanghan Wang, Alexander Rush, Volodymyr Kuleshov, Hugo Dalla-Torre, Sam Boshar, Bernardo P de Almeida, and Thomas Pierrot. Simple guidance mecha-"},{"citing_arxiv_id":"2604.05497","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models","primary_cat":"cs.AI","submitted_at":"2026-04-07T06:41:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Position and step penalty plus visual reasoning guidance fix premature answering and weak visual grounding in diffusion MLLMs, delivering up to 7.5% accuracy gains and over 3x speedup.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"the framework must force the model to regenerate the sequence from the beginning, which requires additional inference calls and results in several times higher computational cost [11, 23, 26, 32]. In contrast, the reasoning process of dLLMs consists of iterative steps, where tokens are unmasked and the remaining tokens are remasked for progressive refinement at each timestep [30, 38]. 
Thus, the core of diffusion CoT lies in the remasking strategy [10, 33]. While intuitive strategies such as low-confidence, entropy, and margin-based methods have been explored, research on optimal remasking strategies for reasoning remains in its early stages. Although diffusion CoT has not yet reached the reasoning performance level of AR CoT, it offers two key advantages that AR CoT"},{"citing_arxiv_id":"2602.16813","ref_index":20,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Flow Map Language Models: One-step Language Modeling via Continuous Denoising","primary_cat":"cs.CL","submitted_at":"2026-02-18T19:23:07+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.14067","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed","primary_cat":"cs.CL","submitted_at":"2025-12-16T04:12:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Efficient-DLM converts AR models to dLMs via block-wise causal attention and position-dependent masking, yielding higher accuracy and 2.7-4.5x throughput than Dream 7B and Qwen3 4B.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.09541","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SPG: Sandwiched Policy Gradient for Masked Diffusion Language 
Models","primary_cat":"cs.CL","submitted_at":"2025-10-10T16:52:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SPG uses upper and lower bounds on log-likelihood to provide a better policy gradient for RL in diffusion LLMs, outperforming ELBO-based methods on math and puzzle tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.03206","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner","primary_cat":"cs.AI","submitted_at":"2025-10-03T17:44:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CCDD defines a joint multimodal diffusion on continuous representation space and discrete token space to combine expressivity with explicit token supervision for diffusion language models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}