{"total":20,"items":[{"citing_arxiv_id":"2607.01775","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Set Diffusion: Interpolating Token Orderings Between Autoregression and Diffusion for Fast and Flexible Decoding","primary_cat":"cs.LG","submitted_at":"2026-07-02T06:45:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Set diffusion factorizes likelihood over arbitrary token sets and uses a set-causal diffusion architecture to support KV caching and any-order decoding, yielding improved speed-quality tradeoffs versus prior diffusion LMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2607.01774","ref_index":60,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Subliminal Clocks: Latent Time Modelling in Diffusion Language Models","primary_cat":"cs.AI","submitted_at":"2026-07-02T06:45:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DLMs encode a decodable latent timestep signal in residual activations that can be steered to predictably change model confidence and entropy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2607.00714","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Self-conditioned Flow Map Language Models via Fixed-point Flows","primary_cat":"cs.CL","submitted_at":"2026-07-01T10:02:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Self-conditioned flow language models solve fixed-point iterations, enabling fixed-point flow maps that distill into FMLM* which outperforms SOTA in few-step generation on 
OpenWebText.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2607.00588","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Low Perplexity is Repetition: A One-Dimensional Self-Conditioning Attractor in Continuous Diffusion LMs","primary_cat":"cs.CL","submitted_at":"2026-07-01T08:13:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Low Gen-PPL in continuous diffusion LMs results from repetition caused by a 1D contractive attractor in self-conditioning feedback; ACE subtracts the direction to reduce repetition to human levels while preserving quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08417","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics","primary_cat":"cs.CL","submitted_at":"2026-06-07T02:35:56+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Naive samplers beat published diffusion and flow models on gen-PPL with incoherent output, proving the metric unsound and motivating distributional evaluation suites.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24173","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Extracting Training Data from Diffusion Language Models via Infilling","primary_cat":"cs.CL","submitted_at":"2026-05-22T19:46:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Infilling extraction on diffusion language models extracts up to three times more verbatim sequences than prefix methods and achieves 
higher recall on redacted emails than autoregressive models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22586","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Tutorial on Diffusion Theory: From Differential Equations to Diffusion Models","primary_cat":"cs.LG","submitted_at":"2026-05-21T14:59:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":1.0,"formal_verification":"none","one_line_summary":"A tutorial that unifies diffusion probabilistic models, score-based generative modeling, and SDE methods by deriving forward and reverse dynamics from a shared Gaussian noising process.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19726","ref_index":37,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention","primary_cat":"cs.CV","submitted_at":"2026-05-19T12:01:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"BA-Att introduces pre-downsampled block selection with norm-sorting and diagonal covariance correction to approximate sparse attention, yielding up to 6.95x speedup at 50% sparsity across language, multimodal, and video models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18530","ref_index":66,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Continuous Diffusion Scales Competitively with Discrete Diffusion for Language","primary_cat":"cs.CL","submitted_at":"2026-05-18T15:15:24+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RePlaid achieves a 20x compute gap to autoregressive models, new SOTA 
PPL of 22.1 among continuous DLMs on OpenWebText, and competitive scaling laws by aligning architecture with modern discrete DLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14531","ref_index":40,"ref_count":3,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space","primary_cat":"cs.CL","submitted_at":"2026-05-14T08:13:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Manta-LM approximates the HJB equation via flow matching in latent control space to realize closed-loop optimal control for language generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12836","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Discrete Stochastic Localization for Non-autoregressive Generation","primary_cat":"cs.LG","submitted_at":"2026-05-13T00:12:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DSL provides a continuous embedding framework where one denoiser supports a family of SNR paths for discrete sequences, improving MAUVE scores on OpenWebText and allowing random-order and hybrid sampling from a fine-tuned MDLM checkpoint.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11577","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion","primary_cat":"cs.CL","submitted_at":"2026-05-12T06:02:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BitLM replaces per-token softmax with bitwise 
continuous diffusion inside causal blocks to generate multiple tokens in parallel while preserving autoregressive structure.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"At inference time, decoding alternates between a causal backbone update and a blockwise denoising step, as summarized in Algorithm 2. Suppose blocks up to n − 1 have already been realized. The backbone consumes the prompt and previously generated binary blocks, maintains a KV cache, and outputs the condition tensor C^{(n−1)} for the next block. We then initialize the target block from Gaussian noise: A^{(n)}_{t_K} ∼ N(0, I_{m×B}), (21) where 1 = t_K > t_{K−1} > ··· > t_0 = 0 is a denoising schedule with K steps. [Figure 3: Pretraining loss for the 0.6B, 1.7B, 4B, and 8B BitLM. Figure 4: Cfg and denoising step ablation of inference setting.] At each step, we predict the clean block and move the current state toward it: Â^{(n)}_0 = DiffHead_θ(A^{(n)}_{t_k}, t_k; C^{(n−1)}"},{"citing_arxiv_id":"2605.10938","ref_index":66,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ELF: Embedded Language Flows","primary_cat":"cs.CL","submitted_at":"2026-05-11T17:59:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ELF applies continuous-time flow matching in embedding space for language generation and reports outperforming prior discrete and continuous diffusion language models with fewer steps.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Our method adopts Flow Matching to formulate language generation in continuous embedding space and continuous time. 
Continuous diffusion language models. Continuous DLMs map discrete tokens to a continuous space to perform denoising. Embedding-space methods, such as Diffusion-LM [34], CDCD [13], and DiffuSeq [19], add Gaussian noise directly to token embeddings [66, 79, 21, 72, 77, 36, 74, 15]. A complementary direction studies simplex-based representations, including SSD-LM [22] and TESS [44, 68], as well as related manifold-based formulations [27]. Although these methods provide [Figure: ELF network diagram; labels: input tokens, embed, corrupt, Training, ELF denoiser, ELF (t=1), unembed, output tokens, gaussian]"},{"citing_arxiv_id":"2605.06548","ref_index":87,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Continuous Latent Diffusion Language Model","primary_cat":"cs.CL","submitted_at":"2026-05-07T16:44:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Cola DLM proposes a hierarchical latent diffusion model that learns a text-to-latent mapping, fits a global semantic prior in continuous space with a block-causal DiT, and performs conditional decoding, establishing latent prior modeling as an alternative to token-level autoregressive language model","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[85] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020. [86] Aadithya Srikanth, Mudit Gaur, and Vaneet Aggarwal. Discrete state diffusion models: A sample complexity perspective. arXiv preprint arXiv:2510.10854, 2025. [87] Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, et al. Self-conditioned embedding diffusion for text generation. 
arXiv preprint arXiv:2211.04236, 2022. [88] Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, and Hanjun Dai. Score-based continuous-time discrete diffusion"},{"citing_arxiv_id":"2604.26985","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Simple Self-Conditioning Adaptation for Masked Diffusion Models","primary_cat":"cs.LG","submitted_at":"2026-04-28T19:34:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SCMDM is a post-training self-conditioning adaptation for masked diffusion models that reduces generative perplexity by nearly 50% on OWT and improves performance on images, molecules, and genomics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.16813","ref_index":36,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Flow Map Language Models: One-step Language Modeling via Continuous Denoising","primary_cat":"cs.CL","submitted_at":"2026-02-18T19:23:07+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.16933","ref_index":88,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning","primary_cat":"cs.LG","submitted_at":"2025-05-22T17:23:26+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLaDA-V is a diffusion-based multimodal large language model that reaches competitive or state-of-the-art results on visual instruction 
tasks while using a non-autoregressive architecture.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[86] S. Gong, M. Li, J. Feng, Z. Wu, and L. Kong, \"Diffuseq: Sequence to sequence text generation with diffusion models,\" arXiv preprint arXiv:2210.08933, 2022. [87] X. Han, S. Kumar, and Y. Tsvetkov, \"Ssd-lm: Semi-autoregressive simplex-based diffusion language model for text generation and modular control,\" arXiv preprint arXiv:2210.17432, 2022. [88] R. Strudel, C. Tallec, F. Altché, Y. Du, Y. Ganin, A. Mensch, W. Grathwohl, N. Savinov, S. Dieleman, L. Sifre et al., \"Self-conditioned embedding diffusion for text generation,\" arXiv preprint arXiv:2211.04236, 2022. [89] T. Chen, R. Zhang, and G. Hinton, \"Analog bits: Generating discrete data using diffusion models with self-conditioning,\" arXiv preprint arXiv:2208."},{"citing_arxiv_id":"2502.17119","ref_index":140,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Diffusion and Flow Matching Models for Tabular Data: A Survey","primary_cat":"cs.LG","submitted_at":"2025-02-24T13:01:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"First dedicated survey organizing diffusion and flow matching models for tabular data synthesis, imputation, anomaly detection, and related tasks, covering literature from 2015 to 2026 and highlighting open problems.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Diffusion Models for Discrete Data. Diffusion models for discrete data can be divided into two categories [134]: 1) the first category of works converts discrete structures into a continuous latent space and then directly applies Gaussian diffusion model in the latent space. 
This line of works include [97], [100], [135], [136], [137], [138], [139], [140]; 2) the second category of works directly defines the diffusion process on discrete structures, and we will focus on this category in the remaining of this section. 1) Binary Diffusion Model: This line of work is for the first time studied in [22], which explored the scenario of binary random features. 2) Multinomial Diffusion Model: Multinomial Diffusion"},{"citing_arxiv_id":"2502.09992","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Large Language Diffusion Models","primary_cat":"cs.CL","submitted_at":"2025-02-14T08:23:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"based diffusion language model for text generation and modular control. arXiv preprint arXiv:2210.17432, 2022. [44] Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, et al. Self-conditioned embedding diffusion for text generation. arXiv preprint arXiv:2211.04236, 2022. [45] Ting Chen, Ruixiang Zhang, and Geoffrey Hinton. Analog bits: Generating discrete data using diffusion models with self-conditioning. arXiv preprint arXiv:2208.04202, 2022. [46] Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, et al. 
Contin-"},{"citing_arxiv_id":"2406.03736","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data","primary_cat":"cs.LG","submitted_at":"2024-06-06T04:22:11+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Absorbing discrete diffusion models the conditional distributions of clean data; reparameterizing yields a time-independent RADD that unifies with AO-ARMs and reaches SOTA perplexity among diffusion models on zero-shot language benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}