Diffusion Language Model Parallel Decoding via Product-of-Experts Bridge

Brian L. Trippe; Juntong Shi; Jure Leskovec; Minkai Xu; Stefano Ermon

arXiv: 2606.08048 · v1 · pith:UZWSGYF3 · submitted 2026-06-06 · 💻 cs.CL


Juntong Shi, Brian L. Trippe, Jure Leskovec, Stefano Ermon, Minkai Xu

Pith reviewed 2026-06-27 19:45 UTC · model grok-4.3

classification 💻 cs.CL

keywords: diffusion language models, parallel decoding, product of experts, importance sampling, rejection sampling, autoregressive models, mathematical reasoning, code generation


The pith

Product-of-Experts bridge enables diffusion language models to decode in parallel while recovering at least 95 percent of autoregressive model performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Diffusion language models generate text quickly through parallel token prediction but fall short in quality compared to autoregressive models due to missing token dependencies. The paper proposes PoE-Bridge to close this gap by inserting an intermediate Product-of-Experts distribution formed from the diffusion proposal and autoregressive target. Drafting occurs in parallel with the diffusion model, followed by rejection sampling to align candidates with the PoE and importance sampling to correct toward the target. This yields a fivefold speedup over standard diffusion decoding and recovers most of the quality on mathematical reasoning and coding tasks. A sympathetic reader would care because it offers a practical way to combine the speed of parallel generation with near-autoregressive accuracy.

Core claim

PoE-Bridge constructs an intermediate distribution as the product of experts from the DLM proposal and AR target. Multiple continuations are drafted in parallel using the DLM, rejection sampling verifies and shifts them toward the PoE, and importance sampling further aligns them with the AR target. Additional techniques include mixed-temperature sampling for diversity and elastic rejection windows to minimize wasted computation. This framework achieves significantly improved accuracy with a 5 times speedup over standard DLM decoding and recovers at least 95 percent of the target AR model's performance on challenging tasks.
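To make the two-stage correction concrete, here is a minimal Python sketch for a single token position with a three-token vocabulary. It assumes the unweighted product p_PoE ∝ p_DLM · p_AR, uses the standard speculative-sampling accept/residual rule for stage 1 and sampling-importance-resampling for stage 2. The distributions, function names, and particle count are hypothetical, and the paper's sequence-level algorithm (parallel drafting, windows, mixed temperatures) is not reproduced here.

```python
import random

# Hypothetical toy distributions over a 3-token vocabulary at one position.
P_D = [0.5, 0.3, 0.2]   # DLM proposal p_D
P_AR = [0.2, 0.3, 0.5]  # AR target p_AR


def product_of_experts(p_d, p_ar):
    """Normalized PoE: p_PoE(x) = p_D(x) * p_AR(x) / Z (unweighted product assumed)."""
    unnorm = [a * b for a, b in zip(p_d, p_ar)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def speculative_reject(rng, p_draft, q_target):
    """Stage 1: draw x ~ p_draft and accept it with prob min(1, q(x)/p(x));
    on rejection, resample from the residual max(0, q - p). Output ~ q_target."""
    vocab = range(len(p_draft))
    x = rng.choices(vocab, weights=p_draft)[0]
    if rng.random() < min(1.0, q_target[x] / p_draft[x]):
        return x
    residual = [max(0.0, q - p) for q, p in zip(q_target, p_draft)]
    return rng.choices(vocab, weights=residual)[0]


def poe_bridge_sample(rng, p_d, p_ar, k):
    """Draft k particles, move each to the PoE (stage 1), then pick one by
    self-normalized importance resampling toward p_AR (stage 2)."""
    q = product_of_experts(p_d, p_ar)
    particles = [speculative_reject(rng, p_d, q) for _ in range(k)]
    weights = [p_ar[x] / q[x] for x in particles]  # w = p_AR / p_PoE
    return rng.choices(particles, weights=weights)[0]


def empirical(rng, trials, k):
    """Empirical distribution of the two-stage sampler's output."""
    counts = [0] * len(P_D)
    for _ in range(trials):
        counts[poe_bridge_sample(rng, P_D, P_AR, k)] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    print(empirical(random.Random(0), trials=20000, k=16))
```

With k = 16 the printed frequencies land near P_AR. Self-normalized resampling with finitely many particles is only asymptotically exact, so a small bias toward p_PoE remains and shrinks as k grows.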

What carries the argument

The intermediate Product-of-Experts distribution that serves as a bridge for rejection and importance sampling between the diffusion language model proposal and the autoregressive target.

If this is right

Parallel decoding with the PoE bridge closes most of the quality gap to autoregressive models on math and coding.
The method maintains efficiency while improving accuracy through the two-stage sampling correction.
Mixed-temperature sampling increases output diversity without sacrificing the performance gains.
Elastic rejection windows reduce the computational waste in verification steps.
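The summary does not say how the elastic rejection windows are sized. The sketch below shows one hypothetical rule of this kind, borrowed from adaptive draft-length heuristics in speculative decoding rather than taken from the paper: double the verification window after it is fully accepted, otherwise shrink it to one past the accepted prefix. `next_window`, `simulate`, the bounds, and the replayed acceptance lengths are all illustrative assumptions.

```python
def next_window(current, accepted, w_min=2, w_max=16):
    """Hypothetical elastic rule: double the window after a fully accepted
    window; otherwise shrink it to one past the accepted prefix."""
    if accepted >= current:
        return min(w_max, current * 2)
    return max(w_min, accepted + 1)


def simulate(accept_lengths, start=4):
    """Replay hypothetical per-step accepted-prefix lengths and return the
    window used at each step plus the total number of verified-but-rejected
    draft tokens (the "wasted verification")."""
    window, windows, wasted = start, [], 0
    for a in accept_lengths:
        a = min(a, window)  # cannot accept more tokens than were verified
        windows.append(window)
        wasted += window - a
        window = next_window(window, a)
    return windows, wasted


print(simulate([4, 8, 3, 10]))  # ([4, 8, 16, 4], 13)
```

The trade-off such a rule targets: a large window amortizes verification when drafts are good, while shrinking after an early rejection limits how many drafted tokens are verified only to be discarded.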

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may apply to other generative model pairs where one is fast but approximate and the other is accurate but sequential.
It could influence the design of hybrid decoding algorithms for future large language models balancing speed and quality.
Empirical results on specific tasks suggest potential for broader application if the sampling efficiency holds across domains.

Load-bearing premise

An intermediate Product-of-Experts distribution from the DLM proposal and AR target can be sampled efficiently via rejection-plus-importance procedures without needing too many particles or introducing bias that blocks recovery of AR performance.

What would settle it

Observing that the rejection sampling step requires a number of particles that makes the overall computation slower than standard autoregressive decoding, or that the generated outputs fall short of 95 percent AR performance recovery on the mathematical reasoning and coding benchmarks.

Figures

Figures reproduced from arXiv: 2606.08048 by Brian L. Trippe, Juntong Shi, Jure Leskovec, Minkai Xu, Stefano Ermon.

**Figure 1.** Comparison between naive speculative sampling and PoE-Bridge. (A) Naive speculative sampling directly corrects DLM drafts from p_D to the AR target p_AR. Due to the large proposal–target mismatch, direct verification often accepts only short prefixes, resulting in limited throughput gains. (B) PoE-Bridge splits this difficult correction into two easier stages: speculative rejection sampling first moves DLM d…
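The caption's point about short accepted prefixes follows from the standard speculative-sampling acceptance rate, sum over x of min(p(x), q(x)), which equals 1 - TV(p, q). A small sketch, assuming i.i.d. per-token acceptance and hypothetical toy distributions, compares verifying DLM drafts directly against p_AR with verifying them against the unweighted PoE (what stage 1 does; stage 2 must still reweight toward p_AR).

```python
def product_of_experts(p_d, p_ar):
    """Normalized unweighted PoE (assumed form): p_D * p_AR / Z."""
    unnorm = [a * b for a, b in zip(p_d, p_ar)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def acceptance_rate(p_draft, q_target):
    """Per-token speculative-sampling acceptance probability,
    sum_x min(p(x), q(x)) = 1 - TV(p, q)."""
    return sum(min(p, q) for p, q in zip(p_draft, q_target))


def expected_accepted(alpha, window):
    """Expected accepted draft tokens in a window when each token is accepted
    independently with prob alpha < 1 and verification stops at the first rejection."""
    return alpha * (1 - alpha ** window) / (1 - alpha)


P_D = [0.5, 0.3, 0.2]   # hypothetical DLM proposal
P_AR = [0.2, 0.3, 0.5]  # hypothetical AR target

direct = acceptance_rate(P_D, P_AR)                           # 0.7
bridge = acceptance_rate(P_D, product_of_experts(P_D, P_AR))  # ~0.845
print(expected_accepted(direct, 8), expected_accepted(bridge, 8))  # ~2.20 vs ~4.03
```

In this toy case the PoE sits closer to the draft distribution than p_AR does, so more drafted tokens survive verification; the remaining mismatch is left to the importance-sampling stage.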

**Figure 2.** Effect of increasing the number of parallel candidates K under uniform- and mixed-temperature sampling. Mixed-temperature sampling enables consistent accuracy improvements with increasing K, whereas uniform-temperature sampling plateaus early.

…ilies, as they share the same tokenizer and vocabulary. Throughout all experiments, we use Dream-7B-Instruct as the DLM proposal. For task-specific AR …
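Figure 2 does not say how temperatures are assigned across the K candidates or how candidates drawn at different temperatures are reweighted. A minimal sketch under two assumptions: each candidate gets its own temperature from a hypothetical evenly spaced schedule, and importance weights divide by the equal-weight mixture of the tempered proposals (the balance heuristic of Veach and Guibas, a standard multiple-importance-sampling choice). Whether PoE-Bridge does either is not stated here; `temperature_schedule`, `balance_heuristic_weight`, and all numbers are illustrative.

```python
import math


def tempered_softmax(logits, temperature):
    """Numerically stable softmax of logits / temperature."""
    scaled = [logit / temperature for logit in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_schedule(k, t_min, t_max):
    """Hypothetical schedule: k temperatures evenly spaced over [t_min, t_max]."""
    if k == 1:
        return [t_min]
    step = (t_max - t_min) / (k - 1)
    return [t_min + i * step for i in range(k)]


def balance_heuristic_weight(x, target, proposals):
    """Importance weight target(x) / mixture(x), where the mixture is the
    equal-weight average of the tempered proposals that drew the candidates."""
    mixture = sum(p[x] for p in proposals) / len(proposals)
    return target[x] / mixture


logits = [2.0, 1.0, 0.0]  # hypothetical next-token logits from the drafter
temps = temperature_schedule(5, 0.5, 1.5)  # [0.5, 0.75, 1.0, 1.25, 1.5]
proposals = [tempered_softmax(logits, t) for t in temps]
target = [0.2, 0.3, 0.5]  # hypothetical target distribution
print([round(balance_heuristic_weight(x, target, proposals), 3) for x in range(3)])
```

Low temperatures concentrate drafts on high-probability tokens while high temperatures spread them out, which is one way a mixed schedule can keep added candidates from being near-duplicates; the mixture denominator keeps the weights valid when candidates come from different proposals.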

**Figure 3.** Additional statistics for the ablation study on the scaling effect of K, conducted on MATH. Because the AR decoding baseline has no #Accept per Fwd. statistic, it is omitted from that subplot. Panels (each plotted against K): Accuracy (%), Thrpt. (tok/sec), Gen Len (tokens), #Accept per Fwd.

**Figure 4.** Additional statistics for the ablation study on the scaling effect of K, conducted on MBPP. Because the AR decoding baseline has no #Accept per Fwd. statistic, it is omitted from that subplot.

Original abstract

Diffusion language models (DLMs) offer substantial speed advantages through parallel decoding, but the lack of token dependencies limits generation quality compared to autoregressive (AR) models. Recent progress attempts to bridge the gap via importance sampling, with DLM being the proposal and AR being the target. However, due to the huge gap between their distributions, the sampling requires a large number of particles and is thus expensive to compute. In this paper, we introduce PoE-Bridge, a novel decoding framework that drastically improves generation speed and accuracy by introducing an intermediate distribution to bridge the gap. The distribution is constructed as a Product-of-Experts (PoE) of the DLM proposal and the AR target. With the intermediate distribution, we first use the DLM to draft multiple continuations in parallel, then apply rejection sampling to verify the drafted tokens and move the resulting candidates toward the PoE. We then use importance sampling to further correct the PoE-aligned candidates toward the AR target. We further propose several improved techniques, including mixed-temperature sampling for enhanced diversity and elastic rejection windows for reducing wasted verification. Empirically, PoE-Bridge achieves significantly improved accuracy with $5\times$ speedup over the standard DLM decoding approach, and recovers at least 95% of the target AR model's performance, efficiently advancing most of the quality gap on challenging mathematical reasoning and coding tasks. Our code is available at https://github.com/juntongshi48/poe-bridge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PoE-Bridge inserts a product-of-experts intermediate between DLM drafting and AR correction to cut the particle count needed for importance sampling, but the abstract gives no particle counts or variance numbers so the 5x/95% claims stay unverified.


The new piece is the two-stage correction: draft in parallel with the DLM, use rejection sampling to push the drafts toward the PoE of the DLM and AR models, then importance-sample the survivors toward the AR target. They also add mixed-temperature sampling and elastic rejection windows. That combination does not appear in the earlier DLM-to-AR importance-sampling work the abstract refers to.

The work is useful because it directly attacks the distribution gap that forces huge particle budgets in plain importance sampling. The code release is a plus for anyone who wants to test the idea on math or code tasks.

The soft spot is exactly the one in the stress-test note. The abstract says the PoE narrows the gap enough for 5x speedup and 95% recovery, but supplies no numbers on how many particles survive rejection, what the effective sample size is after importance weighting, or how the weights behave on the reasoning benchmarks. If the PoE does not shrink the gap as much as hoped, the second stage could still need many particles or leave bias. Without those diagnostics or ablations the 95% figure cannot be evaluated.
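The effective sample size asked for here is cheap to compute from the importance weights. A minimal sketch using the Kish estimator, plus its large-sample per-particle form 1 / E[w^2] (valid when both distributions are normalized so that E[w] = 1), evaluated on hypothetical toy distributions and the unweighted PoE form assumed elsewhere in this review. It covers the stage-2 weights only; the verification cost of stage-1 rejections is not counted.

```python
def product_of_experts(p_d, p_ar):
    """Normalized unweighted PoE (assumed form): p_D * p_AR / Z."""
    unnorm = [a * b for a, b in zip(p_d, p_ar)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def effective_sample_size(weights):
    """Kish effective sample size of unnormalized importance weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def ess_fraction(proposal, target):
    """Asymptotic ESS per particle, 1 / E_proposal[w^2] with w = target / proposal.
    Assumes proposal(x) > 0 wherever target(x) > 0."""
    second_moment = sum(t * t / p for p, t in zip(proposal, target) if t > 0)
    return 1.0 / second_moment


P_D = [0.5, 0.3, 0.2]   # hypothetical DLM proposal
P_AR = [0.2, 0.3, 0.5]  # hypothetical AR target

print(round(ess_fraction(P_D, P_AR), 3))                            # direct: 0.613
print(round(ess_fraction(product_of_experts(P_D, P_AR), P_AR), 3))  # via PoE: 0.884
```

Reporting this fraction (or the Kish ESS of the actual per-prompt weights) for direct importance sampling versus the PoE bridge would directly show whether the bridge reduces the particle requirement on the benchmarks.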

This is for groups already running diffusion or parallel decoding experiments who need lower latency on reasoning workloads. A reader who wants to reproduce or extend the method will get value from the framework even if the numbers need checking.

It deserves peer review because the approach is concrete, the problem is real, and the claims are falsifiable once the particle and variance data are shown.

Referee Report

3 major / 2 minor

Summary. The paper introduces PoE-Bridge, a decoding framework for diffusion language models that constructs an intermediate Product-of-Experts distribution from the DLM proposal and AR target. It uses parallel DLM drafting, followed by rejection sampling to align candidates with the PoE and importance sampling to correct toward the AR target, augmented by mixed-temperature sampling and elastic rejection windows. The central empirical claim is a 5× speedup over standard DLM decoding while recovering at least 95% of AR model performance on mathematical reasoning and coding tasks.

Significance. If the sampling procedure can be shown to achieve the claimed recovery without prohibitive particle counts or uncontrolled bias, the work would meaningfully advance parallel decoding by narrowing the quality gap between fast DLMs and slower AR models. Code availability is a strength that aids reproducibility.

major comments (3)

[Abstract] The claim that the two-stage rejection-plus-importance procedure recovers ≥95% of AR performance rests on the assumption that the PoE intermediate can be sampled efficiently; no particle counts, effective sample sizes, or importance-weight variance are reported, leaving open whether the procedure avoids the 'large number of particles' problem explicitly noted for standard importance sampling.
[§3.2] PoE sampling procedure: the rejection step that 'moves the resulting candidates toward the PoE' followed by importance correction to the AR target is load-bearing for both the speedup and accuracy claims, yet the manuscript supplies no analysis of acceptance rates, residual bias after the second stage, or how the mixed-temperature parameters affect weight variance.
[§4] Experiments: the headline numbers (5× speedup, 95% recovery) are presented without ablations isolating the contribution of the PoE bridge versus the auxiliary techniques, without statistical significance tests, and without error analysis on the mathematical-reasoning and coding tasks, making it impossible to verify that the central claim holds.

minor comments (2)

The abstract states that code is available but does not indicate the license or whether the released repository contains the exact experimental configurations used for the reported numbers.
[§3] Notation for the PoE distribution p_PoE(x) = p_DLM(x) · p_AR(x) / Z is introduced without an explicit normalizing-constant discussion or reference to how Z is handled in the rejection and importance steps.
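Under the unweighted form written in this comment (an assumption; the paper may temper or weight the experts), and for a single token position where both experts give full distributions over the vocabulary, Z can be computed exactly. It enters the rejection acceptance probability but cancels from self-normalized importance weights, provided p_DLM(x) > 0 wherever p_AR(x) > 0:

```latex
\begin{aligned}
p_{\mathrm{PoE}}(x) &= \frac{p_{\mathrm{DLM}}(x)\,p_{\mathrm{AR}}(x)}{Z},
  \qquad Z = \sum_{x'} p_{\mathrm{DLM}}(x')\,p_{\mathrm{AR}}(x'), \\
\alpha(x) &= \min\!\Bigl(1, \frac{p_{\mathrm{PoE}}(x)}{p_{\mathrm{DLM}}(x)}\Bigr)
  = \min\!\Bigl(1, \frac{p_{\mathrm{AR}}(x)}{Z}\Bigr), \\
w(x) &= \frac{p_{\mathrm{AR}}(x)}{p_{\mathrm{PoE}}(x)} = \frac{Z}{p_{\mathrm{DLM}}(x)},
  \qquad
  \bar{w}_k = \frac{w(x_k)}{\sum_{j=1}^{K} w(x_j)}
  = \frac{p_{\mathrm{DLM}}(x_k)^{-1}}{\sum_{j=1}^{K} p_{\mathrm{DLM}}(x_j)^{-1}}.
\end{aligned}
```

Here α is the stage-1 acceptance probability (a rejected draft is resampled from the normalized residual max(0, p_PoE − p_DLM)), and w̄_k are the stage-2 weights over the K PoE-aligned candidates.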

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of our sampling procedure and experimental validation. We address each major comment below and commit to revisions that will strengthen the manuscript without altering its core claims.


Referee: [Abstract] The claim that the two-stage rejection-plus-importance procedure recovers ≥95% of AR performance rests on the assumption that the PoE intermediate can be sampled efficiently; no particle counts, effective sample sizes, or importance-weight variance are reported, leaving open whether the procedure avoids the 'large number of particles' problem explicitly noted for standard importance sampling.

Authors: We agree that these efficiency metrics are necessary to substantiate the claim. In the revised manuscript we will report the number of particles used, effective sample sizes, and importance-weight variance for the math and coding experiments, directly comparing them to the direct importance-sampling baseline to show that the PoE bridge materially reduces the particle requirement. revision: yes
Referee: [§3.2] PoE sampling procedure: the rejection step that 'moves the resulting candidates toward the PoE' followed by importance correction to the AR target is load-bearing for both the speedup and accuracy claims, yet the manuscript supplies no analysis of acceptance rates, residual bias after the second stage, or how the mixed-temperature parameters affect weight variance.

Authors: We acknowledge the absence of this analysis. Section 3.2 will be expanded with (i) empirical acceptance rates for the rejection step, (ii) an assessment of residual bias after the importance-sampling correction, and (iii) an ablation of mixed-temperature settings and their effect on weight variance. These additions will quantify the contribution of each stage. revision: yes
Referee: [§4] Experiments: the headline numbers (5× speedup, 95% recovery) are presented without ablations isolating the contribution of the PoE bridge versus the auxiliary techniques, without statistical significance tests, and without error analysis on the mathematical-reasoning and coding tasks, making it impossible to verify that the central claim holds.

Authors: We agree that the experimental section requires greater rigor. The revised §4 will include (a) ablations that isolate the PoE bridge from mixed-temperature sampling and elastic rejection windows, (b) statistical significance tests (paired t-tests across seeds), and (c) per-task error bars or variance across runs. These changes will allow readers to verify the reported speedup and recovery figures. revision: yes
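The promised paired t-test across seeds reduces to a t statistic on per-seed score differences. A minimal stdlib sketch; `paired_t` is a hypothetical helper and the scores are made-up placeholders, not results from the paper.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-seed scores a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.mean(diffs) / se, len(diffs) - 1


# Hypothetical per-seed scores, for illustration only.
t, df = paired_t([3.0, 4.0, 5.0], [1.0, 2.0, 2.0])
print(round(t, 3), df)  # 7.0 2
```

The p-value follows from the t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel(a, b)` returns both the statistic and the p-value.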

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained


The paper introduces PoE-Bridge as a new intermediate distribution (DLM proposal × AR target) and describes a two-stage correction (rejection sampling followed by importance sampling) plus auxiliary techniques such as mixed-temperature sampling. No equation or claim reduces by construction to a fitted parameter renamed as a prediction, nor does any load-bearing step rely on a self-citation chain whose cited result is itself unverified. The performance numbers are presented as empirical outcomes of the sampling procedure rather than algebraic identities; the central assumption (efficient sampling from the PoE without prohibitive particle counts or uncontrolled bias) is stated explicitly and left open to external verification via the released code. The derivation therefore remains independent of its own inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The method rests on standard sampling theory and introduces the PoE distribution as the central new construct; details on any fitted parameters or additional assumptions are absent from the abstract.

free parameters (1)

mixed-temperature sampling parameters
Mentioned for diversity but no values or fitting procedure given in the abstract.

axioms (1)

standard math: Rejection sampling and importance sampling can be applied sequentially to move samples from the DLM proposal through the PoE toward the AR target without prohibitive variance.
Assumes standard Monte Carlo techniques behave as described despite the large gap between the DLM and AR distributions.

invented entities (1)

Product-of-Experts bridge distribution (no independent evidence)
purpose: Intermediate distribution that enables efficient correction from DLM to AR via the two-stage sampling procedure.
Newly postulated construct whose sampling properties are central to the claimed speedup and quality recovery.

pith-pipeline@v0.9.1-grok · 5804 in / 1423 out tokens · 25056 ms · 2026-06-27T19:45:25.233476+00:00


