SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
9 Pith papers cite this work.
Representative citing papers
- BlockVLA: Accelerating Autoregressive VLA via Block Diffusion Finetuning
BlockVLA accelerates autoregressive VLA models by 3.3x using block diffusion finetuning, with faster training convergence and better early performance on long-horizon robotic tasks.
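A minimal sketch of the block-diffusion decoding pattern this rests on, assuming a confidence-ranked unmasking schedule: blocks are emitted left-to-right, while masked tokens inside each block are filled in parallel over a few refinement steps. The function name and the stand-in `model` are illustrative, not BlockVLA's actual interface.

```python
import torch

def block_diffusion_decode(model, prompt_ids, block_size=8, num_blocks=4,
                           steps=4, mask_id=0):
    seq = prompt_ids
    for _ in range(num_blocks):
        block = torch.full((block_size,), mask_id, dtype=torch.long)
        masked = torch.ones(block_size, dtype=torch.bool)
        for step in range(steps):
            if not masked.any():
                break
            logits = model(torch.cat([seq, block]))[-block_size:]  # (B, V)
            conf, pred = logits.softmax(-1).max(-1)
            conf[~masked] = float("-inf")       # only fill still-masked slots
            k = max(1, int(masked.sum()) // (steps - step))
            idx = conf.topk(k).indices          # highest-confidence tokens first
            block[idx], masked[idx] = pred[idx], False
        seq = torch.cat([seq, block])
    return seq

# Stand-in network: random logits over a 100-token vocabulary.
model = lambda ids: torch.randn(ids.shape[0], 100)
print(block_diffusion_decode(model, torch.tensor([1, 2, 3])).shape)
```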
- Relative Score Policy Optimization for Diffusion Language Models
RSPO interprets reward advantages as targets for relative log-ratios in dLLMs, calibrating noisy estimates to stabilize RLVR training and achieve strong gains on planning tasks with competitive math reasoning performance.
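As a rough reading of "reward advantages as targets for relative log-ratios", the sketch below regresses each sampled sequence's log-probability ratio under the new versus old policy toward its group-normalized advantage. The squared-error form and the `beta` scale are my assumptions; the paper's exact objective may differ.

```python
import torch

def rspo_style_loss(logp_new, logp_old, rewards, beta=1.0):
    # Group-normalize rewards into advantages (GRPO-style baseline).
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    # Relative log-ratio of each sampled sequence under new vs. old policy.
    ratio = logp_new - logp_old
    # Treat the advantage as the regression target for the scaled log-ratio.
    return ((beta * ratio - adv) ** 2).mean()

# Example with a group of 4 sampled responses.
loss = rspo_style_loss(torch.randn(4), torch.randn(4),
                       torch.tensor([1.0, 0.0, 0.0, 1.0]))
```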
- TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM
TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.
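The temporal split can be written as a two-term distillation loss: hard cross-entropy on positions about to be decoded, temperature-scaled KL against the teacher on the rest. `horizon_mask` and `T` are names chosen for this sketch; how TAD actually selects the horizon is not reproduced here.

```python
import torch
import torch.nn.functional as F

def tad_style_loss(student_logits, teacher_logits, labels, horizon_mask, T=2.0):
    # Hard labels for tokens that will be decoded in the next step(s)...
    ce = F.cross_entropy(student_logits[horizon_mask], labels[horizon_mask])
    # ...and soft, temperature-scaled supervision for farther-future tokens.
    s = F.log_softmax(student_logits[~horizon_mask] / T, dim=-1)
    t = F.softmax(teacher_logits[~horizon_mask] / T, dim=-1)
    return ce + F.kl_div(s, t, reduction="batchmean") * T * T

# 16 positions over a 50-token vocab; first 4 are "soon-to-be-decoded".
mask = torch.arange(16) < 4
loss = tad_style_loss(torch.randn(16, 50), torch.randn(16, 50),
                      torch.randint(50, (16,)), mask)
```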
- DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference
DepCap accelerates diffusion LM inference by up to 5.63x, using last-block influence to set adaptive block boundaries and conflict-free token selection for parallel decoding, with negligible quality loss.
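The summary leaves the selection rule underspecified, so the following is only one plausible reading of "conflict-free token selection": commit high-confidence positions in parallel only when a pairwise influence score between candidates stays small. The `influence` matrix and threshold `tau` are hypothetical stand-ins for whatever dependency signal DepCap derives.

```python
import torch

def conflict_free_select(probs, influence, tau=0.1, max_commit=8):
    conf, _ = probs.max(-1)                    # per-position confidence
    chosen = []
    for i in conf.argsort(descending=True).tolist():
        # Accept a candidate only if it is weakly coupled to all prior picks.
        if all(max(influence[i, j], influence[j, i]) < tau for j in chosen):
            chosen.append(i)
        if len(chosen) == max_commit:
            break
    return chosen

# 12 candidate positions, 50-token vocab, random low influence scores.
probs = torch.randn(12, 50).softmax(-1)
print(conflict_free_select(probs, torch.rand(12, 12) * 0.2))
```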
- Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models
Diffusion LLMs hallucinate more than autoregressive models and display distinct failure modes including premature termination, incomplete denoising, and context intrusion.
- Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion
Orthrus unifies autoregressive and diffusion views on a shared KV cache to deliver lossless parallel token generation with up to 7.8x speedup and O(1) memory overhead.
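A generic draft-and-verify rule makes the "lossless" claim concrete: the diffusion view drafts several tokens in parallel, and the autoregressive view re-scores them over the shared cache, keeping the longest prefix that matches its own greedy choices. This is a simplified sketch under greedy decoding, not Orthrus's exact acceptance criterion.

```python
import torch

def verify_draft(ar_logits, draft_ids):
    # AR view's greedy prediction at each drafted slot (same KV cache).
    ar_pred = ar_logits.argmax(-1)
    # Keep the longest prefix where the parallel draft agrees with the AR view.
    n = int((ar_pred == draft_ids).long().cumprod(0).sum())
    return draft_ids[:n]

draft = torch.tensor([5, 9, 2, 7])
accepted = verify_draft(torch.randn(4, 50), draft)  # accepts 0..4 tokens
```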
- Dataset-Level Metrics Attenuate Non-Determinism: A Fine-Grained Non-Determinism Evaluation in Diffusion Language Models
Dataset-level metrics in diffusion language models mask substantial sample-level non-determinism that varies with model and system factors, which a new Factor Variance Attribution metric can decompose.
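To illustrate what a factor-wise decomposition can look like, here is a toy ANOVA-flavored attribution: for each factor, average the metric's variance across run groups that hold every other factor fixed. The grouping and equal weighting are my simplifications, not the paper's Factor Variance Attribution definition.

```python
import statistics

def attribute_variance(records, factors, metric="score"):
    out = {}
    for f in factors:
        others = [g for g in factors if g != f]
        groups = {}
        for r in records:                  # group by all *other* factors
            groups.setdefault(tuple(r[g] for g in others), []).append(r[metric])
        vals = [statistics.pvariance(v) for v in groups.values() if len(v) > 1]
        out[f] = sum(vals) / len(vals) if vals else 0.0
    return out

runs = [{"seed": s, "batch": b, "score": s * 0.1 + b * 0.5}
        for s in range(3) for b in (1, 8)]
print(attribute_variance(runs, ["seed", "batch"]))
```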
- Differences in Text Generated by Diffusion and Autoregressive Language Models
DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARMs, primarily due to bidirectional context and remasking decoding strategies.
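The n-gram entropy in question is the standard Shannon entropy of the empirical n-gram distribution, where lower values indicate more repetitive local structure; a direct computation:

```python
import math
from collections import Counter

def ngram_entropy(tokens, n=2):
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(grams.values())
    # Shannon entropy (bits) of the empirical n-gram distribution.
    return -sum(c / total * math.log2(c / total) for c in grams.values())

print(ngram_entropy("the cat sat on the mat".split()))  # bigram entropy
```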
- DMax: Aggressive Parallel Decoding for dLLMs
DMax enables faster parallel decoding in diffusion language models by using on-policy training to recover from errors and soft embedding interpolations for iterative revision, boosting tokens per forward pass roughly 2-3x on benchmarks while preserving accuracy.
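The soft-embedding interpolation idea fits in a few lines: committed positions feed their hard token embedding forward, while uncertain positions feed the probability-weighted mixture of embeddings, leaving them revisable on the next pass. Function and argument names here are mine, not DMax's.

```python
import torch

def soft_embed(probs, emb, committed, hard_ids):
    soft = probs @ emb.weight              # expected embedding per position
    hard = emb(hard_ids)                   # committed tokens stay discrete
    return torch.where(committed.unsqueeze(-1), hard, soft)

emb = torch.nn.Embedding(50, 16)
probs = torch.randn(6, 50).softmax(-1)
x = soft_embed(probs, emb, torch.tensor([1, 1, 0, 0, 0, 0]).bool(),
               probs.argmax(-1))
```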