hub

Beyond autoregression: Discrete diffusion for complex reasoning and planning

Ye, J · 2025 · arXiv 2410.14157

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

read on arXiv browse 10 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages

cs.LG · 2026-03-13 · unverdicted · novelty 8.0

Derives an exact unbiased policy gradient for RL post-training of diffusion LLMs via entropy-guided step selection and one-step denoising rewards, achieving state-of-the-art results on coding and logical reasoning benchmarks.

Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Uniform diffusion models rely on a leave-one-out denoiser rather than the usual denoising posterior, with exact conversions derived; an absorbing-state reformulation is introduced that matches or exceeds masked diffusion on language modeling while preserving the original joint distribution.

Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.

Improving Sampling for Masked Diffusion Models via Information Gain

cs.CL · 2026-02-20 · unverdicted · novelty 7.0

Info-Gain Sampler improves MDM decoding by using bidirectional information gain to reduce cumulative uncertainty, outperforming greedy samplers on reasoning accuracy and creative writing tasks.

Self-Supervised On-Policy Distillation for Reasoning Language Models

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

SSOPD converts intra-group correct-wrong contrast into process supervision by distilling a teacher distribution from the shortest correct completion into prefixes of the longest wrong completion, improving GRPO on AIME and HMMT benchmarks.

How to Train Your Latent Diffusion Language Model Jointly With the Latent Space

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Joint training of the latent space with the diffusion process produces a competitive latent diffusion language model that is faster than existing discrete and continuous diffusion baselines.

Continuous Latent Diffusion Language Model

cs.CL · 2026-05-07 · unverdicted · novelty 6.0

Cola DLM proposes a hierarchical latent diffusion model that learns a text-to-latent mapping, fits a global semantic prior in continuous space with a block-causal DiT, and performs conditional decoding, establishing latent prior modeling as an alternative to token-level autoregressive language model

SSA: Improving Performance With a Better Scoring Function

cs.CL · 2025-08-20 · unverdicted · novelty 5.0

Replacing Softmax with Scaled Signed Averaging in transformer attention improves generalization under distribution shifts for in-context learning and boosts results on NLP benchmarks.

Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation

cs.SE · 2026-05-16 · unverdicted · novelty 4.0

Static checking rewards and moderate AST-based hints improve diffusion RL performance for code generation, with effectiveness varying by task difficulty across HumanEval, MBPP, and LiveCodeBench.

Agentic Reasoning for Large Language Models

cs.AI · 2026-01-18 · unverdicted · novelty 4.0

The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.

citing papers explorer

Showing 10 of 10 citing papers.

Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages cs.LG · 2026-03-13 · unverdicted · none · ref 16
Derives an exact unbiased policy gradient for RL post-training of diffusion LLMs via entropy-guided step selection and one-step denoising rewards, achieving state-of-the-art results on coding and logical reasoning benchmarks.
Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation cs.LG · 2026-05-21 · unverdicted · none · ref 38
Uniform diffusion models rely on a leave-one-out denoiser rather than the usual denoising posterior, with exact conversions derived; an absorbing-state reformulation is introduced that matches or exceeds masked diffusion on language modeling while preserving the original joint distribution.
Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models cs.LG · 2026-05-12 · unverdicted · none · ref 69 · 2 links
Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.
Improving Sampling for Masked Diffusion Models via Information Gain cs.CL · 2026-02-20 · unverdicted · none · ref 16
Info-Gain Sampler improves MDM decoding by using bidirectional information gain to reduce cumulative uncertainty, outperforming greedy samplers on reasoning accuracy and creative writing tasks.
Self-Supervised On-Policy Distillation for Reasoning Language Models cs.LG · 2026-05-17 · unverdicted · none · ref 88
SSOPD converts intra-group correct-wrong contrast into process supervision by distilling a teacher distribution from the shortest correct completion into prefixes of the longest wrong completion, improving GRPO on AIME and HMMT benchmarks.
How to Train Your Latent Diffusion Language Model Jointly With the Latent Space cs.CL · 2026-05-08 · unverdicted · none · ref 54
Joint training of the latent space with the diffusion process produces a competitive latent diffusion language model that is faster than existing discrete and continuous diffusion baselines.
Continuous Latent Diffusion Language Model cs.CL · 2026-05-07 · unverdicted · none · ref 103
Cola DLM proposes a hierarchical latent diffusion model that learns a text-to-latent mapping, fits a global semantic prior in continuous space with a block-causal DiT, and performs conditional decoding, establishing latent prior modeling as an alternative to token-level autoregressive language model
SSA: Improving Performance With a Better Scoring Function cs.CL · 2025-08-20 · unverdicted · none · ref 17
Replacing Softmax with Scaled Signed Averaging in transformer attention improves generalization under distribution shifts for in-context learning and boosts results on NLP benchmarks.
Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation cs.SE · 2026-05-16 · unverdicted · none · ref 23
Static checking rewards and moderate AST-based hints improve diffusion RL performance for code generation, with effectiveness varying by task difficulty across HumanEval, MBPP, and LiveCodeBench.
Agentic Reasoning for Large Language Models cs.AI · 2026-01-18 · unverdicted · none · ref 149
The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.

Beyond autoregression: Discrete diffusion for complex reasoning and planning

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer