Learn from Your Mistakes: Self-Correcting Masked Diffusion Models

Gilad Turok; Guanghan Wang; Marianne Arriola; Michael Elad; Omer Belhasin; Ran Zilberstein; Roy Uziel; Volodymyr Kuleshov; Yair Schiff

arxiv: 2602.11590 · v3 · pith:3CIYLKW7new · submitted 2026-02-12 · 💻 cs.LG

Yair Schiff, Omer Belhasin, Roy Uziel, Guanghan Wang, Marianne Arriola, Gilad Turok, Ran Zilberstein, Michael Elad, Volodymyr Kuleshov

classification 💻 cs.LG

keywords mdms · models · tokens · diffusion · generation · masked · method · mistakes

Abstract

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models, enabling parallel token generation while achieving competitive performance. Despite these advantages, MDMs face a fundamental limitation: once tokens are unmasked, they remain fixed, leading to error accumulation and ultimately degrading sample quality. We address this by proposing a framework that trains a model to perform both unmasking and correction. By reusing outputs from the MDM denoising network as inputs for corrector training, we train a model to recover from potential mistakes. During generation we apply additional corrective refinement steps between unmasking ones in order to change decoded tokens and improve outputs. We name our training and sampling method Progressive Self-Correction (ProSeCo) for its unique ability to iteratively refine an entire sequence, including already generated tokens. We conduct extensive experimental validation across multiple conditional and unconditional tasks, demonstrating that ProSeCo yields better quality-efficiency trade-offs (up to ~4x faster sampling) and enables inference-time compute scaling to further increase sample quality beyond standard MDMs (up to ~1.2x improvement on benchmarks).
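The two ideas in the abstract, interleaving correction steps with unmasking steps and building corrector inputs from the denoiser's own outputs, can be illustrated with a toy sketch. This is not the authors' implementation: `MASK`, the `Denoiser` interface, `toy_denoiser`, the confidence-based unmasking rule, and every function name below are assumptions made up for illustration, and the real method operates on neural network outputs rather than a hand-written rule.

```python
from typing import Callable, List, Sequence, Set, Tuple

MASK = -1  # hypothetical id for the mask token

# Hypothetical interface: given a partially masked sequence, return one
# predicted token and one confidence score per position.
Denoiser = Callable[[Sequence[int]], Tuple[List[int], List[float]]]


def unmask_step(x: Sequence[int], denoiser: Denoiser, k: int) -> List[int]:
    """Reveal the k most confident masked positions; revealed tokens stay fixed."""
    preds, conf = denoiser(x)
    masked = [i for i, tok in enumerate(x) if tok == MASK]
    chosen = sorted(masked, key=lambda i: conf[i], reverse=True)[:k]
    out = list(x)
    for i in chosen:
        out[i] = preds[i]
    return out


def correct_step(x: Sequence[int], denoiser: Denoiser) -> List[int]:
    """Re-predict every revealed position so earlier decisions can change."""
    preds, _ = denoiser(x)
    return [tok if tok == MASK else preds[i] for i, tok in enumerate(x)]


def sample(denoiser: Denoiser, length: int, k: int = 2,
           corrections_per_unmask: int = 1) -> List[int]:
    """Alternate unmasking with zero or more corrective refinement steps."""
    x = [MASK] * length
    while MASK in x:
        x = unmask_step(x, denoiser, k)
        for _ in range(corrections_per_unmask):
            x = correct_step(x, denoiser)
    return x


def make_corrector_example(x0: Sequence[int], masked_positions: Set[int],
                           denoiser: Denoiser) -> Tuple[List[int], List[int]]:
    """Build one (input, target) pair by filling masked slots with the
    denoiser's own guesses, so the input contains model-made mistakes."""
    x_t = [MASK if i in masked_positions else tok for i, tok in enumerate(x0)]
    preds, _ = denoiser(x_t)
    corrupted = [preds[i] if tok == MASK else tok for i, tok in enumerate(x_t)]
    return corrupted, list(x0)


TARGET = [3, 1, 4, 1, 5, 9]  # made-up "clean" sequence for the toy model


def toy_denoiser(x: Sequence[int]) -> Tuple[List[int], List[float]]:
    """Toy stand-in: guesses 0 everywhere until at least two tokens are revealed."""
    revealed = sum(tok != MASK for tok in x)
    preds = list(TARGET) if revealed >= 2 else [0] * len(x)
    return preds, [1.0] * len(x)
```

With this toy model, `sample(toy_denoiser, 6, corrections_per_unmask=0)` keeps the two early wrong tokens and returns `[0, 0, 4, 1, 5, 9]`, while `corrections_per_unmask=1` revisits them once more context is available and returns `TARGET`. Raising `corrections_per_unmask` is one simple way to picture spending extra inference-time compute on refinement.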

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Remask, Don't Replace: Token-to-Mask Refinement in Diffusion Large Language Models
cs.CL 2026-04 unverdicted novelty 7.0

Token-to-Mask remasking improves self-correction in diffusion LLMs by resetting erroneous commitments to masks rather than overwriting them, yielding +13.33 points on AIME 2025 and +8.56 on CMATH.
Self-Generated Error Training for Token Editing in Diffusion Language Models
cs.CL 2026-06 unverdicted novelty 6.0

Self-generated T2T training on LLaDA2.1-mini improves benchmark accuracy and lowers edit intensity by supervising recovery from model-generated corruptions instead of random ones.
Revise, Don't Freeze: Sampler-Matched Training for Self-Correcting Masked Diffusion Language Models
cs.CL 2026-05 unverdicted novelty 6.0

Presents D3IM sampler and SCOPE post-training that enable visible-token revision in masked diffusion LMs, reporting double-digit gains on GSM8K and HumanEval for LLaDA-8B.
Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning
cs.RO 2026-06 unverdicted novelty 5.0

Discrete-WAM unifies world modeling and policy learning for autonomous driving by representing observations, states, decisions, and actions as tokens in one space and using hierarchical token editing for planning.
Re-evaluating Confidence Remasking in Masked Diffusion Language Models
cs.LG 2026-06 unverdicted novelty 3.0

Re-evaluation finds post-hoc remasking (WINO) yields little-to-no gain over confidence unmasking in standard dLLM settings and can worsen diversity collapse under stochastic decoding.