Mechanism Shift During Post-training from Autoregressive to Masked Diffusion Language Models

2026 · cs.LG · arXiv 2601.14758

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Post-training pretrained autoregressive models (ARMs) into masked diffusion models (MDMs) has emerged as a cost-effective way to overcome the limitations of sequential generation. Yet it remains unclear whether post-trained MDMs acquire genuinely new computational mechanisms or merely re-express autoregressive computation in a non-autoregressive form. Through a comparative circuit analysis of ARMs and their MDM counterparts post-trained from the same backbones, we uncover two complementary axes of reorganization. Structurally, the shift is task-dependent: MDMs preserve autoregressive circuitry on locally causal tasks but abandon inherited pathways and front-load computation into early layers on global tasks. Semantically, the shift is consistent across regimes: sharp, localized specialization in ARMs gives way to distributed integration in MDMs. Together, these findings show that diffusion post-training is not a surface-level change in the generation procedure but a reorganization of internal computation whose depth depends on the task.

representative citing papers

Subliminal Clocks: Latent Time Modelling in Diffusion Language Models

cs.AI · 2026-07-02 · unverdicted · novelty 6.0

DLMs encode a decodable latent timestep signal in residual activations that can be steered to predictably change model confidence and entropy.
