Continuous Latent Diffusion Language Model

· 2026 · cs.CL · arXiv 2605.06548

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open full Pith review browse 3 citing papers arXiv PDF

abstract

Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation efficiency, scalable representation learning, and effective global semantic modeling. We propose Cola DLM, a hierarchical latent diffusion language model that frames text generation through hierarchical information decomposition. Cola DLM first learns a stable text-to-latent mapping with a Text VAE, then models a global semantic prior in continuous latent space with a block-causal DiT, and finally generates text through conditional decoding. From a unified Markov-path perspective, its diffusion process performs latent prior transport rather than token-level observation recovery, thereby separating global semantic organization from local textual realization. This design yields a more flexible non-autoregressive inductive bias, supports semantic compression and prior fitting in continuous space, and naturally extends to other continuous modalities. Through experiments spanning 4 research questions, 8 benchmarks, strictly matched ~2B-parameter autoregressive and LLaDA baselines, and scaling curves up to about 2000 EFLOPs, we identify an effective overall configuration of Cola DLM and verify its strong scaling behavior for text generation. Taken together, the results establish hierarchical continuous latent prior modeling as a principled alternative to strictly token-level language modeling, where generation quality and scaling behavior may better reflect model capability than likelihood, while also suggesting a concrete path toward unified modeling across discrete text and continuous modalities.

representative citing papers

Continuous Language Diffusion as a Decoder-Interface Problem

cs.CL · 2026-06-07 · unverdicted · novelty 7.0 · 2 refs

Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.

VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

VoidPadding decouples padding from termination in MDLMs via a new [VOID] token, delivering +17.84 average benchmark points and 55.7% fewer decoding steps on Dream-7B-Instruct.

Speech Meets ELF: Audio Conditional Continuous-Target Diffusion for Speech Recognition and Translation

cs.SD · 2026-06-09 · unverdicted · novelty 6.0

ELF-S2T applies audio-conditioned flow-matching on continuous text latents from pre-trained ELF to achieve competitive ASR and S2TT results, with analysis showing shared close-distance confusion in latent space.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Continuous Language Diffusion as a Decoder-Interface Problem cs.CL · 2026-06-07 · unverdicted · none · ref 21 · 2 links · internal anchor
Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.
VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination cs.CL · 2026-06-16 · unverdicted · none · ref 46 · internal anchor
VoidPadding decouples padding from termination in MDLMs via a new [VOID] token, delivering +17.84 average benchmark points and 55.7% fewer decoding steps on Dream-7B-Instruct.

Continuous Latent Diffusion Language Model

fields

years

verdicts

representative citing papers

citing papers explorer