Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms

Jiaming Song; Linqi Zhou

arxiv: 2503.07154 · v3 · pith:AU5FTV63new · submitted 2025-03-10 · 💻 cs.LG · cs.AI

keywords: inference · procedure · diffusion · training · algorithms · autoregressive · because · continuous

Abstract

Generative pre-training is often framed as a dichotomy between autoregressive models for discrete signals and diffusion models for continuous signals. We argue that this dichotomy is false because it conflates four separate choices: model family, data representation, training objective, and inference procedure. Autoregression is an inference procedure that expands a sequence by drawing each new element from a normalized conditional distribution, while diffusion is a refinement procedure that repeatedly revises an existing state. The more useful contrast is therefore not autoregressive versus diffusion, but discrete tokens learned with cross-entropy versus continuous tokens learned with diffusion-style objectives, together with the inference algorithms used to sample from each. From this perspective, algorithmic progress should prioritize inference-time efficiency along two axes: sequence expansion and state refinement. We advocate designing the inference procedure before the training objective, because no training method can compensate for an inference map that omits necessary arguments or imposes an incorrect factorization. We illustrate this principle with three cases: a target-time limitation of DDIM-style samplers, a joint-distribution limitation of multi-token prediction, and recent flow-map and few-step distillation methods that directly parameterize long-range inference moves.
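The joint-distribution limitation of multi-token prediction mentioned in the abstract can be illustrated with a toy example. The sketch below is not taken from the paper: the two-token distribution and all function names are hypothetical, chosen only to show that predicting two tokens independently (as a product of per-position marginals) can place probability on sequences the data never contains, whereas a chain-rule factorization recovers the joint exactly.

```python
from itertools import product

# Hypothetical toy joint distribution over two binary tokens (x1, x2):
# the tokens always agree, so only (0, 0) and (1, 1) carry mass.
joint = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}


def marginal(joint, position):
    """Marginal distribution of the token at `position`."""
    out = {0: 0.0, 1: 0.0}
    for tokens, p in joint.items():
        out[tokens[position]] += p
    return out


def independent_prediction(joint):
    """Product of the two marginals: what independent per-position
    heads trained with cross-entropy would ideally learn."""
    m1, m2 = marginal(joint, 0), marginal(joint, 1)
    return {(a, b): m1[a] * m2[b] for a, b in product((0, 1), repeat=2)}


def autoregressive_prediction(joint):
    """Chain rule p(x1) * p(x2 | x1), which reproduces the joint."""
    m1 = marginal(joint, 0)
    out = {}
    for (a, b), p in joint.items():
        cond = p / m1[a] if m1[a] > 0 else 0.0  # p(x2 = b | x1 = a)
        out[(a, b)] = m1[a] * cond
    return out


factorized = independent_prediction(joint)
chained = autoregressive_prediction(joint)

# Mass the factorized model assigns to sequences with zero true probability.
invalid_mass = factorized[(0, 1)] + factorized[(1, 0)]
print("factorized:", factorized)
print("chained:   ", chained)
print("mass on impossible sequences:", invalid_mass)
```

In this toy case the independent prediction spreads its mass evenly over all four sequences, so half of it lands on sequences that never occur. No choice of training objective fixes this, since the factorized form itself cannot represent the dependence between the two tokens.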

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
cs.CL · 2025-05 · conditional novelty 7.0

Fast-dLLM adds reusable KV cache blocks and selective parallel decoding to diffusion LLMs, closing most of the speed gap with autoregressive models without retraining.