Mirai: Autoregressive Visual Generation Needs Foresight

2026 · cs.CV · arXiv 2601.14671

1 Pith paper cites this work. Polarity classification is still indexing.



abstract

Autoregressive (AR) visual generators model images as sequences of discrete tokens and are trained with a next-token likelihood objective. This strict causal supervision optimizes each step based only on the immediate next token, which can weaken global coherence and slow convergence. We investigate whether foresight, training signals that originate from later tokens, can improve autoregressive visual generation. We conduct a series of controlled diagnostics along the injection level, foresight layout, and foresight source axes, revealing a key insight: aligning foresight with AR models' internal representations on the 2D image grid improves causal modeling. We formulate this insight with Mirai (meaning "future" in Japanese), a general framework that injects future information into AR training with no architecture change and no extra inference overhead: Mirai-E uses explicit foresight from multiple future positions of unidirectional representations, whereas Mirai-I leverages implicit foresight from matched bidirectional representations. Extensive experiments show that Mirai significantly accelerates convergence and improves generation quality. For instance, Mirai can speed up LlamaGen-B's convergence by up to 10$\times$ and reduce the generation FID from 5.34 to 4.34 on the ImageNet class-condition image generation benchmark. Our study highlights that visual autoregressive models need foresight.
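To make the idea of foresight concrete, the following is a minimal sketch of a next-token loss combined with an auxiliary alignment term drawn from later positions on the 2D token grid. It is not the paper's implementation: the function names, the cosine-distance alignment, the choice of "right neighbor and token one row below" as future positions, and the `weight` parameter are all assumptions made for illustration. The abstract only states that foresight is aligned with the model's internal representations on the image grid and is used during training, not at inference.

```python
import math

def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def future_positions(i, width, num_tokens):
    """Hypothetical foresight layout for raster-order token i:
    the right neighbor (same row) and the token one row below."""
    col = i % width
    out = []
    if col + 1 < width:
        out.append(i + 1)
    if i + width < num_tokens:
        out.append(i + width)
    return out

def training_loss(logits, targets, hidden, foresight, width, weight=0.5):
    """Next-token loss plus an assumed foresight alignment term.

    logits[i]    : scores used to predict targets[i] (already shifted)
    hidden[i]    : the AR model's internal feature at step i
    foresight[j] : a target feature for grid position j, e.g. from a
                   separate encoder (assumption; used only in training)
    """
    n = len(targets)
    ntp = sum(cross_entropy(logits[i], targets[i]) for i in range(n)) / n
    terms = [
        cosine_distance(hidden[i], foresight[j])
        for i in range(n)
        for j in future_positions(i, width, n)
    ]
    align = sum(terms) / len(terms) if terms else 0.0
    return ntp + weight * align, ntp, align
```

Because the alignment term touches only the training loss, the sampling loop is unchanged in this sketch, which is consistent with the abstract's claim of no architecture change and no extra inference overhead.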

representative citing papers

Video-Mirai: Autoregressive Video Diffusion Models Need Foresight

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

Training method distills non-causal future targets into causal video diffusion states to boost long-horizon consistency without changing inference architecture or cost.

citing papers explorer


Video-Mirai: Autoregressive Video Diffusion Models Need Foresight

cs.CV · 2026-06-02 · unverdicted · none · ref 40 · internal anchor
