Prologue adds a small set of learnable tokens trained exclusively with AR cross-entropy loss to decouple generation from reconstruction in autoregressive visual models, yielding lower gFID on ImageNet 256x256.
Spectralar: Spectral autore- gressive visual generation.arXiv preprint arXiv:2506.10962
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 4years
2026 4verdicts
UNVERDICTED 4representative citing papers
GAR-Font is a global-aware autoregressive framework for multimodal few-shot font generation that adds global tokenization, a language-style adapter, and post-refinement to improve style coherence over patch-based methods.
An end-to-end autoregressive model with a jointly trained 1D semantic tokenizer achieves state-of-the-art FID 1.48 on ImageNet 256x256 generation without guidance.
MEPA adds token-routed MoE and residual self-supervised feature alignment to VAR models, reporting better FID on ImageNet 256x256 with half the training epochs and fewer parameters than dense baselines.
citing papers explorer
-
Autoregressive Visual Generation Needs a Prologue
Prologue adds a small set of learnable tokens trained exclusively with AR cross-entropy loss to decouple generation from reconstruction in autoregressive visual models, yielding lower gFID on ImageNet 256x256.