Towards unifying understanding and generation in the era of vision foundation models: A survey from the autoregression perspective. arXiv preprint arXiv:2410.22217

Shenghao Xie, Wenqiang Zu, Mingyang Zhao, Duo Su, Shilong Liu, Ruohua Shi, Guoqi Li, Shanghang Zhang, Lei Ma · arXiv 2410.22217

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Bridging Video Understanding and Generation in a Unified Framework

cs.CV · 2026-06-30 · unverdicted · novelty 5.0

Vega unifies video understanding and generation via shared vocabulary and hybrid autoregressive-diffusion architecture, reporting strong results on VBench and VideoMME.

citing papers explorer

Showing 1 of 1 citing paper.

Bridging Video Understanding and Generation in a Unified Framework cs.CV · 2026-06-30 · unverdicted · none · ref 75
