Vega unifies video understanding and generation via shared vocabulary and hybrid autoregressive-diffusion architecture, reporting strong results on VBench and VideoMME.
Towards unifying understanding and generation in the era of vision foundation models: A survey from the au- toregression perspective.arXiv preprint arXiv:2410.22217,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Bridging Video Understanding and Generation in a Unified Framework
Vega unifies video understanding and generation via shared vocabulary and hybrid autoregressive-diffusion architecture, reporting strong results on VBench and VideoMME.