pith. machine review for the scientific record.

X-VILA: Cross-modality alignment for large language model. arXiv preprint arXiv:2405.19335, 2024

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary

fields: cs.CV 2

years: 2026 1, 2024 1

verdicts: unverdicted 2

roles: background 1

polarities: background 1

representative citing papers

Evolution of Video Generative Foundations

cs.CV · 2026-04-07 · unverdicted · novelty 2.0

This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

citing papers explorer

Showing 2 of 2 citing papers.

  • Show-o: One Single Transformer to Unify Multimodal Understanding and Generation · cs.CV · 2024-08-22 · unverdicted · none · ref 24

    Show-o unifies autoregressive and discrete diffusion modeling inside one transformer to support multimodal understanding and generation tasks with competitive benchmark performance.

  • Evolution of Video Generative Foundations · cs.CV · 2026-04-07 · unverdicted · none · ref 143

    This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.