Direct2v: Large language models are frame-level directors for zero-shot text-to-video generation.arXiv preprint arXiv:2305.14330

72 Susung Hong, Junyoung Seo, Heeseong Shin, Sunghwan Hong, Seungryong Kim · 2023 · arXiv 2305.14330

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Movie Gen: A Cast of Media Foundation Models

cs.CV · 2024-10-17 · unverdicted · novelty 5.0

A 30B-parameter transformer and related models generate high-quality videos and audio, claiming state-of-the-art results on text-to-video, video editing, personalization, and audio generation tasks.

Evolution of Video Generative Foundations

cs.CV · 2026-04-07 · unverdicted · novelty 2.0

This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

citing papers explorer

Showing 2 of 2 citing papers.

Movie Gen: A Cast of Media Foundation Models cs.CV · 2024-10-17 · unverdicted · none · ref 25
A 30B-parameter transformer and related models generate high-quality videos and audio, claiming state-of-the-art results on text-to-video, video editing, personalization, and audio generation tasks.
Evolution of Video Generative Foundations cs.CV · 2026-04-07 · unverdicted · none · ref 99
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

Direct2v: Large language models are frame-level directors for zero-shot text-to-video generation.arXiv preprint arXiv:2305.14330

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer