Video generation models as world simulators

Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, Clarence Ng, Ricky Wang, Aditya Ramesh · 2024

6 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs such as LLaMA3 8B across a range of tasks and surpasses GPT-4o on reversal poem completion.

SyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learning

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

SyncDPO improves temporal synchronization in video-audio joint generation using DPO with efficient on-the-fly negative sample construction and curriculum learning.

WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory

cs.CV · 2026-07-02 · unverdicted · novelty 5.0

A video world model framework that uses LLM-orchestrated 3D trajectories as control signals for generation to achieve persistent dynamic object memory and viewpoint freedom.

One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

A hierarchical multi-agent framework converts a single sentence into a short drama using debate-based scripting, 3D-grounded first frames for spatial consistency, and multi-stage reviewer loops.

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

cs.RO · 2025-07-02 · unverdicted · novelty 5.0

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.

Simulus: Combining Improvements in Sample-Efficient World Model Agents

cs.LG · 2025-02-17 · unverdicted · novelty 5.0

Simulus integrates flexible tokenization, intrinsic motivation, prioritized world model replay, and regression-as-classification to achieve state-of-the-art sample efficiency for planning-free world model agents on visual Atari 100K, DMC Proprioception 500K, and symbolic Craftax-1M benchmarks.

citing papers explorer

Showing 6 of 6 citing papers.

Large Language Diffusion Models cs.CL · 2025-02-14 · unverdicted · none · ref 12
SyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learning cs.CV · 2026-05-12 · unverdicted · none · ref 5
WorldDirector: Building Controllable World Simulators with Persistent Dynamic Memory cs.CV · 2026-07-02 · unverdicted · none · ref 8
One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems cs.CV · 2026-05-21 · unverdicted · none · ref 7
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective cs.RO · 2025-07-02 · unverdicted · none · ref 109
Simulus: Combining Improvements in Sample-Efficient World Model Agents cs.LG · 2025-02-17 · unverdicted · none · ref 8