arXiv preprint arXiv:2305.13840 (2023)
7 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV 7
roles: background 1
polarities: background 1
citing papers explorer
- TIE: Time Interval Encoding for Video Generation over Events
  TIE derives a sinc-based interval encoding from temporal integrability and duration invariance principles, raising temporal constraint satisfaction from 77% to 96% on the OmniEvents dataset while preserving visual quality.
- TrajectoryMover: Generative Movement of Object Trajectories in Videos
  TrajectoryMover enables moving object trajectories in videos by training on large-scale synthetic paired data generated via the new TrajectoryAtlas pipeline.
- MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation
  MMControl adds multi-modal controls for identity, timbre, pose, and layout to unified audio-video diffusion models via dual-stream injection and adjustable guidance scaling.
- VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation
  VERTIGO post-trains camera trajectory generators with visual preference signals from Unity-rendered previews scored by a cinematically fine-tuned VLM, cutting character off-screen rates from 38% to near zero while improving framing and prompt adherence.
- Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
  Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.
- CameraCtrl: Enabling Camera Control for Text-to-Video Generation
  CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.