Control-a-video: Controllable text-to-video generation with diffusion models

Chen, W · 2023 · arXiv 2305.13840

Canonical reference. 80% of the classified citations from Pith papers (4 of 5) cite this work as background.

18 Pith papers citing it

citation-role summary

background: 4 · method: 1

citation-polarity summary

background: 4 · use method: 1

representative citing papers

TIE: Time Interval Encoding for Video Generation over Events

cs.CV · 2026-05-11 · unverdicted · novelty 7.0 · 2 refs

TIE derives a sinc-based interval encoding from Temporal Integrability and Duration Invariance principles, raising human-verified temporal constraint satisfaction from 77.34% to 96.03% while preserving visual quality in DiT models.

TrajectoryMover: Generative Movement of Object Trajectories in Videos

cs.CV · 2026-03-31 · unverdicted · novelty 7.0 · 2 refs

A synthetic data pipeline and fine-tuned video model enable generative editing to move object 3D trajectories in videos while keeping relative motion.

VACE: All-in-One Video Creation and Editing

cs.CV · 2025-03-10 · unverdicted · novelty 7.0

VACE unifies reference-to-video generation, video-to-video editing, and masked video-to-video editing in one Diffusion Transformer framework using a Video Condition Unit for inputs and a Context Adapter for task injection.

Semantic-Aware, Physics-Informed, Geometry-Grounded Weather Video Synthesis

cs.CV · 2026-06-27 · unverdicted · novelty 6.0

A new framework factorizes weather video synthesis into semantic appearance anchoring, physics-informed Gaussian particle simulation under gravity/wind/turbulence, and geometry-grounded alignment to produce diverse realistic weather effects.

OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

cs.CV · 2026-06-11 · unverdicted · novelty 6.0

OmniDirector introduces a grid-based camera representation and hierarchical prompt agent for multi-shot camera cloning in video diffusion models trained on million-scale unpaired data.

ReactiveGWM: Steering NPC in Reactive Game World Models

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

ReactiveGWM introduces a decoupled diffusion architecture for player-NPC interactions that learns game-agnostic response logic for zero-shot strategy transfer across games.

VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation

cs.CV · 2026-04-02 · conditional · novelty 6.0

VERTIGO post-trains camera trajectory generators with visual preference signals from Unity-rendered previews scored by a cinematically fine-tuned VLM, cutting character off-screen rates from 38% to near zero while improving framing and prompt adherence.

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

cs.CV · 2024-12-19 · unverdicted · novelty 6.0

Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

cs.CV · 2024-04-02 · unverdicted · novelty 6.0

CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.

KGEdit: Ambiguity-Aware Knowledge Graphs for Training-Free Precise Video Generation and Editing

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

KGEdit uses an ambiguity-aware knowledge graph and structured injection modules to improve semantic control and temporal consistency in training-free text-to-video diffusion models.

GlowGS: Generative Semantic Feature Learning for 3D Gaussian Splatting in Nighttime Glow Scenes

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

GlowGS improves 3D Gaussian Splatting in nighttime glow scenes via semantic feature generation from diffusion models and novel-view semantic learning with vision foundation models.

DriveCtrl: Conditioned Sim-to-Real Driving Video Generation

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

DriveCtrl is a depth-conditioned controllable framework that generates realistic driving videos from simulation while preserving annotations and scene dynamics.

DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment

cs.RO · 2025-04-22 · unverdicted · novelty 5.0

DriVerse is a generative model that simulates driving scenes from an image and trajectory using multimodal prompting and motion alignment, achieving better performance on nuScenes and Waymo datasets with minimal training.

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

cs.CV · 2023-11-07 · unverdicted · novelty 5.0

I2VGen-XL applies cascaded diffusion models with a base stage for semantic preservation via hierarchical encoders and a refinement stage for detail and resolution, trained on 35 million text-video and 6 billion text-image pairs.

EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation

cs.CV · 2026-02-14 · unverdicted · novelty 4.0

EchoTorrent combines multi-teacher distillation, adaptive CFG calibration, hybrid long-tail forcing, and VAE decoder refinement to enable few-pass autoregressive streaming video generation with improved temporal consistency and audio-lip sync.

Fewer, Better Frames: A Compute-Normalized Proof of Concept for Coherence-First World-Model Rendering with Model-Guided FSR4 Frame Generation

cs.GR · 2026-05-11 · unverdicted · novelty 3.0

Coherence-first rendering with 15 FPS anchors plus FSR4 upsampling to 30 FPS preserves scene geometry and identity longer than native 30 FPS generation across tested forest, sword, desert, and snow scenes, with LPIPS favoring the coherence branch.

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

cs.CV · 2024-02-27 · unverdicted · novelty 2.0

The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.

MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation

cs.CV · 2026-04-21
