hub

Flexible diffusion modeling of long videos

William Harvey, Saeid Naderiparizi, Vaden Masrani, Christian Weilbach, Frank Wood · 2022 · arXiv 2205.11495

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

read on arXiv browse 12 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

cs.LG · 2022-09-07 · unverdicted · novelty 8.0

Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.

Phenaki: Variable Length Video Generation From Open Domain Textual Description

cs.CV · 2022-10-05 · unverdicted · novelty 7.0

Phenaki generates arbitrary-length videos from sequences of text prompts by tokenizing videos with causal temporal attention and generating tokens with a text-conditioned masked transformer, trained jointly on images and videos.

Imagen Video: High Definition Video Generation with Diffusion Models

cs.CV · 2022-10-05 · unverdicted · novelty 7.0

Imagen Video generates high-definition text-conditional videos via a cascade of base and super-resolution diffusion models, achieving high fidelity and controllability.

Feed-forward Motion In-betweening for Any 4D

cs.CV · 2026-06-20 · unverdicted · novelty 6.0

Proposes a feed-forward keyframe-conditioned in-betweening method for arbitrary 4D meshes using a topology-agnostic VAE and MMDiT-based rectified flow model.

DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization

cs.CV · 2024-12-20 · unverdicted · novelty 6.0

DOLLAR combines variational score and consistency distillation for few-step video generation plus latent reward optimization, reporting 82.57 VBench score and up to 278x speedup over the teacher diffusion model for 128-frame 10-second videos.

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

cs.CV · 2023-08-16 · unverdicted · novelty 6.0

DragNUWA integrates text, image, and trajectory controls into a diffusion video model using a Trajectory Sampler, Multiscale Fusion, and Adaptive Training to enable fine-grained open-domain video generation.

MagicVideo: Efficient Video Generation With Latent Diffusion Models

cs.CV · 2022-11-20 · unverdicted · novelty 6.0

MagicVideo generates 256x256 text-conditioned video clips via latent diffusion with a custom 3D U-Net, achieving roughly 64 times lower compute than prior video diffusion models.

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

cs.CV · 2022-11-02 · unverdicted · novelty 6.0

An ensemble of stage-specialized text-to-image diffusion models improves prompt alignment over single shared-parameter models while preserving visual quality and inference speed.

Probabilistic Precipitation Nowcasting with Rectified Flow Transformers

cs.CV · 2026-05-29 · unverdicted · novelty 5.0

FREUD applies rectified flow transformers with frame-wise encoding and a unified decoder to achieve state-of-the-art probabilistic precipitation nowcasting on the SEVIR benchmark.

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

cs.CV · 2023-11-07 · unverdicted · novelty 5.0

I2VGen-XL applies cascaded diffusion models with a base stage for semantic preservation via hierarchical encoders and a refinement stage for detail and resolution, trained on 35 million text-video and 6 billion text-image pairs.

ModelScope Text-to-Video Technical Report

cs.CV · 2023-08-12 · unverdicted · novelty 4.0

ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.

VRAG: Learning World Models for Interactive Video Generation

cs.CV · 2025-05-28

citing papers explorer

Showing 12 of 12 citing papers.

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow cs.LG · 2022-09-07 · unverdicted · none · ref 21
Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.
Phenaki: Variable Length Video Generation From Open Domain Textual Description cs.CV · 2022-10-05 · unverdicted · none · ref 14
Phenaki generates arbitrary-length videos from sequences of text prompts by tokenizing videos with causal temporal attention and generating tokens with a text-conditioned masked transformer, trained jointly on images and videos.
Imagen Video: High Definition Video Generation with Diffusion Models cs.CV · 2022-10-05 · unverdicted · none · ref 6
Imagen Video generates high-definition text-conditional videos via a cascade of base and super-resolution diffusion models, achieving high fidelity and controllability.
Feed-forward Motion In-betweening for Any 4D cs.CV · 2026-06-20 · unverdicted · none · ref 13
Proposes a feed-forward keyframe-conditioned in-betweening method for arbitrary 4D meshes using a topology-agnostic VAE and MMDiT-based rectified flow model.
DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization cs.CV · 2024-12-20 · unverdicted · none · ref 13
DOLLAR combines variational score and consistency distillation for few-step video generation plus latent reward optimization, reporting 82.57 VBench score and up to 278x speedup over the teacher diffusion model for 128-frame 10-second videos.
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory cs.CV · 2023-08-16 · unverdicted · none · ref 291
DragNUWA integrates text, image, and trajectory controls into a diffusion video model using a Trajectory Sampler, Multiscale Fusion, and Adaptive Training to enable fine-grained open-domain video generation.
MagicVideo: Efficient Video Generation With Latent Diffusion Models cs.CV · 2022-11-20 · unverdicted · none · ref 9
MagicVideo generates 256x256 text-conditioned video clips via latent diffusion with a custom 3D U-Net, achieving roughly 64 times lower compute than prior video diffusion models.
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers cs.CV · 2022-11-02 · unverdicted · none · ref 20
An ensemble of stage-specialized text-to-image diffusion models improves prompt alignment over single shared-parameter models while preserving visual quality and inference speed.
Probabilistic Precipitation Nowcasting with Rectified Flow Transformers cs.CV · 2026-05-29 · unverdicted · none · ref 44
FREUD applies rectified flow transformers with frame-wise encoding and a unified decoder to achieve state-of-the-art probabilistic precipitation nowcasting on the SEVIR benchmark.
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models cs.CV · 2023-11-07 · unverdicted · none · ref 12
I2VGen-XL applies cascaded diffusion models with a base stage for semantic preservation via hierarchical encoders and a refinement stage for detail and resolution, trained on 35 million text-video and 6 billion text-image pairs.
ModelScope Text-to-Video Technical Report cs.CV · 2023-08-12 · unverdicted · none · ref 14
ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.
VRAG: Learning World Models for Interactive Video Generation cs.CV · 2025-05-28 · unreviewed · ref 12

Flexible diffusion modeling of long videos

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer