SimInsert is a training-free video object insertion technique that decouples the task into single-frame editing and semantic motion description, using image-to-video diffusion models with non-invasive guidance to achieve spatio-temporal coherence.
ConsistI2V: Enhancing Vi- sual Consistency for Image-to-Video Generation
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 6representative citing papers
Mutual Forcing trains a single native autoregressive audio-video model with mutually reinforcing few-step and multi-step modes via self-distillation to match 50-step baselines at 4-8 steps.
Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.
A survey that organizes diffusion image-to-video methods into a taxonomy, distills core designs in condition encoding, temporal modeling, noise prior, and upsampling, and discusses applications plus challenges.
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
citing papers explorer
-
SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion
SimInsert is a training-free video object insertion technique that decouples the task into single-frame editing and semantic motion description, using image-to-video diffusion models with non-invasive guidance to achieve spatio-temporal coherence.
-
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
Mutual Forcing trains a single native autoregressive audio-video model with mutually reinforcing few-step and multi-step modes via self-distillation to match 50-step baselines at 4-8 steps.
-
Show-o2: Improved Native Unified Multimodal Models
Show-o2 unifies text, image, and video understanding and generation in a single autoregressive-plus-flow-matching model built on 3D causal VAE representations.
-
Image-to-Video Diffusion: From Foundations to Open Frontiers
A survey that organizes diffusion image-to-video methods into a taxonomy, distills core designs in condition encoding, temporal modeling, noise prior, and upsampling, and discusses applications plus challenges.
-
Cosmos World Foundation Model Platform for Physical AI
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
- CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating