Animate anyone: Consistent and controllable image-to-video synthesis for character animation
15 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance
iTryOn is a diffusion-based framework that adds spatial 3D hand guidance and semantic action-aware embeddings to handle complex garment deformations during human-clothing interactions in videos.
- ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos
ExpertEdit edits novice motions to expert skill levels by learning a motion prior from unpaired videos and infilling masked skill-critical spans.
- Screen, Cache, and Match: A Training-Free Causality-Consistent Reference Frame Framework for Human Animation
FrameCache uses a Screen-Cache-Match strategy and Trajectory-Aware Autoregressive Generation to convert past frames into causal guidance for temporally coherent human animation videos.
- Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
Vid-Freeze immunizes images by adding perturbations that target attention dynamics in I2V models to enforce temporal freezing and suppress motion synthesis.
- HandsOnWorld: Unconstrained Egocentric Video Generation with Camera-Disentangled Hand Control
HandsOnWorld creates a hand-controlled egocentric video generator from unconstrained monocular video via a new EgoVid-Pro dataset from monocular reconstruction and a Plücker Hand Map that disentangles camera and hand motion.
- Error-Conditioned Neural Solvers
Error-Conditioned Neural Solvers improve PDE prediction accuracy by using the residual field as network input for learned corrections, outperforming residual-minimization methods by up to 10x on turbulent flows and generalizing better under distribution shifts.
- EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration
EverAnimate restores drifted latent flow trajectories in chunked video generation via persistent latent propagation and restorative flow matching, achieving measurable gains in PSNR, SSIM, LPIPS, and FID over prior long-animation methods with only LoRA tuning.
- CameraCtrl: Enabling Camera Control for Text-to-Video Generation
CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.
- 3D Scene-Adaptive Trajectory-Controllable Human Image Animation with Camera Movement
Presents a scene-adaptive 3D human image animation framework using ground-adaptive motion retargeting and viewpoint-adaptive latent fusion to control human and camera trajectories, claiming improvements on two benchmarks.
- Enhancing Domain Generalization in 3D Human Pose Estimation through Controllable Generative Augmentation
A controllable generative augmentation approach synthesizes diverse pose videos from indoor and outdoor datasets to improve model performance on unseen domains in 3D human pose estimation.
- DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment
DriVerse is a generative model that simulates driving scenes from an image and trajectory using multimodal prompting and motion alignment, achieving better performance on nuScenes and Waymo datasets with minimal training.
- Pose-dIVE: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification
Pose-dIVE augments Re-ID training sets with diffusion-generated images of diverse poses and viewpoints by conditioning on SMPL parameters.
- EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation
EchoTorrent combines multi-teacher distillation, adaptive CFG calibration, hybrid long-tail forcing, and VAE decoder refinement to enable few-pass autoregressive streaming video generation with improved temporal consistency and audio-lip sync.
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.
- GimbalDiffusion: Gravity-Aware Camera Control for Video Generation