Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

2024 · arXiv 2311.17117

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · method: 1

citation-polarity summary

background: 2 · use method: 1

representative citing papers

iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

iTryOn is a diffusion-based framework that adds spatial 3D hand guidance and semantic action-aware embeddings to handle complex garment deformations during human-clothing interactions in videos.

ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos

cs.CV · 2026-04-12 · unverdicted · novelty 7.0

ExpertEdit edits novice motions to expert skill levels by learning a motion prior from unpaired videos and infilling masked skill-critical spans.

Screen, Cache, and Match: A Training-Free Causality-Consistent Reference Frame Framework for Human Animation

cs.GR · 2025-12-13 · unverdicted · novelty 7.0

FrameCache uses a Screen-Cache-Match strategy and Trajectory-Aware Autoregressive Generation to convert past frames into causal guidance for temporally coherent human animation videos.

Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing

cs.CV · 2025-09-27 · unverdicted · novelty 7.0

Vid-Freeze immunizes images by adding perturbations that target attention dynamics in I2V models to enforce temporal freezing and suppress motion synthesis.

HandsOnWorld: Unconstrained Egocentric Video Generation with Camera-Disentangled Hand Control

cs.CV · 2026-07-02 · unverdicted · novelty 6.0

HandsOnWorld creates a hand-controlled egocentric video generator from unconstrained monocular video via a new EgoVid-Pro dataset from monocular reconstruction and a Plücker Hand Map that disentangles camera and hand motion.

Error-Conditioned Neural Solvers

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

Error-Conditioned Neural Solvers improve PDE prediction accuracy by using the residual field as network input for learned corrections, outperforming residual-minimization methods by up to 10x on turbulent flows and generalizing better under distribution shifts.

EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

EverAnimate restores drifted latent flow trajectories in chunked video generation via persistent latent propagation and restorative flow matching, achieving measurable gains in PSNR, SSIM, LPIPS, and FID over prior long-animation methods with only LoRA tuning.

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

cs.CV · 2024-04-02 · unverdicted · novelty 6.0

CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.

3D Scene-Adaptive Trajectory-Controllable Human Image Animation with Camera Movement

cs.CV · 2026-06-29 · unverdicted · novelty 5.0 · 2 refs

Presents a scene-adaptive 3D human image animation framework using ground-adaptive motion retargeting and viewpoint-adaptive latent fusion to control human and camera trajectories, claiming improvements on two benchmarks.

Enhancing Domain Generalization in 3D Human Pose Estimation through Controllable Generative Augmentation

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

A controllable generative augmentation approach synthesizes diverse pose videos from indoor and outdoor datasets to improve model performance on unseen domains in 3D human pose estimation.

DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment

cs.RO · 2025-04-22 · unverdicted · novelty 5.0

DriVerse is a generative model that simulates driving scenes from an image and trajectory using multimodal prompting and motion alignment, achieving better performance on nuScenes and Waymo datasets with minimal training.

Pose-dIVE: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

cs.CV · 2024-06-23 · unverdicted · novelty 5.0

Pose-dIVE augments Re-ID training sets with diffusion-generated images of diverse poses and viewpoints by conditioning on SMPL parameters.

EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation

cs.CV · 2026-02-14 · unverdicted · novelty 4.0

EchoTorrent combines multi-teacher distillation, adaptive CFG calibration, hybrid long-tail forcing, and VAE decoder refinement to enable few-pass autoregressive streaming video generation with improved temporal consistency and audio-lip sync.

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

cs.CV · 2024-02-27 · unverdicted · novelty 2.0

The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.

GimbalDiffusion: Gravity-Aware Camera Control for Video Generation

cs.CV · 2025-12-09

citing papers explorer

iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance

cs.CV · 2026-05-20 · unverdicted · none · ref 78

ExpertEdit: Learning Skill-Aware Motion Editing from Expert Videos

cs.CV · 2026-04-12 · unverdicted · none · ref 20

Screen, Cache, and Match: A Training-Free Causality-Consistent Reference Frame Framework for Human Animation

cs.GR · 2025-12-13 · unverdicted · none · ref 5

Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing

cs.CV · 2025-09-27 · unverdicted · none · ref 15

HandsOnWorld: Unconstrained Egocentric Video Generation with Camera-Disentangled Hand Control

cs.CV · 2026-07-02 · unverdicted · none · ref 22

Error-Conditioned Neural Solvers

cs.LG · 2026-06-25 · unverdicted · none · ref 76

EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration

cs.CV · 2026-05-14 · unverdicted · none · ref 7

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

cs.CV · 2024-04-02 · unverdicted · none · ref 122

3D Scene-Adaptive Trajectory-Controllable Human Image Animation with Camera Movement

cs.CV · 2026-06-29 · unverdicted · none · ref 11 · 2 links

Enhancing Domain Generalization in 3D Human Pose Estimation through Controllable Generative Augmentation

cs.CV · 2026-05-12 · unverdicted · none · ref 17

DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment

cs.RO · 2025-04-22 · unverdicted · none · ref 31

Pose-dIVE: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

cs.CV · 2024-06-23 · unverdicted · none · ref 25

EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation

cs.CV · 2026-02-14 · unverdicted · none · ref 42

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

cs.CV · 2024-02-27 · unverdicted · none · ref 152

GimbalDiffusion: Gravity-Aware Camera Control for Video Generation

cs.CV · 2025-12-09 · unreviewed · ref 14
