MultiAnimate: Pose-Guided Image Animation Made Extensible

2026 · cs.CV · arXiv 2602.21581

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Pose-guided human image animation aims to synthesize realistic videos of a reference character driven by a sequence of poses. While diffusion-based methods have achieved remarkable success, most existing approaches are limited to single-character animation. We observe that naively extending these methods to multi-character scenarios often leads to identity confusion and implausible occlusions between characters. To address these challenges, we propose an extensible multi-character image animation framework built upon modern Diffusion Transformers (DiTs) for video generation. At its core, our framework introduces two novel components, Identifier Assigner and Identifier Adapter, which collaboratively capture per-person positional cues and inter-person spatial relationships. This mask-driven scheme, along with a scalable training strategy, not only enhances flexibility but also enables generalization to scenarios with more characters than those seen during training. Remarkably, trained on only a two-character dataset, our model generalizes to multi-character animation while maintaining compatibility with single-character cases. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in multi-character image animation, surpassing existing diffusion-based baselines.

representative citing papers

SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning

cs.CV · 2026-06-09 · unverdicted · novelty 6.0

SCAIL-2 achieves end-to-end character animation via direct video concatenation, in-context mask conditioning, mode-specific RoPE, the synthetic MotionPair-60K dataset, and Bias-Aware DPO, outperforming prior methods on multiple tasks.

citing papers explorer

Showing 1 of 1 citing paper.

SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning cs.CV · 2026-06-09 · unverdicted · none · ref 25 · internal anchor
