Follow-your-motion: Video motion transfer via efficient spatial-temporal decoupled finetuning

Yue Ma, Yulong Liu, Qiyuan Zhu, Ayden Yang, Kunyu Feng, Xinhua Zhang, Zhifeng Li, Sirui Han, Chenyang Qi, Qifeng Chen · 2025 · arXiv 2506.05207

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · other 1

citation-polarity summary

background 2 · unclear 1

representative citing papers

ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs

cs.RO · 2026-02-09 · unverdicted · novelty 7.0

ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.

GeoEdit: Geometry-Aware Object Editing via Dual-Branch Denoising

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

GeoEdit introduces a Lift-Manipulate-Render-Denoise pipeline with dual-branch denoising and variance-homogeneous injection for 3D-consistent object editing in single photos.

PAI-Studio: Cinematic Video Background Replacement with Camera-Aware Motion

cs.CV · 2026-05-31 · unverdicted · novelty 6.0

PAI-Studio reformulates cinematic background replacement as in-context conditional generation inside a Diffusion Transformer with bidirectional attention, trained on a new 30K film-sourced dataset, and reports better motion consistency and relighting than prior open-source and commercial systems.

OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

OmniHumanoid separates transferable motion learning from embodiment-specific adaptation, enabling scalable cross-embodiment video generation without paired data for new humanoids.

MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization

cs.CV · 2026-01-05 · unverdicted · novelty 6.0

MotionAdapter transfers reference video motions into target videos inside DiT diffusion models by isolating attention-derived motion fields and refining them via DINO-guided semantic alignment.

EasyVFX: Frequency-Driven Decoupling for Resource-Efficient VFX Generation

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

EasyVFX decouples VFX generation via frequency-aware Mixture-of-Experts and test-time training to achieve realistic effects with limited resources.

Focused Forcing: Content-Aware Per-Frame KV Selection for Efficient Autoregressive Video Diffusion

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

Focused Forcing is a training-free per-frame KV selection method that combines attention scores with diversity metrics and head-importance estimation to accelerate autoregressive video diffusion up to 1.48x while improving quality.

Embedding-perturbed Exploration Preference Optimization for Flow Models

cs.CV · 2026-05-15 · unverdicted · novelty 5.0

E²PO uses embedding-level perturbations to maintain intra-group variance and discriminative signal in RL-based preference optimization for generative flow models.

Follow-Your-Preference++: Rethinking Preference Alignment for Image Inpainting

cs.CV · 2026-06-02 · unverdicted · novelty 3.0

Empirical study shows reward model ensembles mitigate biases like brightness and composition in preference data for image inpainting, yielding better performance than prior methods without architecture changes.

LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

cs.CV · 2026-04-13 · unverdicted · novelty 3.0

This review organizes literature on large multimodal models and object-centric vision into four themes—understanding, referring segmentation, editing, and generation—while summarizing paradigms, strategies, and challenges like instance permanence and consistent interaction.

citing papers explorer

ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs
cs.RO · 2026-02-09 · unverdicted · none · ref 79

GeoEdit: Geometry-Aware Object Editing via Dual-Branch Denoising
cs.CV · 2026-06-29 · unverdicted · none · ref 36

PAI-Studio: Cinematic Video Background Replacement with Camera-Aware Motion
cs.CV · 2026-05-31 · unverdicted · none · ref 32

OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation
cs.CV · 2026-05-12 · unverdicted · none · ref 22

MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization
cs.CV · 2026-01-05 · unverdicted · none · ref 21

EasyVFX: Frequency-Driven Decoupling for Resource-Efficient VFX Generation
cs.CV · 2026-05-21 · unverdicted · none · ref 34

Focused Forcing: Content-Aware Per-Frame KV Selection for Efficient Autoregressive Video Diffusion
cs.CV · 2026-05-18 · unverdicted · none · ref 32

Embedding-perturbed Exploration Preference Optimization for Flow Models
cs.CV · 2026-05-15 · unverdicted · none · ref 61

Follow-Your-Preference++: Rethinking Preference Alignment for Image Inpainting
cs.CV · 2026-06-02 · unverdicted · none · ref 43

LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation
cs.CV · 2026-04-13 · unverdicted · none · ref 109
