A decoupled memory branch with hybrid cues, cross-attention, and gating improves spatial consistency and data efficiency in long-horizon camera-trajectory video generation.
In: Proceedings of the Computer Vision and Pattern Recognition Conference
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Composing a policy that maps 2D waypoints to joint actions with a frozen world model yields a lifted world model that achieves 3.8 times lower mean joint error than direct low-level search while being more compute-efficient and generalizing to unseen environments.
citing papers explorer
-
Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation
A decoupled memory branch with hybrid cues, cross-attention, and gating improves spatial consistency and data efficiency in long-horizon camera-trajectory video generation.
-
Lifting Embodied World Models for Planning and Control
Composing a policy that maps 2D waypoints to joint actions with a frozen world model yields a lifted world model that achieves 3.8 times lower mean joint error than direct low-level search while being more compute-efficient and generalizing to unseen environments.