Introduces mesh tokenization to condition DiT-based video diffusion models directly on 3D human meshes for motion control without 2D rendering.
Perception-as-control: Fine-grained control- lable image animation with 3d-aware motion representation
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3years
2026 3verdicts
UNVERDICTED 3representative citing papers
OptiWorld inserts a classical optimal-control layer that extracts a world state, plans an optimal trajectory on a geometric manifold under physical constraints, and renders the video conditioned on that trajectory.
Autoregressive probabilistic world models trained on raw videos yield emergent object segmentation, 3D controllability, and physical relationship inference via multi-future motion correlation analysis.
citing papers explorer
-
Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization
Introduces mesh tokenization to condition DiT-based video diffusion models directly on 3D human meshes for motion control without 2D rendering.
-
OptiWorld: Optimal Control for Video World Generation under Physical Constraints
OptiWorld inserts a classical optimal-control layer that extracts a world state, plans an optimal trajectory on a geometric manifold under physical constraints, and renders the video conditioned on that trajectory.
-
Physical Object Understanding with a Physically Controllable World Model
Autoregressive probabilistic world models trained on raw videos yield emergent object segmentation, 3D controllability, and physical relationship inference via multi-future motion correlation analysis.