TrackCraft3R is the first method to repurpose a video diffusion transformer as a feed-forward dense 3D tracker via dual-latent representations and temporal RoPE alignment, achieving SOTA performance with lower compute.
Can video diffusion model reconstruct 4d geometry?
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.CV 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
Geometry Forcing aligns video diffusion representations with geometric foundation model features via angular cosine and scale regression objectives to improve 3D consistency in generated videos.
citing papers explorer
-
TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking
TrackCraft3R is the first method to repurpose a video diffusion transformer as a feed-forward dense 3D tracker via dual-latent representations and temporal RoPE alignment, achieving SOTA performance with lower compute.
-
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
Geometry Forcing aligns video diffusion representations with geometric foundation model features via angular cosine and scale regression objectives to improve 3D consistency in generated videos.