TrackCraft3R is the first method to repurpose a video diffusion transformer as a feed-forward dense 3D tracker via dual-latent representations and temporal RoPE alignment, achieving SOTA performance with lower compute.
D^2USt3R: Enhancing 3D Reconstruction with 4D Pointmaps for Dynamic Scenes, April 2025
6 Pith papers cite this work. Polarity classification is still indexing.
6
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
fields
cs.CV 6roles
background 2polarities
background 2representative citing papers
NoPo4D is the first feed-forward system for dynamic 4D Gaussian splatting from unposed multi-view videos, using velocity decomposition supervised by optical flow and a bidirectional motion encoder.
C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.
VGGT-Ω improves feed-forward reconstruction accuracy and efficiency by architectural simplifications, register-based attention, and training on much larger supervised and unlabeled video data.