Reshoot-Anything trains a diffusion transformer on pseudo multi-view triplets created by cropping and warping monocular videos to achieve temporally consistent video reshooting with robust camera control on dynamic scenes.
Dust3r: Geometric 3d vi- sion made easy
7 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 7verdicts
UNVERDICTED 7representative citing papers
AnchorSplat uses anchor-aligned 3D Gaussians guided by geometric priors for feed-forward scene reconstruction, achieving SOTA novel view synthesis on ScanNet++ with fewer primitives and better view consistency.
POMA-3D learns self-supervised 3D scene representations from point maps and improves performance on geometric 3D tasks including navigation and scene retrieval.
LA-Pose achieves over 10% higher pose accuracy than recent feed-forward methods on Waymo and PandaSet benchmarks by repurposing latent actions from self-supervised inverse-dynamics pretraining while using orders of magnitude less labeled 3D data.
CylinderDepth uses cylindrical spatial attention with non-learned weights to enforce cross-view consistency in self-supervised surround depth estimation.
Co-Me distills a confidence predictor to selectively merge low-confidence tokens in visual geometric transformers, delivering up to 21.5x speedup on VGGT and 20.4x on Pi3 while preserving spatial coverage and performance.
A pipeline that reconstructs articulated objects from sparse unposed images by aligning independent per-pose reconstructions via learned deformation fields and progressive static/moving part disentanglement.
citing papers explorer
-
Reshoot-Anything: A Self-Supervised Model for In-the-Wild Video Reshooting
Reshoot-Anything trains a diffusion transformer on pseudo multi-view triplets created by cropping and warping monocular videos to achieve temporally consistent video reshooting with robust camera control on dynamic scenes.
-
AnchorSplat: Feed-Forward 3D Gaussian Splatting with 3D Geometric Priors
AnchorSplat uses anchor-aligned 3D Gaussians guided by geometric priors for feed-forward scene reconstruction, achieving SOTA novel view synthesis on ScanNet++ with fewer primitives and better view consistency.
-
POMA-3D: The Point Map Way to 3D Scene Understanding
POMA-3D learns self-supervised 3D scene representations from point maps and improves performance on geometric 3D tasks including navigation and scene retrieval.
-
LA-Pose: Latent Action Pretraining Meets Pose Estimation
LA-Pose achieves over 10% higher pose accuracy than recent feed-forward methods on Waymo and PandaSet benchmarks by repurposing latent actions from self-supervised inverse-dynamics pretraining while using orders of magnitude less labeled 3D data.
-
CylinderDepth: Cylindrical Spatial Attention for Multi-View Consistent Self-Supervised Surround Depth Estimation
CylinderDepth uses cylindrical spatial attention with non-learned weights to enforce cross-view consistency in self-supervised surround depth estimation.
-
Co-Me: Confidence-Guided Token Merging for Visual Geometric Transformers
Co-Me distills a confidence predictor to selectively merge low-confidence tokens in visual geometric transformers, delivering up to 21.5x speedup on VGGT and 20.4x on Pi3 while preserving spatial coverage and performance.
-
PAOLI: Pose-free Articulated Object Learning from Sparse-view Images
A pipeline that reconstructs articulated objects from sparse unposed images by aligning independent per-pose reconstructions via learned deformation fields and progressive static/moving part disentanglement.