pith. sign in

hub

VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

24 Pith papers cite this work. Polarity classification is still indexing.

24 Pith papers citing it
abstract

We present VGGT-SLAM, a dense RGB SLAM system constructed by incrementally and globally aligning submaps created from the feed-forward scene reconstruction approach VGGT using only uncalibrated monocular cameras. While related works align submaps using similarity transforms (i.e., translation, rotation, and scale), we show that such approaches are inadequate in the case of uncalibrated cameras. In particular, we revisit the idea of reconstruction ambiguity, where given a set of uncalibrated cameras with no assumption on the camera motion or scene structure, the scene can only be reconstructed up to a 15-degrees-of-freedom projective transformation of the true geometry. This inspires us to recover a consistent scene reconstruction across submaps by optimizing over the SL(4) manifold, thus estimating 15-degrees-of-freedom homography transforms between sequential submaps while accounting for potential loop closure constraints. As verified by extensive experiments, we demonstrate that VGGT-SLAM achieves improved map quality using long video sequences that are infeasible for VGGT due to its high GPU requirements.

hub tools

citation-role summary

background 1 baseline 1 dataset 1

citation-polarity summary

fields

cs.CV 19 cs.RO 5

years

2026 16 2025 8

representative citing papers

Depth Anything 3: Recovering the Visual Space from Any Views

cs.CV · 2025-11-13 · unverdicted · novelty 6.0

DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.

$R^3$: 3D Reconstruction via Relative Regression

cs.CV · 2026-05-26 · unverdicted · novelty 5.0

R³ uses relative regression with confidence-weighted constraints from an MLP to support long-context offline and streaming 3D reconstruction without global coordinate assumptions.

HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

HorizonStream is a long-horizon Transformer that factorizes geometric evidence influence into channel-wise linear attention for long-range temporal propagation and local spatiotemporal attention for short-range matching, claiming stable generalization from 48-frame training to over 10,000-frame test

TTT3R: 3D Reconstruction as Test-Time Training

cs.CV · 2025-09-30 · unverdicted · novelty 5.0

TTT3R derives a closed-form learning rate from memory-observation alignment confidence to boost length generalization in RNN-based 3D reconstruction by 2x in global pose estimation.

ViPE: Video Pose Engine for 3D Geometric Perception

cs.CV · 2025-08-12 · unverdicted · novelty 5.0

ViPE estimates camera intrinsics, motion, and dense near-metric depth from uncalibrated videos, outperforming baselines on TUM and KITTI while releasing annotations for 96M frames across real and generated videos.

VGGT-SLAM++

cs.CV · 2026-04-08 · unverdicted · novelty 4.0

VGGT-SLAM++ improves on prior transformer SLAM by adding dense DEM submap graphs and high-cadence local optimization, achieving SOTA accuracy with reduced drift and bounded memory on benchmarks.

citing papers explorer

Showing 24 of 24 citing papers.