VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

30 Pith papers cite this work. Polarity classification is still indexing.

abstract

We present VGGT-SLAM, a dense RGB SLAM system constructed by incrementally and globally aligning submaps created from the feed-forward scene reconstruction approach VGGT using only uncalibrated monocular cameras. While related works align submaps using similarity transforms (i.e., translation, rotation, and scale), we show that such approaches are inadequate in the case of uncalibrated cameras. In particular, we revisit the idea of reconstruction ambiguity, where given a set of uncalibrated cameras with no assumption on the camera motion or scene structure, the scene can only be reconstructed up to a 15-degrees-of-freedom projective transformation of the true geometry. This inspires us to recover a consistent scene reconstruction across submaps by optimizing over the SL(4) manifold, thus estimating 15-degrees-of-freedom homography transforms between sequential submaps while accounting for potential loop closure constraints. As verified by extensive experiments, we demonstrate that VGGT-SLAM achieves improved map quality using long video sequences that are infeasible for VGGT due to its high GPU requirements.
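The 15-degrees-of-freedom count in the abstract can be illustrated with a short sketch. A 3D projective transform is a 4x4 matrix defined only up to scale (16 entries minus 1), and fixing the determinant to 1 selects a representative in SL(4); a similarity transform has only 7 degrees of freedom (3 rotation, 3 translation, 1 scale). The code below is not taken from VGGT-SLAM: the function names `normalize_to_sl4`, `apply_homography`, and `similarity_matrix` are hypothetical, and NumPy is an assumed dependency.

```python
import numpy as np


def normalize_to_sl4(H):
    """Rescale a 4x4 homography so that its determinant equals 1.

    Scaling by s multiplies the determinant by s**4, so the sign cannot
    change; this sketch therefore only handles det(H) > 0.
    """
    H = np.asarray(H, dtype=float)
    det = np.linalg.det(H)
    if det <= 0:
        raise ValueError("determinant must be positive to rescale into SL(4)")
    return H / det ** 0.25


def apply_homography(H, points):
    """Apply a 4x4 homography to an (N, 3) array of 3D points."""
    points = np.asarray(points, dtype=float)
    homog = np.hstack([points, np.ones((points.shape[0], 1))])
    mapped = homog @ H.T
    # Divide by the last homogeneous coordinate to return to 3D.
    return mapped[:, :3] / mapped[:, 3:4]


def similarity_matrix(scale, R, t):
    """Build the 4x4 matrix of a 7-DoF similarity transform (illustrative)."""
    S = np.eye(4)
    S[:3, :3] = scale * np.asarray(R, dtype=float)
    S[:3, 3] = t
    return S


# A similarity transform is a special case of a homography.
S = similarity_matrix(2.0, np.eye(3), [1.0, 0.0, 0.0])
S_unit = normalize_to_sl4(S)

# A general homography also has a non-trivial bottom row, which a
# similarity transform cannot represent.
P = np.eye(4)
P[3, 0] = 0.1

pts = np.array([[1.0, 2.0, 3.0]])
print(apply_homography(S, pts))
print(apply_homography(S_unit, pts))
print(apply_homography(P, pts))
```

Because the overall scale cancels in the final division, `S` and `S_unit` map points identically, which is why the determinant constraint removes exactly one degree of freedom.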

citation-role summary

background: 1 · baseline: 1 · dataset: 1

fields

cs.CV: 24 · cs.RO: 6

years

2026: 22 · 2025: 8

representative citing papers

Depth Anything 3: Recovering the Visual Space from Any Views

cs.CV · 2025-11-13 · unverdicted · novelty 6.0

DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.

$R^3$: 3D Reconstruction via Relative Regression

cs.CV · 2026-05-26 · unverdicted · novelty 5.0

R³ uses relative regression with confidence-weighted constraints from an MLP to support long-context offline and streaming 3D reconstruction without global coordinate assumptions.

HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

HorizonStream is a long-horizon Transformer that factorizes geometric evidence influence into channel-wise linear attention for long-range temporal propagation and local spatiotemporal attention for short-range matching, claiming stable generalization from 48-frame training to over 10,000-frame test
