LEXI-SG is the first monocular RGB system for dense open-vocabulary 3D scene graphs that partitions scenes into rooms and performs feed-forward reconstruction per room before global factor-graph alignment.
hub
Vggt-long: Chunk it, loop it, align it–pushing vggt’s limits on kilometer-scale long rgb sequences
13 Pith papers cite this work. Polarity classification is still indexing.
hub tools
years
2026 13verdicts
UNVERDICTED 13representative citing papers
PaceVGGT reduces VGGT inference latency by up to 5.1x on ScanNet-50 via pre-AA token pruning with a distilled Token Scorer, per-frame keep budgets, adaptive merge/prune, and feature-guided restoration, while preserving reconstruction quality on ScanNet-50 and 7-Scenes.
CAL2M achieves calibration-free kilometer-level SLAM by using an assistant eye for scale, epipolar-guided intrinsic correction, and anchor propagation for nonlinear sub-map alignment.
RetrieveVGGT enables constant-memory long-context streaming 3D reconstruction by retrieving relevant frames via query-key similarities in VGGT's first attention layer, outperforming StreamVGGT and others.
Asymmetric token reduction, with distinct merging for queries and pruning for key-values plus layer-wise adaptation, delivers up to 28x speedup on 1000-frame 3D reconstruction inputs while preserving competitive quality.
Ray-aware pointer memory with adaptive retain-or-replace updates enhances stability and accuracy in streaming 3D reconstruction.
LingBot-Map is a streaming 3D reconstruction model built on a geometric context transformer that combines anchor context, pose-reference window, and trajectory memory to deliver accurate, drift-resistant results at 20 FPS over sequences longer than 10,000 frames.
Scal3R achieves better accuracy and consistency in large-scale 3D scene reconstruction by maintaining a compressed global context through test-time adaptation of lightweight neural networks on long video sequences.
A monocular vision system estimates real-scale island area and coastline length with around 10% error using only place name or coordinates input via automated image capture, point cloud generation, and trajectory alignment.
ReorgGS reorganizes the Gaussian distribution in converged 3DGS models by resampling centers and covariances to reduce parameterization degeneration and enable better subsequent optimization.
MR.ScaleMaster adds a false-loop alarm and per-session Sim(3) scale estimation to enable accurate multi-agent monocular mapping, showing 7.2x ATE improvement on KITTI with up to 15 agents.
MonoEM-GS stabilizes view-dependent geometry from foundation models inside a global Gaussian Splatting representation via EM and adds multi-modal features for in-place open-set segmentation.
A pipeline using virtual remote sensing from Google Earth Studio, Pi-Long 3D reconstruction, metric alignment, and watershed segmentation estimates forest fuel load as a scalable alternative to traditional surveys.
citing papers explorer
-
LEXI-SG: Monocular 3D Scene Graph Mapping with Room-Guided Feed-Forward Reconstruction
LEXI-SG is the first monocular RGB system for dense open-vocabulary 3D scene graphs that partitions scenes into rooms and performs feed-forward reconstruction per room before global factor-graph alignment.
-
PaceVGGT: Pre-Alternating-Attention Token Pruning for Visual Geometry Transformers
PaceVGGT reduces VGGT inference latency by up to 5.1x on ScanNet-50 via pre-AA token pruning with a distilled Token Scorer, per-frame keep budgets, adaptive merge/prune, and feature-guided restoration, while preserving reconstruction quality on ScanNet-50 and 7-Scenes.
-
Keep It CALM: Toward Calibration-Free Kilometer-Level SLAM with Visual Geometry Foundation Models via an Assistant Eye
CAL2M achieves calibration-free kilometer-level SLAM by using an assistant eye for scale, epipolar-guided intrinsic correction, and anchor propagation for nonlinear sub-map alignment.
-
Attention Itself Could Retrieve.RetrieveVGGT: Training-Free Long Context Streaming 3D Reconstruction via Query-Key Similarity Retrieval
RetrieveVGGT enables constant-memory long-context streaming 3D reconstruction by retrieving relevant frames via query-key similarities in VGGT's first attention layer, outperforming StreamVGGT and others.
-
Spark3R: Asymmetric Token Reduction Makes Fast Feed-Forward 3D Reconstruction
Asymmetric token reduction, with distinct merging for queries and pruning for key-values plus layer-wise adaptation, delivers up to 28x speedup on 1000-frame 3D reconstruction inputs while preserving competitive quality.
-
Ray-Aware Pointer Memory with Adaptive Updates for Streaming 3D Reconstruction
Ray-aware pointer memory with adaptive retain-or-replace updates enhances stability and accuracy in streaming 3D reconstruction.
-
Geometric Context Transformer for Streaming 3D Reconstruction
LingBot-Map is a streaming 3D reconstruction model built on a geometric context transformer that combines anchor context, pose-reference window, and trajectory memory to deliver accurate, drift-resistant results at 20 FPS over sequences longer than 10,000 frames.
-
Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction
Scal3R achieves better accuracy and consistency in large-scale 3D scene reconstruction by maintaining a compressed global context through test-time adaptation of lightweight neural networks on long video sequences.
-
Real-Scale Island Area and Coastline Estimation using Only its Place Name or Coordinates
A monocular vision system estimates real-scale island area and coastline length with around 10% error using only place name or coordinates input via automated image capture, point cloud generation, and trajectory alignment.
-
ReorgGS: Equivalent Distribution Reorganization for 3D Gaussian Splatting
ReorgGS reorganizes the Gaussian distribution in converged 3DGS models by resampling centers and covariances to reduce parameterization degeneration and enable better subsequent optimization.
-
MR.ScaleMaster: Scale-Consistent Collaborative Mapping from Crowd-Sourced Monocular Videos
MR.ScaleMaster adds a false-loop alarm and per-session Sim(3) scale estimation to enable accurate multi-agent monocular mapping, showing 7.2x ATE improvement on KITTI with up to 15 agents.
-
MonoEM-GS: Monocular Expectation-Maximization Gaussian Splatting SLAM
MonoEM-GS stabilizes view-dependent geometry from foundation models inside a global Gaussian Splatting representation via EM and adds multi-modal features for in-place open-set segmentation.
-
Rapid Forest Fuel Load Estimation via Virtual Remote Sensing and Metric-Scale Feed-Forward 3D Reconstruction
A pipeline using virtual remote sensing from Google Earth Studio, Pi-Long 3D reconstruction, metric alignment, and watershed segmentation estimates forest fuel load as a scalable alternative to traditional surveys.