
Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs

Brandon Smart, Chuanxia Zheng, Iro Laina, Victor Adrian Prisacariu · 2024 · cs.CV · arXiv 2408.13912

Canonical reference. 100% of classified citations from Pith papers cite this work as background.

29 Pith papers citing it

Background 100% of classified citations


abstract

In this paper, we introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs. Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information. For generalizability, we build Splatt3R upon a "foundation" 3D geometry reconstruction method, MASt3R, by extending it to deal with both 3D structure and appearance. Specifically, unlike the original MASt3R which reconstructs only 3D point clouds, we predict the additional Gaussian attributes required to construct a Gaussian primitive for each point. Hence, unlike other novel view synthesis methods, Splatt3R is first trained by optimizing the 3D point cloud's geometry loss, and then a novel view synthesis objective. By doing this, we avoid the local minima present in training 3D Gaussian Splats from stereo views. We also propose a novel loss masking strategy that we empirically find is critical for strong performance on extrapolated viewpoints. We train Splatt3R on the ScanNet++ dataset and demonstrate excellent generalisation to uncalibrated, in-the-wild images. Splatt3R can reconstruct scenes at 4 FPS at 512 x 512 resolution, and the resultant splats can be rendered in real time.
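
The abstract's idea of attaching Gaussian attributes to each predicted 3D point can be illustrated with a small sketch. This is not the authors' implementation. It assumes a per-pixel point map and hypothetical per-pixel outputs (`quats`, `log_scales`, `opacity_logits`, `colors`), and it uses the standard 3D Gaussian Splatting covariance parameterization, Sigma = R S S^T R^T. The function names, array layouts, and activation choices (exponential for scale, sigmoid for opacity) are assumptions made for illustration only.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert quaternions (N, 4), ordered (w, x, y, z), to rotation matrices (N, 3, 3)."""
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)  # normalize to unit length
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    R = np.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y),
        2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y),
    ], axis=-1)
    return R.reshape(-1, 3, 3)

def assemble_gaussians(points, quats, log_scales, opacity_logits, colors):
    """Turn per-pixel predictions into a flat set of Gaussian primitives.

    points: (H, W, 3) point map; quats: (H, W, 4); log_scales: (H, W, 3);
    opacity_logits: (H, W); colors: (H, W, 3). All names are hypothetical.
    """
    means = points.reshape(-1, 3)                 # one Gaussian center per pixel
    R = quat_to_rotmat(quats.reshape(-1, 4))
    s = np.exp(log_scales.reshape(-1, 3))         # assumed activation: scales > 0
    RS = R * s[:, None, :]                        # equals R @ diag(s) per Gaussian
    covariances = RS @ RS.transpose(0, 2, 1)      # Sigma = R S S^T R^T
    opacities = 1.0 / (1.0 + np.exp(-opacity_logits.reshape(-1)))  # sigmoid
    return {
        "means": means,
        "covariances": covariances,
        "opacities": opacities,
        "colors": colors.reshape(-1, 3),
    }

# Toy inputs standing in for network outputs on a 4 x 4 image.
H, W = 4, 4
rng = np.random.default_rng(0)
gaussians = assemble_gaussians(
    points=rng.normal(size=(H, W, 3)),
    quats=rng.normal(size=(H, W, 4)),
    log_scales=rng.normal(size=(H, W, 3)) - 3.0,
    opacity_logits=rng.normal(size=(H, W)),
    colors=rng.uniform(size=(H, W, 3)),
)
print(gaussians["covariances"].shape)  # (16, 3, 3)
```

Under the further assumption that every pixel of both input images contributes one primitive, a 512 x 512 stereo pair would yield 2 · 512 · 512 = 524,288 Gaussians. The abstract itself does not state this count.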


citation-role summary

background 6

citation-polarity summary

background 6

representative citing papers

Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views

cs.CV · 2026-05-08 · unverdicted · novelty 8.0

GLADOS reconstructs 3D geometry from disjoint views by generating intermediate perspectives, performing robust coarse alignment that tolerates generative inconsistencies, and iteratively expanding context for consistency.

No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

NoPo4D is the first feed-forward system for dynamic 4D Gaussian splatting from unposed multi-view videos, using velocity decomposition supervised by optical flow and a bidirectional motion encoder.

ConFixGS: Learning to Fix Feedforward 3D Gaussian Splatting with Confidence-Aware Diffusion Priors in Driving Scenes

cs.CV · 2026-05-10 · unverdicted · novelty 7.0

ConFixGS repairs feedforward 3D Gaussian Splatting with confidence-aware diffusion priors, delivering up to 3.68 dB PSNR gains and halved FID scores on Waymo, nuScenes, and KITTI novel view synthesis tasks.

SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis

cs.CV · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

SplatWeaver uses cardinality Gaussian experts and pixel-level routing to dynamically allocate varying numbers of Gaussian primitives for generalizable novel view synthesis.

Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes

cs.CV · 2026-05-06 · unverdicted · novelty 7.0

Ground4D resolves temporal conflicts in feedforward 4D Gaussian reconstruction for off-road scenes via voxel-grounded temporal aggregation with intra-voxel softmax and surface normal regularization, outperforming prior methods on ORAD-3D and RELLIS-3D while generalizing zero-shot.

WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images

cs.CV · 2026-04-23 · unverdicted · novelty 7.0

WildSplatter jointly learns 3D Gaussians and appearance embeddings from unconstrained photo collections to enable fast feed-forward reconstruction and flexible lighting control in 3D Gaussian Splatting.

Free-Range Gaussians: Non-Grid-Aligned Generative 3D Gaussian Reconstruction

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Free-Range Gaussians uses flow matching over Gaussian parameters to predict non-grid-aligned 3D Gaussians from multi-view images, enabling synthesis of plausible content in unobserved regions with fewer primitives than grid-aligned methods.

3AM: 3egment Anything with Geometric Consistency in Videos

cs.CV · 2026-01-13 · unverdicted · novelty 7.0

3AM integrates MUSt3R 3D features into SAM2 via a Feature Merger and FOV-aware sampling to deliver geometry-consistent video object segmentation from RGB alone, with large gains on wide-baseline datasets.

MODEST: Multi-Optics Depth-of-Field Stereo Dataset

cs.CV · 2025-11-25 · accept · novelty 7.0

MODEST provides the first large-scale high-resolution stereo DSLR dataset with systematic variation of focal length and aperture to support research on real-world optical effects in depth estimation.

VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

cs.CV · 2025-05-18 · unverdicted · novelty 7.0

VGGT-SLAM aligns VGGT submaps via SL(4) manifold optimization of 15-DoF homographies to enable consistent dense RGB SLAM on long uncalibrated monocular videos.

Cross-View Splatter: Feed-Forward View Synthesis with Georeferenced Images

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

A feed-forward model aligns ground and satellite features to predict Gaussian splats for improved novel-view synthesis on georeferenced outdoor scenes.

Generative 3D Gaussians with Learned Density Control

cs.GR · 2026-05-08 · unverdicted · novelty 6.0

DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.

FluSplat: Sparse-View 3D Editing without Test-Time Optimization

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.

Geometric Context Transformer for Streaming 3D Reconstruction

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

LingBot-Map is a streaming 3D reconstruction model built on a geometric context transformer that combines anchor context, pose-reference window, and trajectory memory to deliver accurate, drift-resistant results at 20 FPS over sequences longer than 10,000 frames.

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.

LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

LiveStre4m delivers real-time novel-view video streaming from unposed multi-view inputs via a multi-view vision transformer, diffusion-transformer interpolation, and a learned camera pose predictor.

DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass

cs.CV · 2025-12-15 · unverdicted · novelty 6.0

DePT3R performs joint dense point tracking and 3D reconstruction of dynamic scenes from multiple unposed images using a single neural network forward pass.

C3G: Learning Compact 3D Representations with 2K Gaussians

cs.CV · 2025-12-03 · unverdicted · novelty 6.0

C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.

Depth Anything 3: Recovering the Visual Space from Any Views

cs.CV · 2025-11-13 · unverdicted · novelty 6.0

DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.

Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks

cs.CV · 2025-11-04 · unverdicted · novelty 6.0

DenseMarks learns a canonical 3D embedding space for human head images by training a Vision Transformer with contrastive loss on pairwise point tracks from in-the-wild videos, plus landmark and segmentation supervision.

LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates

cs.CV · 2025-10-10 · unverdicted · novelty 6.0

LTGS uses object template Gaussians as reusable priors that are refined via a pipeline to model long-term scene chronology from sparse-view updates in 3D Gaussian Splatting.

Streaming 4D Visual Geometry Transformer

cs.CV · 2025-07-15 · unverdicted · novelty 6.0

A causal transformer with key-value caching and distillation from a bidirectional VGGT model enables efficient online 4D geometry reconstruction from videos.

Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

cs.CV · 2025-07-10 · unverdicted · novelty 6.0

Geometry Forcing aligns video diffusion representations with geometric foundation model features via angular cosine and scale regression objectives to improve 3D consistency in generated videos.

The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images with Minimal 3D Knowledge

cs.CV · 2025-06-11 · unverdicted · novelty 6.0

Data-centric novel view synthesis models with minimal 3D knowledge and no pose annotations scale better with data volume and outperform traditional bias-driven methods.

citing papers explorer

Showing 29 of 29 citing papers.

Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views cs.CV · 2026-05-08 · unverdicted · none · ref 5 · internal anchor
GLADOS reconstructs 3D geometry from disjoint views by generating intermediate perspectives, performing robust coarse alignment that tolerates generative inconsistencies, and iteratively expanding context for consistency.
No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos cs.CV · 2026-05-21 · unverdicted · none · ref 50 · internal anchor
NoPo4D is the first feed-forward system for dynamic 4D Gaussian splatting from unposed multi-view videos, using velocity decomposition supervised by optical flow and a bidirectional motion encoder.
ConFixGS: Learning to Fix Feedforward 3D Gaussian Splatting with Confidence-Aware Diffusion Priors in Driving Scenes cs.CV · 2026-05-10 · unverdicted · none · ref 11 · internal anchor
ConFixGS repairs feedforward 3D Gaussian Splatting with confidence-aware diffusion priors, delivering up to 3.68 dB PSNR gains and halved FID scores on Waymo, nuScenes, and KITTI novel view synthesis tasks.
SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis cs.CV · 2026-05-08 · unverdicted · none · ref 25 · 2 links · internal anchor
SplatWeaver uses cardinality Gaussian experts and pixel-level routing to dynamically allocate varying numbers of Gaussian primitives for generalizable novel view synthesis.
Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes cs.CV · 2026-05-06 · unverdicted · none · ref 37 · internal anchor
Ground4D resolves temporal conflicts in feedforward 4D Gaussian reconstruction for off-road scenes via voxel-grounded temporal aggregation with intra-voxel softmax and surface normal regularization, outperforming prior methods on ORAD-3D and RELLIS-3D while generalizing zero-shot.
WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images cs.CV · 2026-04-23 · unverdicted · none · ref 25 · internal anchor
WildSplatter jointly learns 3D Gaussians and appearance embeddings from unconstrained photo collections to enable fast feed-forward reconstruction and flexible lighting control in 3D Gaussian Splatting.
Free-Range Gaussians: Non-Grid-Aligned Generative 3D Gaussian Reconstruction cs.CV · 2026-04-06 · unverdicted · none · ref 48 · internal anchor
Free-Range Gaussians uses flow matching over Gaussian parameters to predict non-grid-aligned 3D Gaussians from multi-view images, enabling synthesis of plausible content in unobserved regions with fewer primitives than grid-aligned methods.
3AM: 3egment Anything with Geometric Consistency in Videos cs.CV · 2026-01-13 · unverdicted · none · ref 68 · internal anchor
3AM integrates MUSt3R 3D features into SAM2 via a Feature Merger and FOV-aware sampling to deliver geometry-consistent video object segmentation from RGB alone, with large gains on wide-baseline datasets.
MODEST: Multi-Optics Depth-of-Field Stereo Dataset cs.CV · 2025-11-25 · accept · none · ref 35 · internal anchor
MODEST provides the first large-scale high-resolution stereo DSLR dataset with systematic variation of focal length and aperture to support research on real-world optical effects in depth estimation.
VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold cs.CV · 2025-05-18 · unverdicted · none · ref 60 · internal anchor
VGGT-SLAM aligns VGGT submaps via SL(4) manifold optimization of 15-DoF homographies to enable consistent dense RGB SLAM on long uncalibrated monocular videos.
Cross-View Splatter: Feed-Forward View Synthesis with Georeferenced Images cs.CV · 2026-05-19 · unverdicted · none · ref 70 · internal anchor
A feed-forward model aligns ground and satellite features to predict Gaussian splats for improved novel-view synthesis on georeferenced outdoor scenes.
Generative 3D Gaussians with Learned Density Control cs.GR · 2026-05-08 · unverdicted · none · ref 38 · internal anchor
DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.
FluSplat: Sparse-View 3D Editing without Test-Time Optimization cs.CV · 2026-04-21 · unverdicted · none · ref 42 · internal anchor
FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.
Geometric Context Transformer for Streaming 3D Reconstruction cs.CV · 2026-04-15 · unverdicted · none · ref 61 · internal anchor
LingBot-Map is a streaming 3D reconstruction model built on a geometric context transformer that combines anchor context, pose-reference window, and trajectory memory to deliver accurate, drift-resistant results at 20 FPS over sequences longer than 10,000 frames.
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective cs.CV · 2026-04-15 · unverdicted · none · ref 144 · internal anchor
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video cs.CV · 2026-04-08 · unverdicted · none · ref 24 · internal anchor
LiveStre4m delivers real-time novel-view video streaming from unposed multi-view inputs via a multi-view vision transformer, diffusion-transformer interpolation, and a learned camera pose predictor.
DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass cs.CV · 2025-12-15 · unverdicted · none · ref 25 · internal anchor
DePT3R performs joint dense point tracking and 3D reconstruction of dynamic scenes from multiple unposed images using a single neural network forward pass.
C3G: Learning Compact 3D Representations with 2K Gaussians cs.CV · 2025-12-03 · unverdicted · none · ref 61 · internal anchor
C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.
Depth Anything 3: Recovering the Visual Space from Any Views cs.CV · 2025-11-13 · unverdicted · none · ref 79 · internal anchor
DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.
Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks cs.CV · 2025-11-04 · unverdicted · none · ref 19 · internal anchor
DenseMarks learns a canonical 3D embedding space for human head images by training a Vision Transformer with contrastive loss on pairwise point tracks from in-the-wild videos, plus landmark and segmentation supervision.
LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates cs.CV · 2025-10-10 · unverdicted · none · ref 4 · internal anchor
LTGS uses object template Gaussians as reusable priors that are refined via a pipeline to model long-term scene chronology from sparse-view updates in 3D Gaussian Splatting.
Streaming 4D Visual Geometry Transformer cs.CV · 2025-07-15 · unverdicted · none · ref 10 · internal anchor
A causal transformer with key-value caching and distillation from a bidirectional VGGT model enables efficient online 4D geometry reconstruction from videos.
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling cs.CV · 2025-07-10 · unverdicted · none · ref 63 · internal anchor
Geometry Forcing aligns video diffusion representations with geometric foundation model features via angular cosine and scale regression objectives to improve 3D consistency in generated videos.
The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images with Minimal 3D Knowledge cs.CV · 2025-06-11 · unverdicted · none · ref 35 · internal anchor
Data-centric novel view synthesis models with minimal 3D knowledge and no pose annotations scale better with data volume and outperform traditional bias-driven methods.
DVSM: Decoder-only View Synthesis Model Done Right cs.CV · 2026-05-28 · unverdicted · none · ref 66 · internal anchor
Decoder-only view synthesis model using KV-cache representation and weight sharing between reconstruction and rendering networks achieves new SOTA on novel view synthesis benchmarks.
LangFlash: Feed-forward 3D Language Gaussian Splatting from Sparse Unposed Images cs.CV · 2026-05-22 · unverdicted · none · ref 37 · internal anchor
LangFlash introduces a feed-forward model for 3D language Gaussian splatting from sparse unposed images, claiming superior novel view synthesis and semantic consistency via enriched training data and sparse semantic encoding.
ReorgGS: Equivalent Distribution Reorganization for 3D Gaussian Splatting cs.CV · 2026-05-09 · unverdicted · none · ref 32 · internal anchor
ReorgGS reorganizes the Gaussian distribution in converged 3DGS models by resampling centers and covariances to reduce parameterization degeneration and enable better subsequent optimization.
Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images cs.CV · 2026-04-12 · unverdicted · none · ref 58 · internal anchor
UniSplat learns consistent 3D geometry, appearance, and semantics from unposed images using dual masking, progressive Gaussian splatting, and recalibration to align predictions across tasks.
VGGT-SLAM++ cs.CV · 2026-04-08 · unverdicted · none · ref 66 · internal anchor
VGGT-SLAM++ improves on prior transformer SLAM by adding dense DEM submap graphs and high-cadence local optimization, achieving SOTA accuracy with reduced drift and bounded memory on benchmarks.
