Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs
Pith reviewed 2026-05-17 01:11 UTC · model grok-4.3
The pith
Splatt3R turns any uncalibrated stereo image pair into a 3D Gaussian splat without camera parameters or depth.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given two uncalibrated images, Splatt3R predicts a set of 3D Gaussians whose positions, covariances, opacities, and colors support both accurate geometry reconstruction and high-quality novel view synthesis. The method first optimizes a geometry loss on the predicted point cloud, then switches to a rendering loss with masking on extrapolated regions, achieving this from stereo pairs alone.
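To make the predicted primitive concrete, here is a minimal sketch of a plausible per-point output. The field names and shapes follow standard 3D Gaussian Splatting conventions and are assumptions for illustration, not the paper's actual interface.

```python
from dataclasses import dataclass
import torch

@dataclass
class GaussianPrediction:
    """One Gaussian primitive per predicted 3D point (hypothetical layout)."""
    points: torch.Tensor     # (N, 3) centers, regressed MASt3R-style
    scales: torch.Tensor     # (N, 3) per-axis extents, often stored in log-space
    rotations: torch.Tensor  # (N, 4) unit quaternions orienting the covariance
    opacities: torch.Tensor  # (N, 1) in [0, 1] after a sigmoid
    colors: torch.Tensor     # (N, 3) RGB, or SH coefficients for view dependence
```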
What carries the argument
Two-stage training that first optimizes only the 3D point cloud geometry loss before applying the novel view synthesis objective, together with a loss masking strategy for extrapolated viewpoints.
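A minimal sketch of that schedule, with `model`, `render_gaussians`, and the batch keys as hypothetical stand-ins; it shows the objective switch and the masked rendering loss, not the authors' implementation.

```python
import torch

def train_staged(model, loader, geo_epochs=5, nvs_epochs=20, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(geo_epochs + nvs_epochs):
        for batch in loader:
            pred = model(batch["view1"], batch["view2"])  # points + Gaussian attributes
            if epoch < geo_epochs:
                # Stage 1: supervise only the 3D point positions, as in MASt3R.
                loss = (pred["points"] - batch["gt_points"]).abs().mean()
            else:
                # Stage 2: render the predicted Gaussians at a held-out target view
                # (render_gaussians stands in for a differentiable splat rasterizer)
                # and penalize error only where the loss mask marks valid pixels.
                img = render_gaussians(pred, batch["target_camera"])
                m = batch["loss_mask"]  # 1 on pixels the input views can constrain
                loss = ((img - batch["target_image"]).abs() * m).sum() / m.sum().clamp(min=1)
            opt.zero_grad()
            loss.backward()
            opt.step()
```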
If this is right
- Scenes can be reconstructed and rendered in real time from casual uncalibrated photo pairs.
- 3D modeling becomes possible in environments where acquiring camera poses or depth is impractical.
- The approach supports generalization from controlled training data to diverse natural images.
- Reconstruction runs at 4 frames per second for 512×512 images with real-time splat rendering afterward.
Where Pith is reading between the lines
- Extending the method to sequences of more than two images could improve consistency and reduce artifacts in complex scenes.
- Integration with mobile devices might enable instant 3D capture from everyday photography without special equipment.
- Similar staged training could benefit other 3D representation learning tasks that suffer from local minima in direct optimization.
Load-bearing premise
The assumption that beginning with geometry optimization and then moving to appearance optimization, plus the masking, reliably sidesteps the local minima encountered in direct Gaussian splat training from stereo pairs.
What would settle it
Running Splatt3R on a dataset of uncalibrated image pairs with ground-truth novel views from significantly different angles and measuring whether the rendered images match the ground truth within a small error margin.
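One way to run that test, as a minimal sketch: render each held-out pair at a significantly different target camera and check mean PSNR against a chosen threshold. The PSNR formula is standard; `render_fn`, `test_pairs`, the `gt_novel_view` key, and the 25 dB threshold are illustrative assumptions.

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    # Standard peak signal-to-noise ratio; inputs are same-shape floats in [0, max_val].
    mse = torch.mean((pred - target) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))

def settles_it(render_fn, test_pairs, threshold_db: float = 25.0) -> bool:
    # render_fn and test_pairs are hypothetical stand-ins for the evaluation harness.
    scores = [psnr(render_fn(p), p["gt_novel_view"]) for p in test_pairs]
    return sum(scores) / len(scores) >= threshold_db
```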
Original abstract
In this paper, we introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs. Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information. For generalizability, we build Splatt3R upon a "foundation" 3D geometry reconstruction method, MASt3R, by extending it to deal with both 3D structure and appearance. Specifically, unlike the original MASt3R which reconstructs only 3D point clouds, we predict the additional Gaussian attributes required to construct a Gaussian primitive for each point. Hence, unlike other novel view synthesis methods, Splatt3R is first trained by optimizing the 3D point cloud's geometry loss, and then a novel view synthesis objective. By doing this, we avoid the local minima present in training 3D Gaussian Splats from stereo views. We also propose a novel loss masking strategy that we empirically find is critical for strong performance on extrapolated viewpoints. We train Splatt3R on the ScanNet++ dataset and demonstrate excellent generalisation to uncalibrated, in-the-wild images. Splatt3R can reconstruct scenes at 4FPS at 512 x 512 resolution, and the resultant splats can be rendered in real-time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Splatt3R, a feed-forward, pose-free method that predicts 3D Gaussian splats directly from uncalibrated stereo image pairs. It extends the MASt3R foundation model to output per-point Gaussian attributes (scales, rotations, opacities, and appearance coefficients) in addition to 3D points. Training follows a two-stage schedule: first optimizing a 3D point-cloud geometry loss inherited from MASt3R, then switching to a novel-view-synthesis objective, together with a proposed loss-masking strategy. The model is trained on ScanNet++ and claims strong generalization to in-the-wild images, with reconstruction at 4 FPS (512×512) and real-time splat rendering.
Significance. If the results hold, the work would be significant for enabling calibration-free, feed-forward Gaussian splatting and novel-view synthesis from casual stereo pairs. Leveraging a foundation model plus staged training could simplify pipelines that currently rely on SfM or multi-view optimization, while the reported speed supports practical deployment. Explicit credit is due for the reproducible integration with MASt3R and the focus on in-the-wild generalization.
Major comments (2)
- [§3.2] §3.2 (Training Procedure): The central claim that the two-stage schedule (geometry loss followed by NVS objective) plus loss masking reliably avoids the local minima that arise in direct Gaussian-splat optimization from stereo views is load-bearing for the method. No ablation is reported that compares staged training against joint optimization of all Gaussian attributes or against MASt3R features alone; quantitative metrics (PSNR/SSIM/LPIPS on extrapolated views) for these variants are required to substantiate the claim.
- [§4.3] §4.3 (Ablations and Masking): The loss-masking strategy is described as 'empirically critical' for performance on extrapolated viewpoints, yet the manuscript provides no isolated quantitative ablation (e.g., with/without masking on the same backbone) or details on mask computation. This omission weakens the ability to attribute gains specifically to the proposed masking rather than to the base MASt3R geometry.
Minor comments (2)
- [Abstract] The abstract states 'excellent generalisation' without citing any numerical metrics or baseline comparisons; adding a brief quantitative statement would improve clarity.
- [§3.1] Notation for the additional Gaussian attributes (scale, rotation, opacity, appearance) should be introduced once in §3.1 and used consistently thereafter to avoid ambiguity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential significance of Splatt3R for calibration-free Gaussian splatting. We respond point-by-point to the major comments below.
Point-by-point responses
Referee: [§3.2] §3.2 (Training Procedure): The central claim that the two-stage schedule (geometry loss followed by NVS objective) plus loss masking reliably avoids the local minima that arise in direct Gaussian-splat optimization from stereo views is load-bearing for the method. No ablation is reported that compares staged training against joint optimization of all Gaussian attributes or against MASt3R features alone; quantitative metrics (PSNR/SSIM/LPIPS on extrapolated views) for these variants are required to substantiate the claim.
Authors: We agree that an explicit ablation is needed to substantiate the benefit of the two-stage schedule. In the revised manuscript we will add a quantitative comparison on the same evaluation protocol, reporting PSNR, SSIM and LPIPS on extrapolated views for three variants: (1) joint optimization of all Gaussian attributes from the first epoch, (2) MASt3R geometry features without the novel-view-synthesis stage, and (3) the proposed two-stage procedure. We expect the staged approach to show clear gains because early geometry stabilization prevents the optimizer from settling into poor local minima once appearance attributes are introduced. revision: yes
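A sketch of how the promised ablation could be scored on a shared protocol using torchmetrics; `render_fn`, `test_pairs`, and the variant registry in the usage comment are illustrative assumptions, not the authors' code.

```python
import torch
from torchmetrics.image import PeakSignalNoiseRatio, StructuralSimilarityIndexMeasure
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

def evaluate_variant(render_fn, test_pairs):
    # Scores one training variant on extrapolated views. render_fn(pair) must
    # return a (B, 3, H, W) float image batch in [0, 1]; each pair carries a
    # ground-truth "target" image of the same shape. Both names are hypothetical.
    psnr = PeakSignalNoiseRatio(data_range=1.0)
    ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
    lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)
    with torch.no_grad():
        for pair in test_pairs:
            pred, target = render_fn(pair), pair["target"]
            psnr.update(pred, target)
            ssim.update(pred, target)
            lpips.update(pred, target)
    return {"psnr": psnr.compute(), "ssim": ssim.compute(), "lpips": lpips.compute()}

# One row per ablation variant from the rebuttal:
# results = {name: evaluate_variant(fn, test_pairs)
#            for name, fn in {"joint": joint_fn,
#                             "geometry_only": geo_fn,
#                             "two_stage": staged_fn}.items()}
```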
Referee: [§4.3] §4.3 (Ablations and Masking): The loss-masking strategy is described as 'empirically critical' for performance on extrapolated viewpoints, yet the manuscript provides no isolated quantitative ablation (e.g., with/without masking on the same backbone) or details on mask computation. This omission weakens the ability to attribute gains specifically to the proposed masking rather than to the base MASt3R geometry.
Authors: We acknowledge the omission. In the revision we will expand the description of the loss-masking strategy with the exact computation (thresholding on MASt3R per-point confidence combined with cross-view geometric consistency checks) and add an isolated ablation that trains the identical backbone with and without masking. The table will report PSNR/SSIM/LPIPS on extrapolated views so that the contribution of masking can be isolated from the base MASt3R geometry. revision: yes
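Based solely on the rebuttal's description (confidence threshold plus cross-view consistency), a minimal sketch of such a mask; the threshold values and tensor names are assumptions, not values from the paper.

```python
import torch

def extrapolation_loss_mask(confidence: torch.Tensor,
                            reproj_error: torch.Tensor,
                            conf_thresh: float = 0.5,
                            err_thresh: float = 0.05) -> torch.Tensor:
    # confidence:   (H, W) MASt3R per-point confidence at the target view
    # reproj_error: (H, W) cross-view geometric consistency error, e.g. the
    #               symmetric reprojection distance between the two input views
    # Returns a float mask that is 1 only where both checks pass, so the
    # rendering loss never penalizes pixels the input views cannot constrain.
    return ((confidence > conf_thresh) & (reproj_error < err_thresh)).float()
```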
Circularity Check
No significant circularity: extension of external MASt3R with empirical staged training
Full rationale
The derivation relies on an external foundation model (MASt3R) for base geometry reconstruction and a public dataset (ScanNet++). The core extension—predicting per-point Gaussian attributes (scales, rotations, opacities, appearance) and applying a two-stage optimization (first 3D point cloud geometry loss, then novel-view synthesis objective) plus loss masking—is presented as an empirical design choice to avoid local minima, not as a mathematical reduction or self-definitional fit. No equations equate outputs to inputs by construction, no uniqueness theorem is imported from self-citations, and no ansatz is smuggled via prior author work. Generalization claims rest on reported experiments rather than forced predictions. This is a standard non-circular extension of prior independent work.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: MASt3R produces sufficiently accurate 3D point clouds from uncalibrated pairs to serve as a stable base for subsequent Gaussian attribute prediction.
- Ad hoc to paper: Optimizing the geometry loss first, followed by the novel-view-synthesis loss, avoids the local minima that arise when training Gaussian splats directly from stereo views.
Forward citations
Cited by 19 Pith papers
- Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views
  GLADOS reconstructs 3D geometry from disjoint views by generating intermediate perspectives, performing robust coarse alignment that tolerates generative inconsistencies, and iteratively expanding context for consistency.
- ConFixGS: Learning to Fix Feedforward 3D Gaussian Splatting with Confidence-Aware Diffusion Priors in Driving Scenes
  ConFixGS repairs feedforward 3D Gaussian Splatting with confidence-aware diffusion priors, delivering up to 3.68 dB PSNR gains and halved FID scores on Waymo, nuScenes, and KITTI novel view synthesis tasks.
- SplatWeaver: Learning to Allocate Gaussian Primitives for Generalizable Novel View Synthesis
  SplatWeaver dynamically allocates Gaussian primitives via cardinality experts and pixel-level routing guided by high-frequency cues for improved generalizable novel view synthesis.
- Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes
  Ground4D resolves temporal conflicts in feedforward 4D Gaussian reconstruction for off-road scenes via voxel-grounded temporal aggregation with intra-voxel softmax and surface normal regularization, outperforming prio...
- WildSplatter: Feed-forward 3D Gaussian Splatting with Appearance Control from Unconstrained Images
  WildSplatter jointly learns 3D Gaussians and appearance embeddings from unconstrained photo collections to enable fast feed-forward reconstruction and flexible lighting control in 3D Gaussian Splatting.
- Free-Range Gaussians: Non-Grid-Aligned Generative 3D Gaussian Reconstruction
  Free-Range Gaussians uses flow matching over Gaussian parameters to predict non-grid-aligned 3D Gaussians from multi-view images, enabling synthesis of plausible content in unobserved regions with fewer primitives tha...
- 3AM: 3egment Anything with Geometric Consistency in Videos
  3AM integrates MUSt3R 3D features into SAM2 via a Feature Merger and FOV-aware sampling to deliver geometry-consistent video object segmentation from RGB alone, with large gains on wide-baseline datasets.
- MODEST: Multi-Optics Depth-of-Field Stereo Dataset
  MODEST provides the first large-scale high-resolution stereo DSLR dataset with systematic variation of focal length and aperture to support research on real-world optical effects in depth estimation.
- FluSplat: Sparse-View 3D Editing without Test-Time Optimization
  FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.
- Geometric Context Transformer for Streaming 3D Reconstruction
  LingBot-Map is a streaming 3D reconstruction model built on a geometric context transformer that combines anchor context, pose-reference window, and trajectory memory to deliver accurate, drift-resistant results at 20...
- Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
  The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temp...
- LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video
  LiveStre4m delivers real-time novel-view video streaming from unposed multi-view inputs via a multi-view vision transformer, diffusion-transformer interpolation, and a learned camera pose predictor.
- DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass
  DePT3R performs joint dense point tracking and 3D reconstruction of dynamic scenes from multiple unposed images using a single neural network forward pass.
- C3G: Learning Compact 3D Representations with 2K Gaussians
  C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.
- Depth Anything 3: Recovering the Visual Space from Any Views
  DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.
- Streaming 4D Visual Geometry Transformer
  A causal transformer with key-value caching and distillation from a bidirectional VGGT model enables efficient online 4D geometry reconstruction from videos.
- ReorgGS: Equivalent Distribution Reorganization for 3D Gaussian Splatting
  ReorgGS reorganizes the Gaussian distribution in converged 3DGS models by resampling centers and covariances to reduce parameterization degeneration and enable better subsequent optimization.
- Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images
  UniSplat learns consistent 3D geometry, appearance, and semantics from unposed images using dual masking, progressive Gaussian splatting, and recalibration to align predictions across tasks.
- VGGT-SLAM++
  VGGT-SLAM++ improves on prior transformer SLAM by adding dense DEM submap graphs and high-cadence local optimization, achieving SOTA accuracy with reduced drift and bounded memory on benchmarks.