pith. machine review for the scientific record.

arxiv: 2604.27422 · v1 · submitted 2026-04-30 · 💻 cs.CV

Recognition: unknown

Sparse-View 3D Gaussian Splatting in the Wild

Jordan A. James, Minjae Lee, Myeongseok Nam, Sang-Hyun Lee, Soomok Lee, William J. Beksi, Wongi Park

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 08:55 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D Gaussian splatting · sparse view synthesis · novel view synthesis · diffusion model · unconstrained images · transient distractors · view refinement · Gaussian replication

The pith

A new method for 3D Gaussian splatting enables high-quality novel view synthesis from sparse unconstrained images containing distractors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a framework to synthesize novel views in 3D from a small number of real-world photos that include unwanted transient elements like moving objects. It refines rendered images using a diffusion model conditioned on a reference view and a mask of transients to clean up artifacts. To handle areas with few Gaussians due to sparsity, it creates pseudo views and replicates Gaussians in those regions in a sparsity-aware way. If successful, this would allow accurate 3D reconstructions from casual, sparse photo collections without needing controlled dense captures, making the technology more practical for everyday use.
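To make the moving parts concrete, here is a minimal control-flow sketch of the pipeline as described above. Everything in it (the function names, the mask-blend stand-in for the diffusion refiner, the dummy rasterizer) is a hypothetical illustration of the described structure, not the authors' code or API.

```python
import numpy as np

def diffusion_refine(rendered, reference, transient_mask):
    # Stand-in for the reference-guided diffusion refiner: keep the render
    # outside the transient mask and fall back to the reference view inside
    # it (the actual method would inpaint with a conditioned diffusion model).
    m = transient_mask[..., None].astype(np.float32)
    return (1.0 - m) * rendered + m * reference

def training_round(gaussians, cameras, references, masks, render, replicate_sparse):
    # One round: render each training view, clean artifacts and distractors
    # with reference-guided refinement, then densify under-covered regions
    # before the next optimization pass.
    targets = [diffusion_refine(render(gaussians, cam), ref, msk)
               for cam, ref, msk in zip(cameras, references, masks)]
    return replicate_sparse(gaussians, cameras), targets

# Toy usage with dummy components so the control flow runs end to end.
H, W = 4, 4
gaussians = np.random.rand(100, 3)                 # centre positions only
render = lambda g, cam: np.random.rand(H, W, 3)    # dummy rasterizer
replicate_sparse = lambda g, cams: g               # no-op densifier
refs = [np.random.rand(H, W, 3) for _ in range(2)]
masks = [np.zeros((H, W), dtype=bool) for _ in range(2)]
gaussians, targets = training_round(gaussians, [0, 1], refs, masks,
                                    render, replicate_sparse)
```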

Core claim

The authors claim that reference-guided view refinement (a diffusion model conditioned on a transient mask and a reference image), combined with pseudo-view generation and sparsity-aware Gaussian replication, lets their approach handle sparse unconstrained image collections and deliver superior 3D rendering results compared with previous techniques.

What carries the argument

Reference-guided diffusion refinement using transient masks and reference images for artifact mitigation, combined with pseudo-view generation and sparsity-aware replication to densify the Gaussian field.
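A hedged reading of what the replication step could look like in practice: treat the distance to the k-th nearest neighbour as an inverse-density proxy and clone Gaussians whose neighbourhoods are unusually empty, jittering the copies by the local spacing. The quantile, k, and jitter scale below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.spatial import cKDTree

def replicate_sparse(positions, k=8, sparsity_quantile=0.9, jitter=0.5, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    # Distance to the k-th nearest neighbour is a cheap inverse-density proxy;
    # column 0 of the query result is each point's distance to itself.
    dists, _ = cKDTree(positions).query(positions, k=k + 1)
    knn_dist = dists[:, -1]
    sparse = knn_dist > np.quantile(knn_dist, sparsity_quantile)
    # Clone the sparse Gaussians, jittering each copy in proportion to its
    # local spacing so clones fill gaps instead of stacking on the originals.
    noise = rng.normal(scale=jitter, size=(int(sparse.sum()), 3))
    clones = positions[sparse] + noise * knn_dist[sparse, None]
    return np.concatenate([positions, clones], axis=0)

pts = np.random.rand(500, 3)
print(replicate_sparse(pts).shape)   # ~(550, 3): the emptiest 10% get cloned
```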

If this is right

  • Outperforms prior methods, with relative gains of 17.2% in PSNR and 10.8% in SSIM and a 4.0% improvement in LPIPS (where lower is better) on public datasets.
  • Achieves high-fidelity 3D rendering in unconstrained scenarios with distractors.
  • Reduces reliance on labor-intensive dense image acquisition for real-world 3D modeling.
  • Handles sparse image sets more effectively than existing Gaussian splatting approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Applying this to video sequences could allow 3D reconstruction of scenes with moving people or vehicles by treating motion as transients.
  • The replication strategy might be adapted to other 3D representation methods like neural radiance fields for similar sparsity issues.
  • Future work could test the method on even sparser inputs, such as 3-5 images, to determine the minimum viable number for good results.

Load-bearing premise

The method assumes that reliable transient masks and reference images can be obtained or generated to guide the diffusion model without introducing additional artifacts, and that pseudo-view generation accurately identifies and fills sparse Gaussian regions without distorting the underlying geometry.
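This premise is easiest to see with a toy proxy. One plausible route to a transient mask (an assumption for illustration; the review does not specify the paper's actual mask source) is to flag pixels the current reconstruction cannot explain via the photometric residual:

```python
import numpy as np

def residual_transient_mask(rendered, photo, threshold=0.15):
    # Per-pixel L1 residual averaged over channels, images in [0, 1];
    # pixels the current 3D model cannot explain (moving people, cars, ...)
    # get flagged as transient. Threshold is an illustrative choice.
    residual = np.abs(rendered - photo).mean(axis=-1)
    return residual > threshold

rendered = np.random.rand(8, 8, 3)
photo = rendered.copy()
photo[2:5, 2:5] = 1.0          # simulate a bright distractor patch
mask = residual_transient_mask(rendered, photo)
print(mask[2:5, 2:5].mean())   # mostly True inside the synthetic distractor
```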

What would settle it

A falsification test on a dataset with known ground-truth geometry: if deliberately corrupted or incorrect transient masks drive the performance metrics below baselines or produce visible distortions, the refinement claim does not hold; if performance degrades only gracefully, the load-bearing premise survives. A corruption stress test along these lines is sketched below.
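Concretely, such a test could corrupt the masks on purpose and watch the metric respond. In the sketch below, `render_with_mask` is a hypothetical hook standing in for retraining or re-rendering under a given mask; PSNR follows the standard 10·log10(MAX²/MSE) definition via scikit-image.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio

def corrupt_mask(mask, flip_frac=0.2, rng=None):
    # Flip a random fraction of pixels to emulate unreliable transient masks.
    rng = np.random.default_rng(0) if rng is None else rng
    return np.logical_xor(mask, rng.random(mask.shape) < flip_frac)

def psnr_drop(gt, render_with_mask, mask, flip_frac=0.2):
    # A large positive drop means mask quality is load-bearing for the claim.
    clean = peak_signal_noise_ratio(gt, render_with_mask(mask), data_range=1.0)
    noisy = peak_signal_noise_ratio(
        gt, render_with_mask(corrupt_mask(mask, flip_frac)), data_range=1.0)
    return clean - noisy

# Toy usage: a dummy 'renderer' whose error grows wherever the mask is wrong.
gt = np.random.rand(16, 16, 3)
render_with_mask = lambda m: np.clip(gt + 0.01 + 0.05 * m[..., None], 0.0, 1.0)
print(psnr_drop(gt, render_with_mask, np.zeros((16, 16), dtype=bool)))
```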

Figures

Figures reproduced from arXiv: 2604.27422 by Jordan A. James, Minjae Lee, Myeongseok Nam, Sang-Hyun Lee, Soomok Lee, William J. Beksi, Wongi Park.

Figure 1: Given a sparse set of images, our approach effectively renders 3D novel view synthesis in …
Figure 2: We observe that (a) distractors are ephemeral, appearing across diverse regions, and (b) using …
Figure 3: An overview of SaveWildGS. We refine rendered images utilizing a reference view and a …
Figure 4: An overview of sparsity-aware Gaussian replication. (a) Existing work densifies the Gaussians …
Figure 5: Qualitative results on the NeRF On-the-go dataset.
Figure 6: Examples of transient masks in rendered images on the NeRF On-the-go dataset.
Figure 8: An analysis of 3D Gaussian primitives. (a) The structure from motion (SfM) points via …
Figure 9: A comparison of training with and without the LoRA scheme on the Fountain and Patio scenes.

    Rank (R)        PSNR ↑   SSIM ↑   LPIPS ↓
    w/o LoRA [12]   23.08    0.579    0.317
    R = 4           24.32    0.605    0.293
    R = 8           25.21    0.605    0.287
    R = 16          25.37    0.601    0.294

Figure 10: Overlapped camera perspectives represent a potential limitation. From the paper's limitations discussion: although SaveWildGS yields state-of-the-art results on 3D novel view synthesis for sparse sets of images, there are two limitations: (i) it can struggle to handle an object that is static in the whole sequence (e.g., parked vehicles, standing pedestrians), which makes it hard to identify transient elements from the ba…
Figure 11: Qualitative results on the Photo Tourism dataset.
Figure 12: Qualitative results on the LLFF dataset.
Figure 13: An ablation of robustness to text descriptions.
Figure 14: Qualitative results showing the refined images from the corrupted images.
original abstract

We propose a 3D novel sparse-view synthesis framework for unconstrained real-world scenarios that contain distractors. Unlike existing methods that primarily perform novel-view synthesis from a sparse set of constrained images without transient elements or leverage unconstrained dense image collections to enhance 3D representation in real-world scenarios, our method not only effectively tackles sparse unconstrained image collections, but also shows high-quality 3D rendering results. To do this, we introduce reference-guided view refinement with a diffusion model using a transient mask and a reference image to enhance the 3D representation and mitigate artifacts in rendered views. Furthermore, we address sparse regions in the Gaussian field via pseudo-view generation along with a sparsity-aware Gaussian replication strategy to amplify Gaussians in the sparse regions. Extensive experiments on publicly available datasets demonstrate that our methodology consistently outperforms existing methods (e.g., PSNR - 17.2%, SSIM - 10.8%, LPIPS - 4.0%) and provides high-fidelity 3D rendering results. This advancement paves the way for realizing unconstrained real-world scenarios without labor-intensive data acquisition. Our project page is available at https://robotic-vision-lab.github.io/SaveWildGS/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 3 minor

Summary. The paper proposes a sparse-view 3D Gaussian Splatting framework for unconstrained real-world scenes with distractors. It introduces reference-guided view refinement via a diffusion model conditioned on transient masks and reference images to mitigate artifacts, combined with pseudo-view generation and sparsity-aware Gaussian replication to handle sparse regions in the Gaussian field. Experiments on public datasets are claimed to show consistent outperformance over prior methods, with example gains of 17.2% PSNR, 10.8% SSIM, and 4.0% LPIPS.

Significance. If the results hold under rigorous validation, the work would meaningfully extend 3D Gaussian Splatting to practical, sparse, unconstrained captures, lowering barriers to high-fidelity novel-view synthesis in robotics, AR, and wild-environment modeling without requiring dense clean data.

major comments (4)
  1. [Abstract] Abstract: The stated quantitative gains (PSNR +17.2%, SSIM +10.8%, LPIPS +4.0%) are presented without naming the datasets, exact baseline methods, input view counts, or absolute metric values, preventing assessment of whether the improvements are load-bearing or reproducible.
  2. [Method (reference-guided refinement)] Reference-guided view refinement section: No quantitative evaluation (e.g., mask IoU, precision-recall, or failure-case analysis) is supplied for transient mask accuracy or reference-image quality on the target unconstrained datasets; mask errors would allow diffusion to preserve distractors or hallucinate, directly undermining the claimed rendering gains (a minimal IoU check is sketched at the end of this report).
  3. [Method (sparsity-aware replication)] Sparsity-aware replication subsection: The pseudo-view generation and Gaussian replication strategy lacks an ablation or geometry-error metric (e.g., depth consistency or point-cloud alignment) showing that replication fills sparse regions without distorting scene geometry; this is a core assumption for the sparse-region handling claim.
  4. [Experiments] Experiments section: No error analysis, variance across runs, or per-scene breakdown is reported, and key experimental details are missing, making it impossible to verify that the pipeline supports the central claim under realistic distractor conditions.
minor comments (3)
  1. [Abstract] Abstract: The notation 'PSNR - 17.2%' is ambiguous (the dash likely denotes a relative improvement); clarify as relative gains alongside absolute baseline values, e.g., a 17.2% relative gain over a 20.00 dB baseline would mean 23.44 dB.
  2. [Figures] Figure captions: Transient-mask and pseudo-view visualizations would benefit from explicit labels indicating success/failure cases to aid reader interpretation.
  3. [Related Work] Related work: Add explicit comparison to recent diffusion-based novel-view methods and unconstrained Gaussian Splatting variants to better situate the contribution.
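Where ground-truth transient masks do exist (e.g., on synthetic scenes), the mask-quality check asked for in major comment 2 is standard intersection-over-union. A minimal sketch, with illustrative inputs only:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    # Both inputs are boolean HxW masks; IoU = |pred AND gt| / |pred OR gt|.
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0   # both masks empty: perfect agreement by convention
    return float(np.logical_and(pred, gt).sum() / union)

pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:6] = True
gt = np.zeros((8, 8), dtype=bool);   gt[3:7, 3:7] = True
print(round(mask_iou(pred, gt), 3))  # 0.391 (9 overlap / 23 union)
```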

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback on our manuscript. We address each major comment point-by-point below. Where the concerns identify gaps in the current version, we commit to revisions that strengthen the paper without misrepresenting the existing results or experiments.

point-by-point responses
  1. Referee: [Abstract] Abstract: The stated quantitative gains (PSNR +17.2%, SSIM +10.8%, LPIPS +4.0%) are presented without naming the datasets, exact baseline methods, input view counts, or absolute metric values, preventing assessment of whether the improvements are load-bearing or reproducible.

    Authors: We agree that the abstract would benefit from greater specificity. In the revised manuscript we will explicitly name the evaluation datasets, the baseline methods, the input view counts used for each experiment, and report both absolute metric values and the relative gains so that readers can directly assess the magnitude and reproducibility of the improvements. revision: yes

  2. Referee: [Method (reference-guided refinement)] Reference-guided view refinement section: No quantitative evaluation (e.g., mask IoU, precision-recall, or failure-case analysis) is supplied for transient mask accuracy or reference-image quality on the target unconstrained datasets; mask errors would allow diffusion to preserve distractors or hallucinate, directly undermining the claimed rendering gains.

    Authors: We acknowledge that the current manuscript does not include quantitative metrics such as mask IoU or precision-recall for the transient masks. Ground-truth transient masks are unavailable for the real-world unconstrained datasets, which precludes standard IoU computation. We will add a dedicated failure-case analysis subsection with qualitative examples of mask and reference-image quality, together with visual inspection of how mask errors propagate (or do not) into final renderings, to provide stronger supporting evidence for the refinement module. revision: partial

  3. Referee: [Method (sparsity-aware replication)] Sparsity-aware replication subsection: The pseudo-view generation and Gaussian replication strategy lacks an ablation or geometry-error metric (e.g., depth consistency or point-cloud alignment) showing that replication fills sparse regions without distorting scene geometry; this is a core assumption for the sparse-region handling claim.

    Authors: We agree that an explicit ablation and geometry-preservation metric would strengthen the sparsity-aware replication claim. In the revision we will include an ablation study isolating the replication component and report geometry-related metrics (depth consistency on rendered views and alignment error against the original point cloud) to demonstrate that replication improves coverage without introducing geometric distortion (a chamfer-style alignment sketch follows these responses). revision: yes

  4. Referee: [Experiments] Experiments section: No error analysis, variance across runs, or per-scene breakdown is reported despite the reader's note on missing experimental details, making it impossible to verify that the pipeline supports the central claim under realistic distractor conditions.

    Authors: We will expand the experiments section to provide per-scene quantitative breakdowns for all reported metrics. Because each training run is computationally expensive, we performed single runs per scene; we will explicitly note this limitation and report any observed run-to-run variance on a representative subset of scenes where multiple seeds were feasible. These additions will allow readers to assess consistency under realistic distractor conditions. revision: yes
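As flagged in response 3, a chamfer-style alignment error is one concrete way to quantify geometric drift from replication: the mean nearest-neighbour distance from the densified Gaussian centres back to the original SfM point cloud. All names and scales below are illustrative, not from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def alignment_error(sfm_points, gaussian_centres):
    # For each (possibly replicated) Gaussian centre, the distance to the
    # closest SfM point; the mean rises if replication drifts off the surface.
    dists, _ = cKDTree(sfm_points).query(gaussian_centres, k=1)
    return float(dists.mean())

sfm = np.random.rand(1000, 3)
centres = sfm + np.random.normal(scale=0.01, size=sfm.shape)  # mild drift
print(alignment_error(sfm, centres))   # roughly 0.015 on this toy example
```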

Circularity Check

0 steps flagged

Empirical pipeline with independent algorithmic components; no circular derivations

full rationale

The paper introduces a practical framework for sparse-view 3D Gaussian splatting in unconstrained scenes, relying on reference-guided diffusion refinement (using transient masks and reference images) plus pseudo-view generation with sparsity-aware replication. These are presented as new algorithmic steps evaluated through experiments on public datasets, with performance gains reported via direct comparisons (PSNR/SSIM/LPIPS). No equations, derivations, or uniqueness claims reduce by construction to fitted inputs, self-referential definitions, or load-bearing self-citations; the central claims remain externally falsifiable via the reported metrics and do not loop back to the method's own parameters.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard computer-vision assumptions plus two domain-specific premises not independently verified in the abstract: reliable transient masks and reference images exist, and pseudo-views can safely amplify Gaussians. No explicit free parameters or invented entities are named.

axioms (2)
  • domain assumption Transient masks and reference images can be obtained or generated to guide diffusion refinement without new artifacts
    Invoked in the reference-guided view refinement step described in the abstract
  • domain assumption Pseudo-view generation plus sparsity-aware replication accurately fills under-sampled regions of the Gaussian field
    Invoked in the sparsity-handling component of the abstract

pith-pipeline@v0.9.0 · 5539 in / 1428 out tokens · 23252 ms · 2026-05-07T08:55:51.177625+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

79 extracted references · 20 canonical work pages · 4 internal anchors

  1. [1] Mohamed Afane, Gabrielle Ebbrecht, Ying Wang, Juntao Chen, and Junaid Farooq. ATP: Adaptive threshold pruning for efficient data encoding in quantum neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20427–20436, 2025.
  2. [2] Yanqi Bao, Jing Liao, Jing Huo, and Yang Gao. Distractor-free generalizable 3D Gaussian splatting. arXiv preprint arXiv:2411.17605, 2024.
  3. [3] Jiahao Chen, Yipeng Qin, Lingjie Liu, Jiangbo Lu, and Guanbin Li. NeRF-HuGS: Improved neural radiance fields in non-static scenes using heuristics-guided segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19436–19446, 2024.
  4. [4] Xingyu Chen, Qi Zhang, Xiaoyu Li, Yue Chen, Ying Feng, Xuan Wang, and Jue Wang. Hallucinated neural radiance fields in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12943–12952, 2022.
  5. [5] Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, and Jianfei Cai. HAC: Hash-grid assisted context for 3D Gaussian splatting compression. In Proceedings of the European Conference on Computer Vision, pages 422–438. Springer, 2024.
  6. [6] Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, and Jianfei Cai. HAC++: Towards 100x compression of 3D Gaussian splatting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
  7. [7] Yuxin Cheng, Binxiao Huang, Taiqiang Wu, Wenyong Zhou, Chenchen Ding, Zhengwu Liu, Graziano Chesi, and Ngai Wong. Perspective-aware 3D Gaussian inpainting with multi-view consistency. arXiv preprint arXiv:2510.10993, 2025.
  8. [8] Jaeyoung Chung, Jeongtaek Oh, and Kyoung Mu Lee. Depth-regularized optimization for 3D Gaussian splatting in few-shot images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 811–820, 2024.
  9. [9] Dongrui Dai and Yuxiang Xing. EAP-GS: Efficient augmentation of point cloud for 3D Gaussian splatting in few-shot scene reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16498–16507, 2025.
  10. [10] Chuanyu Fu, Yuqi Zhang, Kunbin Yao, Guanying Chen, Yuan Xiong, Chuan Huang, Shuguang Cui, and Xiaochun Cao. RobustSplat: Decoupling densification and dynamics for transient-free 3DGS. arXiv preprint arXiv:2506.02751, 2025.
  11. [11] Wanshui Gan, Fang Liu, Hongbin Xu, Ningkai Mo, and Naoto Yokoya. GaussianOcc: Fully self-supervised and efficient 3D occupancy estimation with Gaussian splatting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 28980–28990, 2025.
  12. [12] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations, 2022.
  13. [13] Ranran Huang and Krystian Mikolajczyk. No pose at all: Self-supervised pose-free 3D Gaussian splatting from sparse views. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27947–27957, 2025.
  14. [14] Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Yue Liao, Zhengkai Jiang, Yue Hu, Peng Gao, Hongsheng Li, Maoqing Yao, and Guanghui Ren. EnerVerse: Envisioning embodied future space for robotics manipulation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026.
  15. [15] Xincheng Huang, Dieter Frehlich, Ziyi Xia, Peyman Gholami, and Robert Xiao. GaussianNexus: Room-scale real-time AR/VR telepresence with Gaussian splatting. In Proceedings of the ACM Symposium on User Interface Software and Technology, pages 1–18, 2025.
  16. [16] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4):139–1, 2023.
  17. [17] Diederik P Kingma. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  18. [18] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4015–4026, 2023.
  19. [19] Hanyang Kong, Xingyi Yang, and Xinchao Wang. Generative sparse-view Gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26745–26755, 2025.
  20. [20] Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, and Torsten Sattler. WildGaussians: 3D Gaussian splatting in the wild. arXiv preprint arXiv:2407.08447, 2024.
  21. [21] Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3D Gaussian representation for radiance field. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21719–21728, 2024.
  22. [22] Deming Li, Kaiwen Jiang, Yutao Tang, Ravi Ramamoorthi, Rama Chellappa, and Cheng Peng. MS-GS: Multi-appearance sparse-view 3D Gaussian splatting in the wild. In Proceedings of the Conference on Neural Information Processing Systems, 2025.
  23. [23] Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, and Lin Gu. DNGaussian: Optimizing sparse-view 3D Gaussian radiance fields with global-local depth normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20775–20785, 2024.
  24. [24] Peihao Li, Shaohui Wang, Chen Yang, Bingbing Liu, Weichao Qiu, and Haoqian Wang. NeRF-MS: Neural radiance fields with multi-sequence. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18591–18600, 2023.
  25. [25] Yiqing Li, Xuan Wang, Jiawei Wu, Yikun Ma, and Zhi Jin. SparseGS-W: Sparse-view 3D Gaussian splatting in the wild with generative priors. arXiv preprint arXiv:2503.19452, 2025.
  26. [26] Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, and Karsten Kreis. Align your Gaussians: Text-to-4D with dynamic 3D Gaussians and composed diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8576–8588, 2024.
  27. [27] Xi Liu, Chaoyi Zhou, and Siyu Huang. 3DGS-Enhancer: Enhancing unbounded 3D Gaussian splatting with view-consistent 2D diffusion priors. In Advances in Neural Information Processing Systems, volume 37, pages 133305–133327, 2024.
  28. [28] Zhening Liu, Rui Song, Yushi Huang, Yingdong Hu, Xinjie Zhang, Jiawei Shao, Zehong Lin, and Jun Zhang. Feed-forward 3D Gaussian splatting compression with long-context modeling. arXiv preprint arXiv:2512.00877, 2025.
  29. [29] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-GS: Structured 3D Gaussians for view-adaptive rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20654–20664, 2024.
  30. [30] Ricardo Martin-Brualla, Noha Radwan, Mehdi SM Sajjadi, Jonathan T Barron, Alexey Dosovitskiy, and Daniel Duckworth. NeRF in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7210–7219, 2021.
  31. [31] Ben Mildenhall, Pratul P Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics, 38(4):1–14, 2019.
  32. [32] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  33. [33] Simon Niedermayr, Josef Stumpfegger, and Rüdiger Westermann. Compressed 3D Gaussian splatting for accelerated novel view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10349–10358, 2024.
  34. [34] Michael Niemeyer, Jonathan T Barron, Ben Mildenhall, Mehdi SM Sajjadi, Andreas Geiger, and Noha Radwan. RegNeRF: Regularizing neural radiance fields for view synthesis from sparse inputs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5480, 2021.
  35. [35] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
  36. [36] Avinash Paliwal, Xilong Zhou, Wei Ye, Jinhui Xiong, Rakesh Ranjan, and Nima Khademi Kalantari. RI3D: Few-shot Gaussian splatting with repair and inpainting diffusion priors. arXiv preprint arXiv:2503.10860, 2025.
  37. [37] Hyunwoo Park, Gun Ryu, and Wonjun Kim. DropGaussian: Structural regularization for sparse-view Gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21600–21609, 2025.
  38. [38] Wongi Park, Myeongseok Nam, Siwon Kim, Sangwoo Jo, and Soomok Lee. ForestSplats: Deformable transient field for Gaussian splatting in the wild. arXiv preprint arXiv:2503.06179, 2025.
  39. [39] Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D diffusion. arXiv preprint arXiv:2209.14988, 2022.
  40. [40] René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12179–12188, 2021.
  41. [41] Jiawei Ren, Cheng Xie, Ashkan Mirzaei, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling, et al. L4GM: Large 4D Gaussian reconstruction model. In Advances in Neural Information Processing Systems, volume 37, pages 56828–56858, 2024.
  42. [42] Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, et al. Grounded SAM: Assembling open-world models for diverse visual tasks. arXiv preprint arXiv:2401.14159, 2024.
  43. [43] Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, and Songyou Peng. NeRF On-the-go: Exploiting uncertainty for distractor-free NeRFs in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8931–8940, 2024.
  44. [44] Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, and Jun Gao. GEN3C: 3D-informed world-consistent video generation with precise camera control. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6121–6132, 2025.
  45. [45] Sara Sabour, Lily Goli, George Kopanas, Mark Matthews, Dmitry Lagun, Leonidas Guibas, Alec Jacobson, David J Fleet, and Andrea Tagliasacchi. SpotLessSplats: Ignoring distractors in 3D Gaussian splatting. arXiv preprint arXiv:2406.20055, 2024.
  46. [46] Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, et al. ZeroNVS: Zero-shot 360-degree view synthesis from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9420–9429, 2024.
  47. [47] Johannes L Schönberger and Jan-Michael Frahm. Structure-from-motion revisited. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4104–4113, 2016.
  48. [48] Johannes L Schönberger, Enliang Zheng, Jan-Michael Frahm, and Marc Pollefeys. Pixelwise view selection for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision, pages 501–518. Springer, 2016.
  49. [49] Yulin Shen, Boyu Li, Jiayang Huang, David Yip, and Zeyu Wang. GaussianShopVR: Facilitating immersive 3D authoring using Gaussian splatting in VR. In Proceedings of the ACM Symposium on User Interface Software and Technology, pages 1–14, 2025.
  50. [50] Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, and Hao Su. Zero123++: A single image to consistent multi-view diffusion base model. arXiv preprint arXiv:2310.15110, 2023.
  51. [51] Noah Snavely, Steven M Seitz, and Richard Szeliski. Photo tourism: Exploring photo collections in 3D. In ACM SIGGRAPH Papers, pages 835–846. ACM, 2006.
  52. [52] Nagabhushan Somraj, Adithyan Karanayil, and Rajiv Soundararajan. SimpleNeRF: Regularizing sparse input neural radiance fields with simpler solutions. In Proceedings of the SIGGRAPH Asia Conference, pages 1–11, 2023.
  53. [53] Jiadong Tang, Yu Gao, Dianyi Yang, Liqi Yan, Yufeng Yue, and Yi Yang. DroneSplat: 3D Gaussian splatting for robust 3D reconstruction from in-the-wild drone imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 833–843, 2025.
  54. [54] Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, and Bharath Hariharan. Emergent correspondence from image diffusion. In Advances in Neural Information Processing Systems, volume 36, pages 1363–1389, 2023.
  55. [55] Yuzhou Tang, Dejun Xu, Yongjie Hou, Zhenzhong Wang, and Min Jiang. NexusSplats: Efficient 3D Gaussian splatting in the wild. arXiv preprint arXiv:2411.14514, 2024.
  56. [56] Xuechang Tu, Lukas Radl, Michael Steiner, Markus Steinberger, Bernhard Kerbl, and Fernando de la Torre. VRSplat: Fast and robust Gaussian splatting for virtual reality. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 8(1):1–22, 2025.
  57. [57] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  58. [58] Xiaobao Wei, Zhangjie Ye, Yuxiang Gu, Zunjie Zhu, Yunfei Guo, Yingying Shen, Shan Zhao, Ming Lu, Haiyang Sun, Bing Wang, et al. ParkGaussian: Surround-view 3D Gaussian splatting for autonomous parking. arXiv preprint arXiv:2601.01386, 2026.
  59. [59] Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, and Huan Ling. Difix3D+: Improving 3D reconstructions with single-step diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26024–26035, 2025.
  60. [60] Minye Wu, Haizhao Dai, Kaixin Yao, Tinne Tuytelaars, and Jingyi Yu. BG-Triangle: Bézier Gaussian triangle for 3D vectorization and rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16197–16207, 2025.
  61. [61] Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P Srinivasan, Dor Verbin, Jonathan T Barron, Ben Poole, et al. ReconFusion: 3D reconstruction with diffusion priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21551–21561, 2024.
  62. [62] Jamie Wynn and Daniyar Turmukhambetov. DiffusioNeRF: Regularizing neural radiance fields with denoising diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4180–4189, 2023.
  63. [63] Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, and Achuta Kadambi. SparseGS: Real-time 360° sparse view synthesis using Gaussian splatting. arXiv preprint arXiv:2312.00206, 2023.
  64. [64] Jiacong Xu, Yiqun Mei, and Vishal M Patel. Wild-GS: Real-time novel view synthesis from unconstrained photo collections. arXiv preprint arXiv:2406.10373, 2024.
  65. [65] Jiawei Xu, Kai Deng, Zexin Fan, Shenlong Wang, Jin Xie, and Jian Yang. AD-GS: Object-aware B-spline Gaussian splatting for self-supervised autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 24770–24779, 2025.
  66. [66] Jiawei Yang, Marco Pavone, and Yue Wang. FreeNeRF: Improving few-shot neural rendering with free frequency regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8254–8263, 2023.
  67. [67] Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything: Unleashing the power of large-scale unlabeled data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10371–10381, 2024.
  68. [68] Yifan Yang, Shuhai Zhang, Zixiong Huang, Yubing Zhang, and Mingkui Tan. Cross-ray neural radiance fields for novel-view synthesis from unconstrained image collections. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15901–15911, 2023.
  69. [69] Ruihong Yin, Vladimir Yugay, Yue Li, Sezer Karaoglu, and Theo Gevers. FewViewGS: Gaussian splatting with few view matching and multi-stage training. In Advances in Neural Information Processing Systems, volume 37, pages 127204–127225, 2024.
  70. [70] Xingyilang Yin, Qi Zhang, Jiahao Chang, Ying Feng, Qingnan Fan, Xi Yang, Chi-Man Pun, Huaqi Zhang, and Xiaodong Cun. GSFixer: Improving 3D Gaussian splatting with reference-guided video diffusion priors. arXiv preprint arXiv:2508.09667, 2025.
  71. [71] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-Splatting: Alias-free 3D Gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19447–19456, 2024.
  72. [72] Zhaojie Zeng, Yuesong Wang, Lili Ju, and Tao Guan. Frequency-aware density control via reparameterization for high-quality rendering of 3D Gaussian splatting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 9833–9841, 2025.
  73. [73] Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, and Haoqian Wang. Gaussian in the wild: 3D Gaussian splatting for unconstrained image collections. In Proceedings of the European Conference on Computer Vision, pages 341–359. Springer, 2024.
  74. [74] Qi Zhang, Chi Huang, Qian Zhang, Nan Li, and Wei Feng. SU-RGS: Relightable 3D Gaussian splatting from sparse views under unconstrained illuminations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 26859–26868, 2025.
  75. [75] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
  76. [76] Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, and Hengshuang Zhao. Pixel-GS: Density control with pixel-aware gradient for 3D Gaussian splatting. In Proceedings of the European Conference on Computer Vision, pages 326–342. Springer, 2024.
  77. [77] Chen Zhao, Xuan Wang, Tong Zhang, Saqib Javed, and Mathieu Salzmann. Self-ensembling Gaussian splatting for few-shot novel view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4940–4950, 2025.
  78. [78] Guosheng Zhao, Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Xueyang Zhang, Yida Wang, Guan Huang, Xinze Chen, Boyuan Wang, Youyi Zhang, et al. DriveDreamer4D: World models are effective data machines for 4D driving scene representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12015–12026, 2025.
  79. [79] Zehao Zhu, Zhiwen Fan, Yifan Jiang, and Zhangyang Wang. FSGS: Real-time few-shot view synthesis using Gaussian splatting. In Proceedings of the European Conference on Computer Vision, pages 145–163. Springer, 2024.