pith. machine review for the scientific record.

arxiv: 2604.27422 · v1 · submitted 2026-04-30 · 💻 cs.CV

Recognition: unknown

Sparse-View 3D Gaussian Splatting in the Wild

Jordan A. James, Minjae Lee, Myeongseok Nam, Sang-Hyun Lee, Soomok Lee, William J. Beksi, Wongi Park

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 08:55 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D Gaussian splatting · sparse view synthesis · novel view synthesis · diffusion model · unconstrained images · transient distractors · view refinement · Gaussian replication

The pith

A new method for 3D Gaussian splatting enables high-quality novel view synthesis from sparse unconstrained images containing distractors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a framework to synthesize novel views in 3D from a small number of real-world photos that include unwanted transient elements like moving objects. It refines rendered images using a diffusion model conditioned on a reference view and a mask of transients to clean up artifacts. To handle areas with few Gaussians due to sparsity, it creates pseudo views and replicates Gaussians in those regions in a sparsity-aware way. If successful, this would allow accurate 3D reconstructions from casual, sparse photo collections without needing controlled dense captures, making the technology more practical for everyday use.
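To make the moving parts concrete, here is a minimal control-flow sketch of the pipeline as described above. Everything in it (the function names, the mask-blend stand-in for the diffusion refiner, the dummy rasterizer) is a hypothetical illustration of the described structure, not the authors' code or API.

```python
import numpy as np

def diffusion_refine(rendered, reference, transient_mask):
    # Stand-in for the reference-guided diffusion refiner: keep the render
    # outside the transient mask and fall back to the reference view inside
    # it (the actual method would inpaint with a conditioned diffusion model).
    m = transient_mask[..., None].astype(np.float32)
    return (1.0 - m) * rendered + m * reference

def training_round(gaussians, cameras, references, masks, render, replicate_sparse):
    # One round: render each training view, clean artifacts and distractors
    # with reference-guided refinement, then densify under-covered regions
    # before the next optimization pass.
    targets = [diffusion_refine(render(gaussians, cam), ref, msk)
               for cam, ref, msk in zip(cameras, references, masks)]
    return replicate_sparse(gaussians, cameras), targets

# Toy usage with dummy components so the control flow runs end to end.
H, W = 4, 4
gaussians = np.random.rand(100, 3)                 # centre positions only
render = lambda g, cam: np.random.rand(H, W, 3)    # dummy rasterizer
replicate_sparse = lambda g, cams: g               # no-op densifier
refs = [np.random.rand(H, W, 3) for _ in range(2)]
masks = [np.zeros((H, W), dtype=bool) for _ in range(2)]
gaussians, targets = training_round(gaussians, [0, 1], refs, masks,
                                    render, replicate_sparse)
```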

Core claim

The authors claim that reference-guided view refinement (a diffusion model conditioned on a transient mask and a reference image), combined with pseudo-view generation and sparsity-aware Gaussian replication, lets their approach handle sparse unconstrained image collections and deliver superior 3D rendering results compared with previous techniques.

What carries the argument

Reference-guided diffusion refinement using transient masks and reference images for artifact mitigation, combined with pseudo-view generation and sparsity-aware replication to densify the Gaussian field.
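A hedged reading of what the replication step could look like in practice: treat the distance to the k-th nearest neighbour as an inverse-density proxy and clone Gaussians whose neighbourhoods are unusually empty, jittering the copies by the local spacing. The quantile, k, and jitter scale below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.spatial import cKDTree

def replicate_sparse(positions, k=8, sparsity_quantile=0.9, jitter=0.5, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    # Distance to the k-th nearest neighbour is a cheap inverse-density proxy;
    # column 0 of the query result is each point's distance to itself.
    dists, _ = cKDTree(positions).query(positions, k=k + 1)
    knn_dist = dists[:, -1]
    sparse = knn_dist > np.quantile(knn_dist, sparsity_quantile)
    # Clone the sparse Gaussians, jittering each copy in proportion to its
    # local spacing so clones fill gaps instead of stacking on the originals.
    noise = rng.normal(scale=jitter, size=(int(sparse.sum()), 3))
    clones = positions[sparse] + noise * knn_dist[sparse, None]
    return np.concatenate([positions, clones], axis=0)

pts = np.random.rand(500, 3)
print(replicate_sparse(pts).shape)   # ~(550, 3): the emptiest 10% get cloned
```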

If this is right

  • Outperforms prior methods, with relative gains of 17.2% in PSNR and 10.8% in SSIM and a 4.0% improvement in LPIPS (where lower is better) on public datasets.
  • Achieves high-fidelity 3D rendering in unconstrained scenarios with distractors.
  • Reduces reliance on labor-intensive dense image acquisition for real-world 3D modeling.
  • Handles sparse image sets more effectively than existing Gaussian splatting approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Applying this to video sequences could allow 3D reconstruction of scenes with moving people or vehicles by treating motion as transients.
  • The replication strategy might be adapted to other 3D representation methods like neural radiance fields for similar sparsity issues.
  • Future work could test the method on even sparser inputs, such as 3-5 images, to determine the minimum viable number for good results.

Load-bearing premise

The method assumes that reliable transient masks and reference images can be obtained or generated to guide the diffusion model without introducing additional artifacts, and that pseudo-view generation accurately identifies and fills sparse Gaussian regions without distorting the underlying geometry.
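This premise is easiest to see with a toy proxy. One plausible route to a transient mask (an assumption for illustration; the review does not specify the paper's actual mask source) is to flag pixels the current reconstruction cannot explain via the photometric residual:

```python
import numpy as np

def residual_transient_mask(rendered, photo, threshold=0.15):
    # Per-pixel L1 residual averaged over channels, images in [0, 1];
    # pixels the current 3D model cannot explain (moving people, cars, ...)
    # get flagged as transient. Threshold is an illustrative choice.
    residual = np.abs(rendered - photo).mean(axis=-1)
    return residual > threshold

rendered = np.random.rand(8, 8, 3)
photo = rendered.copy()
photo[2:5, 2:5] = 1.0          # simulate a bright distractor patch
mask = residual_transient_mask(rendered, photo)
print(mask[2:5, 2:5].mean())   # mostly True inside the synthetic distractor
```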

What would settle it

A falsification test on a dataset with known ground-truth geometry: if deliberately corrupted or incorrect transient masks drive the performance metrics below baselines or produce visible distortions, the refinement claim does not hold; if performance degrades only gracefully, the load-bearing premise survives. A corruption stress test along these lines is sketched below.
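Concretely, such a test could corrupt the masks on purpose and watch the metric respond. In the sketch below, `render_with_mask` is a hypothetical hook standing in for retraining or re-rendering under a given mask; PSNR follows the standard 10·log10(MAX²/MSE) definition via scikit-image.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio

def corrupt_mask(mask, flip_frac=0.2, rng=None):
    # Flip a random fraction of pixels to emulate unreliable transient masks.
    rng = np.random.default_rng(0) if rng is None else rng
    return np.logical_xor(mask, rng.random(mask.shape) < flip_frac)

def psnr_drop(gt, render_with_mask, mask, flip_frac=0.2):
    # A large positive drop means mask quality is load-bearing for the claim.
    clean = peak_signal_noise_ratio(gt, render_with_mask(mask), data_range=1.0)
    noisy = peak_signal_noise_ratio(
        gt, render_with_mask(corrupt_mask(mask, flip_frac)), data_range=1.0)
    return clean - noisy

# Toy usage: a dummy 'renderer' whose error grows wherever the mask is wrong.
gt = np.random.rand(16, 16, 3)
render_with_mask = lambda m: np.clip(gt + 0.01 + 0.05 * m[..., None], 0.0, 1.0)
print(psnr_drop(gt, render_with_mask, np.zeros((16, 16), dtype=bool)))
```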

Figures

Figures reproduced from arXiv: 2604.27422 by Jordan A. James, Minjae Lee, Myeongseok Nam, Sang-Hyun Lee, Soomok Lee, William J. Beksi, Wongi Park.

Figure 1: Given a sparse set of images, our approach effectively renders 3D novel view synthesis in …
Figure 2: We observe that (a) distractors are ephemeral, appearing across diverse regions, and (b) using …
Figure 3: An overview of SaveWildGS. We refine rendered images utilizing a reference view and a …
Figure 4: An overview of sparsity-aware Gaussian replication. (a) Existing work densifies the Gaussians …
Figure 5: Qualitative results on the NeRF On-the-go dataset.
Figure 6: Examples of transient masks in rendered images on the NeRF On-the-go dataset.
Figure 8: An analysis of 3D Gaussian primitives. (a) The structure from motion (SfM) points via …
Figure 9: A comparison of training with and without the LoRA scheme on the Fountain and Patio scenes.

    Rank (R)        PSNR ↑   SSIM ↑   LPIPS ↓
    w/o LoRA [12]   23.08    0.579    0.317
    R = 4           24.32    0.605    0.293
    R = 8           25.21    0.605    0.287
    R = 16          25.37    0.601    0.294

Figure 10: Overlapped camera perspectives represent a potential limitation. From the paper's limitations discussion: although SaveWildGS yields state-of-the-art results on 3D novel view synthesis for sparse sets of images, there are two limitations: (i) it can struggle to handle an object that is static in the whole sequence (e.g., parked vehicles, standing pedestrians), which makes it hard to identify transient elements from the ba…
Figure 11: Qualitative results on the Photo Tourism dataset.
Figure 12: Qualitative results on the LLFF dataset.
Figure 13: An ablation of robustness to text descriptions.
Figure 14: Qualitative results showing the refined images from the corrupted images.
original abstract

We propose a 3D novel sparse-view synthesis framework for unconstrained real-world scenarios that contain distractors. Unlike existing methods that primarily perform novel-view synthesis from a sparse set of constrained images without transient elements or leverage unconstrained dense image collections to enhance 3D representation in real-world scenarios, our method not only effectively tackles sparse unconstrained image collections, but also shows high-quality 3D rendering results. To do this, we introduce reference-guided view refinement with a diffusion model using a transient mask and a reference image to enhance the 3D representation and mitigate artifacts in rendered views. Furthermore, we address sparse regions in the Gaussian field via pseudo-view generation along with a sparsity-aware Gaussian replication strategy to amplify Gaussians in the sparse regions. Extensive experiments on publicly available datasets demonstrate that our methodology consistently outperforms existing methods (e.g., PSNR - 17.2%, SSIM - 10.8%, LPIPS - 4.0%) and provides high-fidelity 3D rendering results. This advancement paves the way for realizing unconstrained real-world scenarios without labor-intensive data acquisition. Our project page is available at https://robotic-vision-lab.github.io/SaveWildGS/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 3 minor

Summary. The paper proposes a sparse-view 3D Gaussian Splatting framework for unconstrained real-world scenes with distractors. It introduces reference-guided view refinement via a diffusion model conditioned on transient masks and reference images to mitigate artifacts, combined with pseudo-view generation and sparsity-aware Gaussian replication to handle sparse regions in the Gaussian field. Experiments on public datasets are claimed to show consistent outperformance over prior methods, with example gains of 17.2% PSNR, 10.8% SSIM, and 4.0% LPIPS.

Significance. If the results hold under rigorous validation, the work would meaningfully extend 3D Gaussian Splatting to practical, sparse, unconstrained captures, lowering barriers to high-fidelity novel-view synthesis in robotics, AR, and wild-environment modeling without requiring dense clean data.

major comments (4)
  1. [Abstract] Abstract: The stated quantitative gains (PSNR +17.2%, SSIM +10.8%, LPIPS +4.0%) are presented without naming the datasets, exact baseline methods, input view counts, or absolute metric values, preventing assessment of whether the improvements are load-bearing or reproducible.
  2. [Method (reference-guided refinement)] Reference-guided view refinement section: No quantitative evaluation (e.g., mask IoU, precision-recall, or failure-case analysis) is supplied for transient mask accuracy or reference-image quality on the target unconstrained datasets; mask errors would allow diffusion to preserve distractors or hallucinate, directly undermining the claimed rendering gains (a minimal IoU check is sketched at the end of this report).
  3. [Method (sparsity-aware replication)] Sparsity-aware replication subsection: The pseudo-view generation and Gaussian replication strategy lacks an ablation or geometry-error metric (e.g., depth consistency or point-cloud alignment) showing that replication fills sparse regions without distorting scene geometry; this is a core assumption for the sparse-region handling claim.
  4. [Experiments] Experiments section: No error analysis, variance across runs, or per-scene breakdown is reported, and key experimental details are missing, making it impossible to verify that the pipeline supports the central claim under realistic distractor conditions.
minor comments (3)
  1. [Abstract] Abstract: The notation 'PSNR - 17.2%' is ambiguous (the dash likely denotes a relative improvement); clarify as relative gains alongside absolute baseline values, e.g., a 17.2% relative gain over a 20.00 dB baseline would mean 23.44 dB.
  2. [Figures] Figure captions: Transient-mask and pseudo-view visualizations would benefit from explicit labels indicating success/failure cases to aid reader interpretation.
  3. [Related Work] Related work: Add explicit comparison to recent diffusion-based novel-view methods and unconstrained Gaussian Splatting variants to better situate the contribution.
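Where ground-truth transient masks do exist (e.g., on synthetic scenes), the mask-quality check asked for in major comment 2 is standard intersection-over-union. A minimal sketch, with illustrative inputs only:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    # Both inputs are boolean HxW masks; IoU = |pred AND gt| / |pred OR gt|.
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0   # both masks empty: perfect agreement by convention
    return float(np.logical_and(pred, gt).sum() / union)

pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:6] = True
gt = np.zeros((8, 8), dtype=bool);   gt[3:7, 3:7] = True
print(round(mask_iou(pred, gt), 3))  # 0.391 (9 overlap / 23 union)
```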

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback on our manuscript. We address each major comment point-by-point below. Where the concerns identify gaps in the current version, we commit to revisions that strengthen the paper without misrepresenting the existing results or experiments.

point-by-point responses
  1. Referee: [Abstract] Abstract: The stated quantitative gains (PSNR +17.2%, SSIM +10.8%, LPIPS +4.0%) are presented without naming the datasets, exact baseline methods, input view counts, or absolute metric values, preventing assessment of whether the improvements are load-bearing or reproducible.

    Authors: We agree that the abstract would benefit from greater specificity. In the revised manuscript we will explicitly name the evaluation datasets, the baseline methods, the input view counts used for each experiment, and report both absolute metric values and the relative gains so that readers can directly assess the magnitude and reproducibility of the improvements. revision: yes

  2. Referee: [Method (reference-guided refinement)] Reference-guided view refinement section: No quantitative evaluation (e.g., mask IoU, precision-recall, or failure-case analysis) is supplied for transient mask accuracy or reference-image quality on the target unconstrained datasets; mask errors would allow diffusion to preserve distractors or hallucinate, directly undermining the claimed rendering gains.

    Authors: We acknowledge that the current manuscript does not include quantitative metrics such as mask IoU or precision-recall for the transient masks. Ground-truth transient masks are unavailable for the real-world unconstrained datasets, which precludes standard IoU computation. We will add a dedicated failure-case analysis subsection with qualitative examples of mask and reference-image quality, together with visual inspection of how mask errors propagate (or do not) into final renderings, to provide stronger supporting evidence for the refinement module. revision: partial

  3. Referee: [Method (sparsity-aware replication)] Sparsity-aware replication subsection: The pseudo-view generation and Gaussian replication strategy lacks an ablation or geometry-error metric (e.g., depth consistency or point-cloud alignment) showing that replication fills sparse regions without distorting scene geometry; this is a core assumption for the sparse-region handling claim.

    Authors: We agree that an explicit ablation and geometry-preservation metric would strengthen the sparsity-aware replication claim. In the revision we will include an ablation study isolating the replication component and report geometry-related metrics (depth consistency on rendered views and alignment error against the original point cloud) to demonstrate that replication improves coverage without introducing geometric distortion (a chamfer-style alignment sketch follows these responses). revision: yes

  4. Referee: [Experiments] Experiments section: No error analysis, variance across runs, or per-scene breakdown is reported despite the reader's note on missing experimental details, making it impossible to verify that the pipeline supports the central claim under realistic distractor conditions.

    Authors: We will expand the experiments section to provide per-scene quantitative breakdowns for all reported metrics. Because each training run is computationally expensive, we performed single runs per scene; we will explicitly note this limitation and report any observed run-to-run variance on a representative subset of scenes where multiple seeds were feasible. These additions will allow readers to assess consistency under realistic distractor conditions. revision: yes
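As flagged in response 3, a chamfer-style alignment error is one concrete way to quantify geometric drift from replication: the mean nearest-neighbour distance from the densified Gaussian centres back to the original SfM point cloud. All names and scales below are illustrative, not from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def alignment_error(sfm_points, gaussian_centres):
    # For each (possibly replicated) Gaussian centre, the distance to the
    # closest SfM point; the mean rises if replication drifts off the surface.
    dists, _ = cKDTree(sfm_points).query(gaussian_centres, k=1)
    return float(dists.mean())

sfm = np.random.rand(1000, 3)
centres = sfm + np.random.normal(scale=0.01, size=sfm.shape)  # mild drift
print(alignment_error(sfm, centres))   # roughly 0.015 on this toy example
```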

Circularity Check

0 steps flagged

Empirical pipeline with independent algorithmic components; no circular derivations

full rationale

The paper introduces a practical framework for sparse-view 3D Gaussian splatting in unconstrained scenes, relying on reference-guided diffusion refinement (using transient masks and reference images) plus pseudo-view generation with sparsity-aware replication. These are presented as new algorithmic steps evaluated through experiments on public datasets, with performance gains reported via direct comparisons (PSNR/SSIM/LPIPS). No equations, derivations, or uniqueness claims reduce by construction to fitted inputs, self-referential definitions, or load-bearing self-citations; the central claims remain externally falsifiable via the reported metrics and do not loop back to the method's own parameters.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard computer-vision assumptions plus two domain-specific premises not independently verified in the abstract: reliable transient masks and reference images exist, and pseudo-views can safely amplify Gaussians. No explicit free parameters or invented entities are named.

axioms (2)
  • domain assumption Transient masks and reference images can be obtained or generated to guide diffusion refinement without new artifacts
    Invoked in the reference-guided view refinement step described in the abstract
  • domain assumption Pseudo-view generation plus sparsity-aware replication accurately fills under-sampled regions of the Gaussian field
    Invoked in the sparsity-handling component of the abstract

pith-pipeline@v0.9.0 · 5539 in / 1428 out tokens · 23252 ms · 2026-05-07T08:55:51.177625+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

79 extracted references · 20 canonical work pages · 4 internal anchors

  1. [1] Mohamed Afane, Gabrielle Ebbrecht, Ying Wang, Juntao Chen, and Junaid Farooq. ATP: Adaptive threshold pruning for efficient data encoding in quantum neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20427–20436, 2025.
  2. [2] Yanqi Bao, Jing Liao, Jing Huo, and Yang Gao. Distractor-free generalizable 3D Gaussian splatting. arXiv preprint arXiv:2411.17605, 2024.
  3. [3] Jiahao Chen, Yipeng Qin, Lingjie Liu, Jiangbo Lu, and Guanbin Li. NeRF-HuGS: Improved neural radiance fields in non-static scenes using heuristics-guided segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19436–19446, 2024.
  4. [4] Xingyu Chen, Qi Zhang, Xiaoyu Li, Yue Chen, Ying Feng, Xuan Wang, and Jue Wang. Hallucinated neural radiance fields in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12943–12952, 2022.
  5. [5] Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, and Jianfei Cai. HAC: Hash-grid assisted context for 3D Gaussian splatting compression. In Proceedings of the European Conference on Computer Vision, pages 422–438. Springer, 2024.
  6. [6] Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, and Jianfei Cai. HAC++: Towards 100x compression of 3D Gaussian splatting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
  7. [7] Yuxin Cheng, Binxiao Huang, Taiqiang Wu, Wenyong Zhou, Chenchen Ding, Zhengwu Liu, Graziano Chesi, and Ngai Wong. Perspective-aware 3D Gaussian inpainting with multi-view consistency. arXiv preprint arXiv:2510.10993, 2025.
  8. [8] Jaeyoung Chung, Jeongtaek Oh, and Kyoung Mu Lee. Depth-regularized optimization for 3D Gaussian splatting in few-shot images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 811–820, 2024.
  9. [9] Dongrui Dai and Yuxiang Xing. EAP-GS: Efficient augmentation of point cloud for 3D Gaussian splatting in few-shot scene reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16498–16507, 2025.
  10. [10] Chuanyu Fu, Yuqi Zhang, Kunbin Yao, Guanying Chen, Yuan Xiong, Chuan Huang, Shuguang Cui, and Xiaochun Cao. RobustSplat: Decoupling densification and dynamics for transient-free 3DGS. arXiv preprint arXiv:2506.02751, 2025.
  11. [11] Wanshui Gan, Fang Liu, Hongbin Xu, Ningkai Mo, and Naoto Yokoya. GaussianOcc: Fully self-supervised and efficient 3D occupancy estimation with Gaussian splatting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 28980–28990, 2025.
  12. [12] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations, 2022.
  13. [13] Ranran Huang and Krystian Mikolajczyk. No pose at all: Self-supervised pose-free 3D Gaussian splatting from sparse views. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27947–27957, 2025.
  14. [14] Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Yue Liao, Zhengkai Jiang, Yue Hu, Peng Gao, Hongsheng Li, Maoqing Yao, and Guanghui Ren. EnerVerse: Envisioning embodied future space for robotics manipulation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026.
  15. [15] Xincheng Huang, Dieter Frehlich, Ziyi Xia, Peyman Gholami, and Robert Xiao. GaussianNexus: Room-scale real-time AR/VR telepresence with Gaussian splatting. In Proceedings of the ACM Symposium on User Interface Software and Technology, pages 1–18, 2025.
  16. [16] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4):139–1, 2023.
  17. [17] Diederik P Kingma. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  18. [18] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4015–4026, 2023.
  19. [19] Hanyang Kong, Xingyi Yang, and Xinchao Wang. Generative sparse-view Gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26745–26755, 2025.
  20. [20] Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, and Torsten Sattler. WildGaussians: 3D Gaussian splatting in the wild. arXiv preprint arXiv:2407.08447, 2024.
  21. [21] Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3D Gaussian representation for radiance field. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21719–21728, 2024.
  22. [22] Deming Li, Kaiwen Jiang, Yutao Tang, Ravi Ramamoorthi, Rama Chellappa, and Cheng Peng. MS-GS: Multi-appearance sparse-view 3D Gaussian splatting in the wild. In Proceedings of the Conference on Neural Information Processing Systems, 2025.
  23. [23] Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, and Lin Gu. DNGaussian: Optimizing sparse-view 3D Gaussian radiance fields with global-local depth normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20775–20785, 2024.
  24. [24] Peihao Li, Shaohui Wang, Chen Yang, Bingbing Liu, Weichao Qiu, and Haoqian Wang. NeRF-MS: Neural radiance fields with multi-sequence. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18591–18600, 2023.
  25. [25] Yiqing Li, Xuan Wang, Jiawei Wu, Yikun Ma, and Zhi Jin. SparseGS-W: Sparse-view 3D Gaussian splatting in the wild with generative priors. arXiv preprint arXiv:2503.19452, 2025.
  26. [26] Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, and Karsten Kreis. Align your Gaussians: Text-to-4D with dynamic 3D Gaussians and composed diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8576–8588, 2024.
  27. [27] Xi Liu, Chaoyi Zhou, and Siyu Huang. 3DGS-Enhancer: Enhancing unbounded 3D Gaussian splatting with view-consistent 2D diffusion priors. In Advances in Neural Information Processing Systems, volume 37, pages 133305–133327, 2024.
  28. [28] Zhening Liu, Rui Song, Yushi Huang, Yingdong Hu, Xinjie Zhang, Jiawei Shao, Zehong Lin, and Jun Zhang. Feed-forward 3D Gaussian splatting compression with long-context modeling. arXiv preprint arXiv:2512.00877, 2025.
  29. [29] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-GS: Structured 3D Gaussians for view-adaptive rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20654–20664, 2024.
  30. [30] Ricardo Martin-Brualla, Noha Radwan, Mehdi SM Sajjadi, Jonathan T Barron, Alexey Dosovitskiy, and Daniel Duckworth. NeRF in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7210–7219, 2021.
  31. [31] Ben Mildenhall, Pratul P Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics, 38(4):1–14, 2019.
  32. [32] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  33. [33] Simon Niedermayr, Josef Stumpfegger, and Rüdiger Westermann. Compressed 3D Gaussian splatting for accelerated novel view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10349–10358, 2024.
  34. [34] Michael Niemeyer, Jonathan T Barron, Ben Mildenhall, Mehdi SM Sajjadi, Andreas Geiger, and Noha Radwan. RegNeRF: Regularizing neural radiance fields for view synthesis from sparse inputs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5480, 2021.
  35. [35] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
  36. [36] Avinash Paliwal, Xilong Zhou, Wei Ye, Jinhui Xiong, Rakesh Ranjan, and Nima Khademi Kalantari. RI3D: Few-shot Gaussian splatting with repair and inpainting diffusion priors. arXiv preprint arXiv:2503.10860, 2025.
  37. [37] Hyunwoo Park, Gun Ryu, and Wonjun Kim. DropGaussian: Structural regularization for sparse-view Gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21600–21609, 2025.
  38. [38] Wongi Park, Myeongseok Nam, Siwon Kim, Sangwoo Jo, and Soomok Lee. ForestSplats: Deformable transient field for Gaussian splatting in the wild. arXiv preprint arXiv:2503.06179, 2025.
  39. [39] Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D diffusion. arXiv preprint arXiv:2209.14988, 2022.
  40. [40] René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12179–12188, 2021.
  41. [41] Jiawei Ren, Cheng Xie, Ashkan Mirzaei, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling, et al. L4GM: Large 4D Gaussian reconstruction model. In Advances in Neural Information Processing Systems, volume 37, pages 56828–56858, 2024.
  42. [42] Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, et al. Grounded SAM: Assembling open-world models for diverse visual tasks. arXiv preprint arXiv:2401.14159, 2024.
  43. [43] Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, and Songyou Peng. NeRF On-the-go: Exploiting uncertainty for distractor-free NeRFs in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8931–8940, 2024.
  44. [44] Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, and Jun Gao. GEN3C: 3D-informed world-consistent video generation with precise camera control. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6121–6132, 2025.
  45. [45] Sara Sabour, Lily Goli, George Kopanas, Mark Matthews, Dmitry Lagun, Leonidas Guibas, Alec Jacobson, David J Fleet, and Andrea Tagliasacchi. SpotLessSplats: Ignoring distractors in 3D Gaussian splatting. arXiv preprint arXiv:2406.20055, 2024.
  46. [46] Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, et al. ZeroNVS: Zero-shot 360-degree view synthesis from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9420–9429, 2024.
  47. [47] Johannes L Schönberger and Jan-Michael Frahm. Structure-from-motion revisited. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4104–4113, 2016.
  48. [48] Johannes L Schönberger, Enliang Zheng, Jan-Michael Frahm, and Marc Pollefeys. Pixelwise view selection for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision, pages 501–518. Springer, 2016.
  49. [49] Yulin Shen, Boyu Li, Jiayang Huang, David Yip, and Zeyu Wang. GaussianShopVR: Facilitating immersive 3D authoring using Gaussian splatting in VR. In Proceedings of the ACM Symposium on User Interface Software and Technology, pages 1–14, 2025.
  50. [50] Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, and Hao Su. Zero123++: A single image to consistent multi-view diffusion base model. arXiv preprint arXiv:2310.15110, 2023.
  51. [51] Noah Snavely, Steven M Seitz, and Richard Szeliski. Photo tourism: Exploring photo collections in 3D. In ACM SIGGRAPH Papers, pages 835–846. ACM, 2006.
  52. [52] Nagabhushan Somraj, Adithyan Karanayil, and Rajiv Soundararajan. SimpleNeRF: Regularizing sparse input neural radiance fields with simpler solutions. In Proceedings of the SIGGRAPH Asia Conference, pages 1–11, 2023.
  53. [53] Jiadong Tang, Yu Gao, Dianyi Yang, Liqi Yan, Yufeng Yue, and Yi Yang. DroneSplat: 3D Gaussian splatting for robust 3D reconstruction from in-the-wild drone imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 833–843, 2025.
  54. [54] Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, and Bharath Hariharan. Emergent correspondence from image diffusion. In Advances in Neural Information Processing Systems, volume 36, pages 1363–1389, 2023.
  55. [55] Yuzhou Tang, Dejun Xu, Yongjie Hou, Zhenzhong Wang, and Min Jiang. NexusSplats: Efficient 3D Gaussian splatting in the wild. arXiv preprint arXiv:2411.14514, 2024.
  56. [56] Xuechang Tu, Lukas Radl, Michael Steiner, Markus Steinberger, Bernhard Kerbl, and Fernando de la Torre. VRSplat: Fast and robust Gaussian splatting for virtual reality. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 8(1):1–22, 2025.
  57. [57] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  58. [58] Xiaobao Wei, Zhangjie Ye, Yuxiang Gu, Zunjie Zhu, Yunfei Guo, Yingying Shen, Shan Zhao, Ming Lu, Haiyang Sun, Bing Wang, et al. ParkGaussian: Surround-view 3D Gaussian splatting for autonomous parking. arXiv preprint arXiv:2601.01386, 2026.
  59. [59] Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, and Huan Ling. Difix3D+: Improving 3D reconstructions with single-step diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26024–26035, 2025.
  60. [60] Minye Wu, Haizhao Dai, Kaixin Yao, Tinne Tuytelaars, and Jingyi Yu. BG-Triangle: Bézier Gaussian triangle for 3D vectorization and rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16197–16207, 2025.
  61. [61] Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P Srinivasan, Dor Verbin, Jonathan T Barron, Ben Poole, et al. ReconFusion: 3D reconstruction with diffusion priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21551–21561, 2024.
  62. [62] Jamie Wynn and Daniyar Turmukhambetov. DiffusioNeRF: Regularizing neural radiance fields with denoising diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4180–4189, 2023.
  63. [63] Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, and Achuta Kadambi. SparseGS: Real-time 360° sparse view synthesis using Gaussian splatting. arXiv preprint arXiv:2312.00206, 2023.
  64. [64] Jiacong Xu, Yiqun Mei, and Vishal M Patel. Wild-GS: Real-time novel view synthesis from unconstrained photo collections. arXiv preprint arXiv:2406.10373, 2024.
  65. [65] Jiawei Xu, Kai Deng, Zexin Fan, Shenlong Wang, Jin Xie, and Jian Yang. AD-GS: Object-aware B-spline Gaussian splatting for self-supervised autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 24770–24779, 2025.
  66. [66] Jiawei Yang, Marco Pavone, and Yue Wang. FreeNeRF: Improving few-shot neural rendering with free frequency regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8254–8263, 2023.
  67. [67] Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything: Unleashing the power of large-scale unlabeled data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10371–10381, 2024.
  68. [68] Yifan Yang, Shuhai Zhang, Zixiong Huang, Yubing Zhang, and Mingkui Tan. Cross-ray neural radiance fields for novel-view synthesis from unconstrained image collections. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15901–15911, 2023.
  69. [69] Ruihong Yin, Vladimir Yugay, Yue Li, Sezer Karaoglu, and Theo Gevers. FewViewGS: Gaussian splatting with few view matching and multi-stage training. In Advances in Neural Information Processing Systems, volume 37, pages 127204–127225, 2024.
  70. [70] Xingyilang Yin, Qi Zhang, Jiahao Chang, Ying Feng, Qingnan Fan, Xi Yang, Chi-Man Pun, Huaqi Zhang, and Xiaodong Cun. GSFixer: Improving 3D Gaussian splatting with reference-guided video diffusion priors. arXiv preprint arXiv:2508.09667, 2025.
  71. [71] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-Splatting: Alias-free 3D Gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19447–19456, 2024.
  72. [72] Zhaojie Zeng, Yuesong Wang, Lili Ju, and Tao Guan. Frequency-aware density control via reparameterization for high-quality rendering of 3D Gaussian splatting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 9833–9841, 2025.
  73. [73] Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, and Haoqian Wang. Gaussian in the wild: 3D Gaussian splatting for unconstrained image collections. In Proceedings of the European Conference on Computer Vision, pages 341–359. Springer, 2024.
  74. [74] Qi Zhang, Chi Huang, Qian Zhang, Nan Li, and Wei Feng. SU-RGS: Relightable 3D Gaussian splatting from sparse views under unconstrained illuminations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 26859–26868, 2025.
  75. [75] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
  76. [76] Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, and Hengshuang Zhao. Pixel-GS: Density control with pixel-aware gradient for 3D Gaussian splatting. In Proceedings of the European Conference on Computer Vision, pages 326–342. Springer, 2024.
  77. [77] Chen Zhao, Xuan Wang, Tong Zhang, Saqib Javed, and Mathieu Salzmann. Self-ensembling Gaussian splatting for few-shot novel view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4940–4950, 2025.
  78. [78] Guosheng Zhao, Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Xueyang Zhang, Yida Wang, Guan Huang, Xinze Chen, Boyuan Wang, Youyi Zhang, et al. DriveDreamer4D: World models are effective data machines for 4D driving scene representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12015–12026, 2025.
  79. [79] Zehao Zhu, Zhiwen Fan, Yifan Jiang, and Zhangyang Wang. FSGS: Real-time few-shot view synthesis using Gaussian splatting. In Proceedings of the European Conference on Computer Vision, pages 145–163. Springer, 2024.