pith. machine review for the scientific record.

arxiv: 2604.21631 · v1 · submitted 2026-04-23 · 💻 cs.CV

Recognition: unknown

DualSplat: Robust 3D Gaussian Splatting via Pseudo-Mask Bootstrapping from Reconstruction Failures

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 22:08 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D Gaussian Splatting · transient objects · pseudo-masks · robust reconstruction · multi-view consistency · photometric residuals · failure bootstrapping · neural rendering

The pith

A two-stage method turns failures from an initial 3D Gaussian Splatting pass into pseudo-masks that enable clean reconstruction despite transient objects in the input images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the circular problem in 3D Gaussian Splatting where transient objects that move between views break multi-view consistency and ruin photorealistic rendering, yet detecting those transients reliably requires an already clean scene model. It shows that a conservative first training pass leaves detectable incomplete fragments of those transients, which can be turned into object-level pseudo-masks by combining photometric residuals, feature mismatches, and instance segmentation boundaries. These masks then drive a second optimization pass that produces a static scene representation while an online network gradually reduces reliance on the masks in favor of self-consistency. A reader should care because everyday photo collections for 3D capture almost always contain moving people, cars, or foliage; solving the circular dependency makes real-time rendering practical without manual cleanup or perfect input data.

Core claim

DualSplat is a Failure-to-Prior framework that converts first-pass reconstruction failures into explicit priors for a second reconstruction stage. Transients that appear in only a subset of views often manifest as incomplete fragments during conservative initial training. These fragments are exploited to construct object-level pseudo-masks by fusing photometric residuals, feature mismatches, and SAM2 instance boundaries. The pseudo-masks guide clean second-pass 3DGS optimization, while a lightweight MLP refines them online by shifting from prior supervision to self-consistency.
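In code form, the claimed control flow is a short two-pass loop. The sketch below is a minimal editorial reading of that flow: every callable (train_pass, the three cue extractors, fuse) is a hypothetical stand-in supplied by the caller, since the paper publishes no interface and none of these names are the authors'.

```python
from typing import Any, Callable, Sequence

def failure_to_prior(
    images: Sequence[Any],
    train_pass: Callable[..., Any],                       # one 3DGS optimization pass
    residual_cue: Callable[[Any, Sequence[Any]], Any],    # photometric residuals
    feature_cue: Callable[[Any, Sequence[Any]], Any],     # feature mismatches
    instance_cue: Callable[[Sequence[Any]], Any],         # instance boundaries (SAM2-style)
    fuse: Callable[[Any, Any, Any], Any],                 # cue fusion into pseudo-masks
):
    # Stage 1: a conservative pass in which view-inconsistent transients
    # surface as incomplete fragments rather than being fit cleanly.
    coarse = train_pass(images)

    # Failure-to-Prior: bootstrap object-level pseudo-masks from those failures.
    pseudo_masks = fuse(
        residual_cue(coarse, images),
        feature_cue(coarse, images),
        instance_cue(images),
    )

    # Stage 2: a clean pass supervised by the pseudo-masks. (The paper also
    # refines the masks online with a small MLP; that loop is omitted here.)
    clean = train_pass(images, masks=pseudo_masks)
    return clean, pseudo_masks
```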

What carries the argument

The Failure-to-Prior framework that bootstraps object-level pseudo-masks from reconstruction failures by fusing photometric residuals, feature mismatches, and instance boundaries to supervise a second clean optimization pass.
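As one concrete reading of this fusion step, the numpy sketch below scores each instance mask by its mean feature consistency µ_m and mean photometric residual ℓ̄_m and keeps it as a transient prior when µ_m ≤ τ_sim and ℓ̄_m ≥ τ_L1 (the retention rule quoted under Figure 4). The threshold values, and the choice of a hard per-instance test rather than any soft weighting, are illustrative assumptions, not the paper's.

```python
import numpy as np

def fuse_cues(
    residual: np.ndarray,          # per-pixel L1 photometric error, shape (H, W)
    feat_sim: np.ndarray,          # per-pixel feature-consistency score in [0, 1], (H, W)
    instances: list[np.ndarray],   # boolean instance masks, each (H, W)
    tau_sim: float = 0.5,          # illustrative threshold, not the paper's value
    tau_l1: float = 0.1,           # illustrative threshold, not the paper's value
) -> np.ndarray:
    """Union of instance masks that look transient under both cues."""
    pseudo_mask = np.zeros(residual.shape, dtype=bool)
    for m in instances:
        if not m.any():
            continue
        mu_m = feat_sim[m].mean()  # mean feature consistency inside instance m
        l1_m = residual[m].mean()  # mean photometric residual inside instance m
        # Keep m as a transient prior only when it is simultaneously
        # feature-inconsistent and photometrically wrong:
        # mu_m <= tau_sim and l1_m >= tau_l1.
        if mu_m <= tau_sim and l1_m >= tau_l1:
            pseudo_mask |= m
    return pseudo_mask
```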

If this is right

  • The second-pass optimization produces higher-quality static scene models on datasets with transient objects.
  • Advantages are largest in scenes where transients occupy many views or cover large image regions.
  • The online mask refinement reduces dependence on the initial pseudo-masks and improves final consistency.
  • Real-time rendering remains possible because the final representation is still a standard 3D Gaussian Splatting model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same failure-to-prior idea could be tested on other neural rendering backbones that also assume multi-view consistency.
  • If the pseudo-masks prove stable across time, the approach might extend to short video sequences by treating refined masks as temporal priors.
  • Scenes with strong lighting changes or reflections could serve as a natural stress test for whether photometric residuals remain informative.
  • Combining the method with camera pose refinement might further reduce the impact of any residual mask errors.

Load-bearing premise

Transients appearing in only some views will reliably produce incomplete fragments in a conservative first reconstruction that can be turned into accurate object-level pseudo-masks using residuals and boundaries.

What would settle it

A test set of scenes containing transient objects where the first-pass residuals and mismatches are too weak or noisy to produce pseudo-masks that measurably improve second-pass rendering quality over a single-pass baseline.

Figures

Figures reproduced from arXiv: 2604.21631 by Chengwei Pan, Shiyun Xie, Xu Wang, Yisong Chen, Zhiru Wang.

Figure 1. Transient objects in the training images introduce noticeable artifacts in the reconstruction results. Compared with other methods, our approach achieves higher fidelity and more accurate suppression in transient regions.
Figure 2. We select three scenes from the NeRF On-the-go dataset.
Figure 3. DualSplat performs two-stage 3D Gaussian Splatting to suppress transient distractions. The first stage reconstructs a coarse static …
Figure 4. Visualization results using different pretrained models as feature extractors. FiT3D, when used as the feature extractor, produces the most distinct feature visualization and accurately identifies transient objects. The accompanying method text states the retention rule: a candidate mask m is kept as a transient prior only when it simultaneously exhibits low feature consistency and high photometric error, i.e., µ_m ≤ τ_sim and ℓ̄_m ≥ τ_L1, where ℓ̄_m denotes the mean per-pixel L1 residual over m.
Figure 5. Qualitative results on Spot and Mountain from the NeRF On-the-go dataset.
Figure 6. Qualitative results on Statue and Android from the RobustNeRF dataset.
read the original abstract

While 3D Gaussian Splatting (3DGS) achieves real-time photorealistic rendering, its performance degrades significantly when training images contain transient objects that violate multi-view consistency. Existing methods face a circular dependency: accurate transient detection requires a well-reconstructed static scene, while clean reconstruction itself depends on reliable transient masks. We address this challenge with DualSplat, a Failure-to-Prior framework that converts first-pass reconstruction failures into explicit priors for a second reconstruction stage. We observe that transients, which appear in only a subset of views, often manifest as incomplete fragments during conservative initial training. We exploit these failures to construct object-level pseudo-masks by combining photometric residuals, feature mismatches, and SAM2 instance boundaries. These pseudo-masks then guide a clean second-pass 3DGS optimization, while a lightweight MLP refines them online by gradually shifting from prior supervision to self-consistency. Experiments on RobustNeRF and NeRF On-the-go show that DualSplat outperforms existing baselines, demonstrating particularly clear advantages in transient-heavy scenes and transient regions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes DualSplat, a Failure-to-Prior framework for robust 3D Gaussian Splatting that breaks the circular dependency between transient detection and clean reconstruction. A conservative first-pass 3DGS produces incomplete transient fragments; these are converted into object-level pseudo-masks by fusing photometric residuals, feature mismatches, and SAM2 instance boundaries. The pseudo-masks then supervise a second-pass 3DGS optimization while a lightweight MLP refines them online, shifting from prior supervision to self-consistency. Experiments claim outperformance over baselines on RobustNeRF and NeRF On-the-go, with largest gains in transient-heavy scenes.

Significance. If the pseudo-masks are shown to be reliable, the approach would provide a practical, largely self-supervised route to handling transients in real-world 3DGS captures without manual masks or heavy external supervision. The inversion of reconstruction failures into explicit priors is conceptually clean and could generalize to other neural rendering pipelines.

major comments (2)
  1. [Abstract] The central claim of reliable pseudo-mask construction and consequent second-pass gains rests on the unverified assumption that first-pass residuals and SAM2 boundaries isolate transients. No IoU, precision, or mask-quality metrics against ground-truth transients are reported, leaving open the possibility that view-dependent lighting or inconsistent SAM2 segments produce false positives/negatives that propagate into the second stage.
  2. [Method] Pseudo-mask generation and MLP refinement: the fusion step combining photometric residuals, feature mismatches, and SAM2 boundaries is described only at a high level. Without an ablation isolating each cue or a quantitative mask evaluation before the second-pass optimization, it is impossible to confirm that the Failure-to-Prior mechanism is load-bearing rather than an artifact of the particular datasets.
minor comments (2)
  1. [Experiments] The manuscript should report error bars, implementation details (learning rates, number of Gaussians, SAM2 prompt strategy), and full ablation tables for both mask quality and final rendering metrics.
  2. Clarify the exact loss terms, architecture, and training schedule of the lightweight MLP; the shift from prior supervision to self-consistency is mentioned but not formalized (one plausible formalization is sketched below).
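One plausible formalization of that shift, offered as an editorial sketch rather than the authors' loss: a weight annealed over training trades a binary cross-entropy term against the fixed pseudo-mask for a self-consistency term that aligns the predicted mask with the current render error. The linear schedule and both loss terms are assumptions.

```python
import torch
import torch.nn.functional as F

def prior_weight(step: int, total_steps: int, floor: float = 0.0) -> float:
    # Linear anneal from 1 (trust the bootstrapped pseudo-mask) down to
    # `floor` (trust self-consistency); the schedule shape is an assumption.
    return max(floor, 1.0 - step / total_steps)

def refiner_loss(
    pred_logits: torch.Tensor,   # mask logits from the lightweight MLP, (H, W)
    pseudo_mask: torch.Tensor,   # bootstrapped prior in {0, 1}, (H, W)
    render_error: torch.Tensor,  # per-pixel error of the current render, (H, W)
    step: int,
    total_steps: int,
) -> torch.Tensor:
    w = prior_weight(step, total_steps)
    # Prior supervision: match the fixed pseudo-mask.
    prior_term = F.binary_cross_entropy_with_logits(pred_logits, pseudo_mask.float())
    # Self-consistency (one possible reading): the predicted mask should
    # explain where the current static model still fails to fit.
    self_term = F.mse_loss(torch.sigmoid(pred_logits), render_error.clamp(0, 1))
    return w * prior_term + (1.0 - w) * self_term
```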

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify opportunities to strengthen the empirical validation of our pseudo-mask generation. We address each major point below and will revise the manuscript to incorporate additional details and evaluations.

read point-by-point responses
  1. Referee: [Abstract] the central claim of reliable pseudo-mask construction and consequent second-pass gains rests on the unverified assumption that first-pass residuals and SAM2 boundaries isolate transients. No IoU, precision, or mask-quality metrics against ground-truth transients are reported, leaving open the possibility that view-dependent lighting or inconsistent SAM2 segments produce false positives/negatives that propagate into the second stage.

    Authors: We agree that explicit quantitative mask metrics would provide stronger support for the reliability of the pseudo-masks. Although the primary evaluation metric in the paper is end-to-end rendering quality (which indirectly validates the masks via improved reconstruction), we will add IoU, precision, and recall against ground-truth transient labels on RobustNeRF in the revised version (standard definitions of these metrics are sketched after these responses). We will also include qualitative analysis of potential failure modes arising from view-dependent effects or SAM2 inconsistencies. revision: yes

  2. Referee: [Method] the fusion step combining photometric residuals, feature mismatches, and SAM2 boundaries is described only at high level. Without an ablation isolating each cue or a quantitative mask evaluation before the second-pass optimization, it is impossible to confirm that the Failure-to-Prior mechanism is load-bearing rather than an artifact of the particular datasets.

    Authors: We will expand the Method section with a detailed description of the fusion process, including the specific weighting and combination rules for photometric residuals, feature mismatches, and SAM2 boundaries. We will also add an ablation study that isolates the contribution of each cue, together with quantitative mask-quality metrics evaluated prior to the second-pass optimization, to demonstrate that the mechanism is effective and generalizes beyond the evaluated datasets. revision: yes
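For reference, the mask-quality metrics promised in response 1 have standard definitions; the sketch below computes them for boolean masks and contains nothing specific to DualSplat.

```python
import numpy as np

def mask_metrics(pred: np.ndarray, gt: np.ndarray) -> dict[str, float]:
    # pred, gt: boolean transient masks of the same shape.
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()   # true positives
    fp = np.logical_and(pred, ~gt).sum()  # false positives
    fn = np.logical_and(~pred, gt).sum()  # false negatives
    union = tp + fp + fn
    return {
        "iou": float(tp / union) if union else 1.0,
        "precision": float(tp / (tp + fp)) if (tp + fp) else 1.0,
        "recall": float(tp / (tp + fn)) if (tp + fn) else 1.0,
    }
```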

Circularity Check

0 steps flagged

No significant circularity; two-stage bootstrapping breaks the stated dependency

full rationale

The paper explicitly identifies the circular dependency in prior work (accurate masks need clean reconstruction and vice versa) and claims to resolve it via a Failure-to-Prior pipeline: a conservative first-pass 3DGS produces incomplete transient fragments whose photometric residuals, feature mismatches, and SAM2 boundaries are fused into pseudo-masks that then supervise a second-pass optimization plus online MLP refinement. This is a sequential, non-self-referential procedure rather than a definitional loop, a fitted parameter renamed as a prediction, or a load-bearing self-citation. No equations, uniqueness theorems, or ansatzes are shown that reduce the central claim to its own inputs by construction. The method rests on an empirical assumption about how transients appear in initial training, but that assumption is external to the derivation chain and does not create circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The approach rests on domain assumptions about how transients behave under conservative 3DGS training and on the reliability of combining residuals with an external segmentation model; no free parameters or new physical entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Transients appear in only a subset of views and manifest as incomplete fragments during conservative initial training.
    Stated directly as the key observation enabling pseudo-mask construction.
  • domain assumption Photometric residuals, feature mismatches, and SAM2 instance boundaries together produce reliable object-level transient masks.
    Core premise of the mask-generation step.
invented entities (2)
  • Pseudo-masks bootstrapped from reconstruction failures · no independent evidence
    purpose: To provide explicit priors that guide clean second-pass 3DGS optimization
    Newly derived from first-pass artifacts rather than supplied externally.
  • Lightweight MLP for online mask refinement · no independent evidence
    purpose: To gradually shift supervision from the initial pseudo-masks to self-consistency during second-pass training
    Introduced as the mechanism for mask improvement.

pith-pipeline@v0.9.0 · 5500 in / 1459 out tokens · 46344 ms · 2026-05-09T22:08:33.938422+00:00 · methodology

