pith. machine review for the scientific record.

arxiv: 2604.05715 · v1 · submitted 2026-04-07 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

In Depth We Trust: Reliable Monocular Depth Supervision for Gaussian Splatting

Authors on Pith no claims yet

Pith reviewed 2026-05-10 19:23 UTC · model grok-4.3

classification 💻 cs.CV
keywords Gaussian Splatting · monocular depth supervision · depth regularization · 3D reconstruction · geometric accuracy · novel view synthesis · neural rendering

The pith

Monocular depth priors improve Gaussian Splatting when ill-posed geometry is isolated for selective regularization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a training framework that incorporates scale-ambiguous and noisy monocular depth estimates as geometric supervision for 3D Gaussian Splatting. It stresses the value of learning local depth variations rather than absolute scales and introduces a method to identify and isolate regions of ill-posed geometry. Selective regularization is then applied only in those regions, preventing depth errors from affecting areas that are already well-reconstructed from the input views. Experiments across multiple datasets confirm gains in geometric fidelity, depth estimation accuracy, and final rendering quality for various splatting variants and depth model backbones. Readers would care because the approach makes high-quality 3D reconstruction feasible using only standard RGB images and off-the-shelf monocular estimators.

Core claim

We introduce a training framework integrating scale-ambiguous and noisy depth priors into geometric supervision for Gaussian Splatting. We highlight the importance of learning from weakly aligned depth variations. We introduce a method to isolate ill-posed geometry for selective monocular depth regularization, restricting the propagation of depth inaccuracies into well-reconstructed 3D structures.

What carries the argument

Isolation of ill-posed geometry for selective monocular depth regularization, which enables learning from local depth variations while limiting error propagation during optimization.
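Figure 3's caption mentions identifying depth-inconsistent Gaussians with a virtual stereo check during optimization. The paper's exact test is not reproduced here; a minimal sketch of one plausible left-right consistency check (the disparity model, threshold, and function name are assumptions, not the authors' code):

```python
def inconsistency_mask(depth_left, depth_right, baseline, focal, tol=1.0):
    """Flag pixels whose virtual-stereo depths disagree (left-right check).

    depth_left / depth_right: row-major nested lists of depths rendered
    from two virtual viewpoints separated by `baseline`. Returns a
    same-shaped boolean grid; True marks a depth-inconsistent pixel.
    """
    h, w = len(depth_left), len(depth_left[0])
    disp_l = [[baseline * focal / max(d, 1e-6) for d in row] for row in depth_left]
    disp_r = [[baseline * focal / max(d, 1e-6) for d in row] for row in depth_right]
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Project the left pixel into the right view via its disparity,
            # then compare the two disparities at the matched location.
            xr = min(max(int(round(x - disp_l[y][x])), 0), w - 1)
            mask[y][x] = abs(disp_l[y][x] - disp_r[y][xr]) > tol
    return mask
```

Pixels flagged True would be treated as ill-posed and eligible for monocular depth regularization; unflagged pixels keep their photometrically constrained geometry.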

Load-bearing premise

The isolation procedure can correctly distinguish regions where monocular depths provide useful local signals from regions where they would introduce harmful inaccuracies.

What would settle it

A test on a dataset with known ground-truth depths: if the selective method produced lower rendering quality or less accurate geometry than either no depth supervision or naive application of the priors, that would indicate the isolation step does not deliver the claimed benefit.
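One concrete way to score such a decisive comparison is a standard depth-error metric like absolute relative error (AbsRel). The three-condition protocol and the toy numbers below are illustrative assumptions, not the paper's results:

```python
def abs_rel(pred, gt, eps=1e-6):
    """Mean absolute relative depth error: mean(|pred - gt| / gt)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > eps]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)

# Score each training condition against ground-truth depths; the claim
# holds if the selective method beats both "none" and "naive".
gt        = [2.0, 4.0, 8.0]
no_depth  = [2.6, 4.8, 9.6]   # hypothetical renders, no depth supervision
naive     = [2.4, 4.4, 9.8]   # hypothetical renders, unmasked priors
selective = [2.1, 4.1, 8.2]   # hypothetical renders, masked priors
scores = {name: abs_rel(pred, gt) for name, pred in
          [("none", no_depth), ("naive", naive), ("selective", selective)]}
```

If `scores["selective"]` did not come out lowest on real data, the isolation step would not be carrying its weight.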

Figures

Figures reproduced from arXiv: 2604.05715 by Clinton Fookes, David Ahmedt-Aristizabal, Ethan Goan, Leo Lebrat, Olivier Salvado, Rodrigo Santa Cruz, Wenhui Xiao.

Figure 1. The quality of monocular depth priors directly impacts the rendering performance of 3D Gaussian Splatting (3DGS).
Figure 2. Impact of SfM point count in scale alignment on GS.
Figure 3. Overview of the proposed framework: during GS optimization, depth-inconsistent Gaussians are identified using a virtual stereo.
Figure 4. Comparison of different relative depth supervision.
Figure 5. Qualitative results on 3DGS. Top: low-data setting; bottom: moderate-data setting. Columns: ground truth, 3DGS (w/o depth), Lsid, ours.
Figure 6. Qualitative comparisons of rendered images and depth maps on the ScanNet++ dataset for the baseline (without depth supervision).
Original abstract

Using accurate depth priors in 3D Gaussian Splatting helps mitigate artifacts caused by sparse training data and textureless surfaces. However, acquiring accurate depth maps requires specialized acquisition systems. Foundation monocular depth estimation models offer a cost-effective alternative, but they suffer from scale ambiguity, multi-view inconsistency, and local geometric inaccuracies, which can degrade rendering performance when applied naively. This paper addresses the challenge of reliably leveraging monocular depth priors for Gaussian Splatting (GS) rendering enhancement. To this end, we introduce a training framework integrating scale-ambiguous and noisy depth priors into geometric supervision. We highlight the importance of learning from weakly aligned depth variations. We introduce a method to isolate ill-posed geometry for selective monocular depth regularization, restricting the propagation of depth inaccuracies into well-reconstructed 3D structures. Extensive experiments across diverse datasets show consistent improvements in geometric accuracy, leading to more faithful depth estimation and higher rendering quality across different GS variants and monocular depth backbones tested.
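Scale ambiguity of the kind the abstract describes is conventionally removed by fitting a per-image scale and shift to sparse reference depths (e.g. SfM points) in closed form. A minimal least-squares sketch, not necessarily the alignment this paper uses:

```python
def align_scale_shift(mono, target):
    """Fit s, t minimizing sum((s * mono_i + t - target_i)^2) in closed form.

    `mono` is a scale/shift-ambiguous monocular depth map (flattened),
    `target` the reference depths (e.g. SfM points or rendered depth).
    """
    n = len(mono)
    sx = sum(mono); sy = sum(target)
    sxx = sum(m * m for m in mono)
    sxy = sum(m * t for m, t in zip(mono, target))
    denom = n * sxx - sx * sx
    if abs(denom) < 1e-12:        # degenerate: constant prediction
        return 0.0, sy / n
    s = (n * sxy - sx * sy) / denom
    t = (sy - s * sx) / n
    return s, t
```

As Figure 2 suggests, the quality of such a fit depends on how many SfM points anchor it, which is part of what motivates relying on local depth variations instead of a single global alignment.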

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims to introduce a training framework for 3D Gaussian Splatting that integrates scale-ambiguous and noisy monocular depth priors through selective regularization of ill-posed geometry. It emphasizes learning from weakly aligned depth variations and isolates problematic regions to prevent depth inaccuracies from propagating into well-reconstructed 3D structures, reporting consistent gains in geometric accuracy and rendering quality across datasets, GS variants, and depth backbones.

Significance. If the isolation procedure reliably distinguishes ill-posed geometry without under- or over-regularization, the work would provide a practical route to leverage off-the-shelf monocular depth estimators in novel-view synthesis, mitigating artifacts from sparse views and textureless surfaces while avoiding the cost of specialized depth hardware.

minor comments (2)
  1. The abstract asserts 'consistent improvements' and 'extensive experiments' but supplies no quantitative metrics, table references, or dataset names; adding a one-sentence summary of key numbers (e.g., PSNR or depth error deltas) would improve immediate readability.
  2. The description of the selective regularization term would benefit from an explicit equation or pseudocode block that distinguishes the proposed mask from standard depth-supervision losses.
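To make minor comment 2 concrete, the distinction between a standard depth-supervision loss and a mask-gated selective variant can be sketched as follows; the mask semantics and the L1 form are assumptions, since the paper's exact loss is not quoted here:

```python
def depth_l1(pred, prior):
    """Standard (global) depth-supervision loss: mean |pred - prior|."""
    return sum(abs(p, ) if False else abs(p - q) for p, q in zip(pred, prior)) / len(pred)

def masked_depth_l1(pred, prior, ill_posed):
    """Selective variant: the prior only supervises ill-posed pixels.

    `ill_posed` is a boolean list; well-reconstructed pixels (False)
    receive no depth gradient, so prior noise cannot corrupt them.
    """
    terms = [abs(p - q) for p, q, m in zip(pred, prior, ill_posed) if m]
    return sum(terms) / len(terms) if terms else 0.0
```

The only difference is the gate: wherever the mask is False, the monocular prior contributes nothing, which is the mechanism claimed to stop error propagation.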

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work and the recommendation for minor revision. The referee's description accurately reflects the paper's contributions regarding selective regularization of monocular depth priors in Gaussian Splatting. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces a novel training framework for incorporating scale-ambiguous monocular depth priors into Gaussian Splatting via selective isolation of ill-posed geometry. No load-bearing steps reduce by construction to fitted inputs, self-definitions, or self-citation chains; the method is described as a new procedure grounded in observed inconsistencies of monocular depth models, with claims supported by experiments across datasets and backbones. The derivation chain remains self-contained without renaming known results or smuggling ansatzes via citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The method implicitly assumes monocular depth provides usable relative geometry and that ill-posed regions can be identified without circular dependence on the depth signal itself.

pith-pipeline@v0.9.0 · 5492 in / 1057 out tokens · 41451 ms · 2026-05-10T19:23:21.652526+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 7 canonical work pages · 3 internal anchors

  1. [1] Zhenyu Bao, Guibiao Liao, Kaichen Zhou, Kanglin Liu, Qing Li, and Guoping Qiu. LoopSparseGS: Loop based sparse-view friendly Gaussian splatting. IEEE Transactions on Image Processing, 2025. 2, 4

  2. [2] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5479, 2022. 5

  3. [3] Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias Müller. ZoeDepth: Zero-shot transfer by combining relative and metric depth. 2023. arXiv:2302.12288. 2

  4. [4] Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun. Depth Pro: Sharp monocular metric depth in less than a second. 2024. arXiv:2410.02073. 2

  5. [5] Weifeng Chen, Zhao Fu, Dawei Yang, and Jia Deng. Single-image depth perception in the wild. Advances in Neural Information Processing Systems, 29, 2016. 2

  6. [6] Jaeyoung Chung, Jeongtaek Oh, and Kyoung Mu Lee. Depth-regularized optimization for 3D Gaussian splatting in few-shot images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 811–820, 2024. 2, 3, 6

  7. [7] Kangle Deng, Andrew Liu, Jun-Yan Zhu, and Deva Ramanan. Depth-supervised NeRF: Fewer views and faster training for free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12882–12891, 2022. 1, 2

  8. [8] Jose M. Facil, Benjamin Ummenhofer, Huizhong Zhou, Luis Montesano, Thomas Brox, and Javier Civera. CAM-Convs: Camera-aware multi-scale convolutions for single-view depth. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11826–11835, 2019. 2

  9. [9] Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, and Xiaoxiao Long. GeoWizard: Unleashing the diffusion priors for 3D geometry estimation from a single image. In European Conference on Computer Vision, pages 241–258. Springer, 2024. 2

  10. [10] Clément Godard, Oisin Mac Aodha, and Gabriel J. Brostow. Unsupervised monocular depth estimation with left-right consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 270–279, 2017.

  11. [11] Ming Gui, Johannes Schusterbauer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, and Björn Ommer. DepthFM: Fast generative monocular depth estimation with flow matching. 39(3):3203–3211, 2025. 2

  12. [12] Vitor Guizilini, Igor Vasiljevic, Dian Chen, Rares Ambrus, and Adrien Gaidon. Towards zero-shot scale-aware monocular depth estimation. pages 9233–9243, 2023. 2

  13. [13] Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press.

  14. [14] Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, and Shaojie Shen. Metric3D v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. 2

  15. [15] Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2D Gaussian splatting for geometrically accurate radiance fields. In ACM SIGGRAPH 2024 Conference Papers, pages 1–11, 2024. 5, 7

  16. [16] Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, and Konrad Schindler. Repurposing diffusion-based image generators for monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9492–9502, 2024.

  17. [17] Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, and Konrad Schindler. Marigold: Affordable adaptation of diffusion-based image generators for image analysis. arXiv preprint arXiv:2505.09358, 2025. 2, 6

  18. [18] Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, and Jonathon Luiten. SplaTAM: Splat, track & map 3D Gaussians for dense RGB-D SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21357–21366, 2024. 2

  19. [19] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4), 2023.

  20. [20] Bernhard Kerbl, Andreas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, and George Drettakis. A hierarchical 3D Gaussian representation for real-time rendering of very large datasets. ACM Transactions on Graphics (TOG), 43(4):1–15, 2024. 1, 2, 3, 6

  21. [21] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and Temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG), 36(4):1–13, 2017. 5

  22. [22] Lubor Ladicky, Jianbo Shi, and Marc Pollefeys. Pulling things out of perspective. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 89–96, 2014. 6

  23. [23] Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, and Lin Gu. DNGaussian: Optimizing sparse-view 3D Gaussian radiance fields with global-local depth normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20775–20785, 2024.

  24. [24] Zhengqi Li and Noah Snavely. MegaDepth: Learning single-view depth prediction from internet photos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2041–2050, 2018. 2, 4

  25. [25] Hanxue Liang, Tianhao Wu, Param Hanji, Francesco Banterle, Hongyun Gao, Rafal Mantiuk, and Cengiz Öztireli. Perceptual quality assessment of NeRF and neural view synthesis methods for front-facing views. In Computer Graphics Forum, page e15036. Wiley Online Library, 2024. 8

  26. [26] Jiahao Ma, Tianyu Wang, Miaomiao Liu, David Ahmedt-Aristizabal, and Chuong Nguyen. DCHM: Depth-consistent human modeling for multiview detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision.

  27. [27] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021. 1

  28. [28] Martin Piala and Ronald Clark. TermiNeRF: Ray termination prediction for efficient neural rendering. In 2021 International Conference on 3D Vision (3DV), pages 1106–1114. IEEE, 2021. 1

  29. [29] Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, and Fisher Yu. UniDepth: Universal monocular metric depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10106–10116, 2024. 2

  30. [30] Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mattia Segu, Siyuan Li, Wim Abbeloos, and Luc Van Gool. UniDepthV2: Universal monocular metric depth estimation made simpler. arXiv preprint arXiv:2502.20110, 2025. 2, 6

  31. [31] Qingming Liu, Yuan Liu, Jiepeng Wang, Xianqiang Lyu, Peng Wang, Wenping Wang, and Junhui Hou. MoDGS: Dynamic Gaussian splatting from casually-captured monocular videos with depth priors. In The Thirteenth International Conference on Learning Representations, 2025. 2

  32. [32] René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1623–1637, 2020. 2, 4

  33. [33] Barbara Roessle, Jonathan T. Barron, Ben Mildenhall, Pratul P. Srinivasan, and Matthias Nießner. Dense depth priors for neural radiance fields from sparse input views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12892–12901, 2022. 1

  34. [34] Cong Ruan, Yuesong Wang, Tao Guan, Bin Zhang, and Lili Ju. IndoorGS: Geometric cues guided Gaussian splatting for indoor scene reconstruction. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 844–853.

  35. [35] Sadra Safadoust, Fabio Tosi, Fatma Güney, and Matteo Poggi. Self-evolving depth-supervised 3D Gaussian splatting from rendered stereo pairs. arXiv preprint arXiv:2409.07456, 2024. 5

  36. [36] Jinguang Tong, Xuesong Li, Fahira Afzal Maken, Sundaram Muthu, Lars Petersson, Chuong Nguyen, and Hongdong Li. GS-2DGS: Geometrically supervised 2DGS for reflective object reconstruction. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21547–21557, 2025. 2

  37. [37] Fabio Tosi, Alessio Tonioni, Daniele De Gregorio, and Matteo Poggi. NeRF-supervised deep stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 855–866, 2023. 4

  38. [38] Chen Wang, Jiadai Sun, Lina Liu, Chenming Wu, Zhelun Shen, Dayan Wu, Yuchao Dai, and Liangjun Zhang. Digging into depth priors for outdoor neural radiance fields. In Proceedings of the 31st ACM International Conference on Multimedia, pages 1221–1230, 2023. 2

  39. [39] Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, and Jiaolong Yang. MoGe: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5261–5271, 2025. 2

  40. [40] Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, and Jiaolong Yang. MoGe-2: Accurate monocular geometry with metric scale and sharp details. arXiv preprint arXiv:2507.02546, 2025.

  41. [41] Frederik Warburg, Ethan Weber, Matthew Tancik, Aleksander Holynski, and Angjoo Kanazawa. Nerfbusters: Removing ghostly artifacts from casually captured NeRFs. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18120–18130, 2023. 1

  42. [42] Haolin Xiong, Sairisheek Muttukuru, Hanyuan Xiao, Rishi Upadhyay, Pradyumna Chari, Yajie Zhao, and Achuta Kadambi. SparseGS: Sparse view synthesis using 3D Gaussian splatting. In International Conference on 3D Vision, 2025. 2, 3, 4, 6, 7

  43. [43] Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything: Unleashing the power of large-scale unlabeled data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10371–10381, 2024.

  44. [44] Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything V2. Advances in Neural Information Processing Systems, 37:21875–21911, 2024. 2, 6

  45. [45] Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, and Angela Dai. ScanNet++: A high-fidelity dataset of 3D indoor scenes. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12–22, 2023. 5, 6

  46. [46] Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, and Dou Renyin. DiverseDepth: Affine-invariant depth prediction using diverse data. 2020. arXiv:2002.00569. 2

  47. [47] Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, and Chunhua Shen. Metric3D: Towards zero-shot metric 3D prediction from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9043–9053, 2023. 2

  48. [48] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018. 5

  49. [49] Zehao Zhu, Zhiwen Fan, Yifan Jiang, and Zhangyang Wang. FSGS: Real-time few-shot view synthesis using Gaussian splatting. In European Conference on Computer Vision, pages 145–163. Springer, 2024. 2, 3, 4, 6, 7