pith. machine review for the scientific record

arxiv: 2605.10204 · v1 · submitted 2026-05-11 · 💻 cs.CV

Recognition: no theorem link

3DReflecNet: A Large-Scale Dataset for 3D Reconstruction of Reflective, Transparent, and Low-Texture Objects

Zhicheng Liang , Haoyi Yu , Boyan Li , Dayou Zhang , Zijian Cao , Tianyi Gong , Junhua Liu , Shuguang Cui , Fangxin Wang

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:06 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D reconstruction · reflective objects · transparent surfaces · low-texture objects · dataset · benchmark · structure from motion · novel view synthesis

The pith

A 22-terabyte hybrid dataset shows that current 3D reconstruction methods lose accuracy on reflective, transparent, and low-texture objects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces 3DReflecNet to address the fact that existing 3D vision datasets emphasize easy diffuse objects, while real scenes often contain surfaces that break photometric consistency and texture cues. It combines over 120,000 synthetic instances, produced by physically-based rendering of more than 12,000 shapes, with more than 1,000 real objects captured on consumer devices, yielding over 7 million frames that span varied materials, lighting conditions, and geometries, including LLM-generated shapes. Benchmarks are defined for five tasks: image matching, structure-from-motion, novel view synthesis, reflection removal, and relighting. Experiments on these benchmarks show that state-of-the-art pipelines struggle to maintain accuracy when these material challenges are present. The dataset is positioned as a foundation for developing more robust reconstruction techniques.

Core claim

By releasing this large-scale hybrid dataset of reflective, transparent, and low-texture objects, the work establishes that multi-view reconstruction pipelines fail to deliver reliable geometry and appearance when photometric consistency and distinct texture cues are absent, as demonstrated by the poor performance of existing methods across the five defined benchmarks.
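The photometric-consistency failure that the core claim rests on can be made concrete. A hedged sketch, not taken from the paper: multi-view reconstruction scores candidate surface points by comparing image patches across views (here with normalized cross-correlation), and a specular highlight that tracks the camera rather than the surface drives that score down. The patch data below is synthetic and purely illustrative.

```python
# Illustrative sketch: why specular surfaces break photometric consistency.
# Multi-view stereo assumes a surface patch looks the same from every
# viewpoint; a highlight that moves with the camera violates that assumption.
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation between two patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
patch = rng.uniform(0.2, 0.8, size=(7, 7))  # textured diffuse patch, view 1

# Diffuse surface: view 2 differs only by mild imaging noise -> high NCC.
diffuse_view2 = patch + rng.normal(0.0, 0.01, patch.shape)

# Specular surface: a bright highlight appears at a new location in view 2
# because it follows the camera, not the surface.
specular_view2 = patch.copy()
specular_view2[1:4, 1:4] = 1.0

print(ncc(patch, diffuse_view2))   # close to 1.0
print(ncc(patch, specular_view2))  # noticeably lower
```

A matcher or MVS cost volume built on such scores then has no reliable minimum at the true surface, which is the mechanism behind the benchmark failures the paper reports.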

What carries the argument

3DReflecNet, the hybrid dataset of physically-based rendered synthetic scenes and real consumer-device captures that supplies ground-truth geometry and appearance for direct evaluation of 3D vision tasks on challenging materials.

If this is right

  • Structure-from-motion and image matching pipelines will exhibit measurable drops in accuracy and completeness on the new data compared with standard diffuse benchmarks.
  • Novel view synthesis and relighting methods must incorporate explicit handling of reflections and transparency to reach usable quality.
  • Reflection removal performance becomes a necessary intermediate step for accurate 3D geometry recovery on these objects.
  • The scale of the dataset enables quantitative comparison of future algorithms across thousands of instances rather than small hand-selected test sets.
  • LLM-assisted shape generation combined with physical rendering offers a scalable route to expand coverage of rare material-geometry combinations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The dataset could serve as a training signal for networks that learn material-aware priors directly from the provided multi-view frames.
  • Future extensions might add dynamic sequences to test whether current failures persist under object motion or changing illumination.
  • Integration with existing large-scale 3D datasets could quantify exactly how much performance on everyday objects is limited by the presence of a few reflective or transparent surfaces.
  • The five-task benchmark structure suggests a modular evaluation protocol that other researchers can adopt without re-collecting data.
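The modular protocol envisioned in the last point could take the shape of a small task registry with a shared evaluator signature. Everything below is hypothetical, not the paper's API: the task names mirror the five benchmarks, but the function names and data layout are invented for illustration.

```python
# Illustrative sketch of a modular evaluation protocol over the five
# benchmark tasks. Each task registers an evaluator with a common
# (predictions, ground_truth) -> score signature.
from typing import Callable, Dict

Evaluator = Callable[[dict, dict], float]

REGISTRY: Dict[str, Evaluator] = {}

def register(task: str):
    """Decorator that files an evaluator under its task name."""
    def wrap(fn: Evaluator) -> Evaluator:
        REGISTRY[task] = fn
        return fn
    return wrap

@register("image_matching")
def match_precision(pred: dict, gt: dict) -> float:
    # Placeholder metric: fraction of predicted correspondences that
    # appear in the ground-truth correspondence set.
    hits = sum(1 for m in pred["matches"] if m in gt["matches"])
    return hits / max(len(pred["matches"]), 1)

# The other four tasks (structure_from_motion, novel_view_synthesis,
# reflection_removal, relighting) would register their own metrics
# the same way, without touching the driver code.

pred = {"matches": [(0, 1), (2, 3), (4, 9)]}
gt = {"matches": [(0, 1), (2, 3), (5, 6)]}
print(REGISTRY["image_matching"](pred, gt))  # 2 of 3 correct
```

The point of the registry is that adopters can run any subset of tasks on the released splits without re-collecting data, which is what makes the benchmark structure reusable.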

Load-bearing premise

The hybrid synthetic and real captures together represent the full range of difficulties that reflective, transparent, and low-texture objects present in uncontrolled real-world conditions.

What would settle it

A method achieving near-perfect accuracy on all five benchmark tasks when trained and tested exclusively on the dataset's held-out splits would falsify the claim that these material classes inherently cause state-of-the-art pipelines to fail.

Figures

Figures reproduced from arXiv: 2605.10204 by Boyan Li, Dayou Zhang, Fangxin Wang, Haoyi Yu, Junhua Liu, Shuguang Cui, Tianyi Gong, Zhicheng Liang, Zijian Cao.

Figure 1. 3DReflecNet: a large-scale multi-view, object-centric dataset featuring reflective, transparent, and low-texture objects, providing high-quality annotations for 3D reconstruction.
Figure 2. Inaccurate camera pose estimation leads to reconstruction …
Figure 4. Material parameter sweep across 48 configurations. …
Figure 5. The dataset construction and evaluation pipeline.
Figure 6. Overview of synthetic dataset statistics. (a) Distribution …
Figure 7. Overview of objects with various materials and lighting conditions in the dataset.
Figure 10. Qualitative examples of capturing reflective objects.
Figure 9. 3D object generation given a 2D reference.
Figure 11. Camera pose estimation.
Figure 12. Representative qualitative results of surface reconstruction.
Figure 13. Synthetic enhancement for indoor scenes. Top: local illumination with finite-distance lights. Bottom: standard infinite-distance HDRI.
Figure 14. Qualitative results of 3D generated models with various materials and environment maps.
Figure 15. Qualitative results on Efficient LoFTR …
Figure 16. Detailed impact of roughness and IOR on reconstruction …
Figure 17. Comparative analysis of reconstruction quality for metal …
Figure 18. Specular reflection.
Figure 19. Input images used for annotation (low/middle/high camera …).
Figure 20. The prompt for generating a description for an asset.
Figure 21. The structured tags.json output from the annotation pipeline, including the strictly copied category, the VLLM-inferred material and environment properties, and the natural-language description.
Figure 22. Materials with various parameters show different physical phenomena.
Figure 23. More real-world capture instances, including semi-transparent, reflective, and low-texture objects.
Figure 24. Synthetic objects of various shapes.
Figure 25. Objects with the same shape but made of different materials under identical lighting conditions.
Figure 26. Various shapes of generated 3D assets made of different materials.
Figure 27. Generated 3D assets made of steel under various lighting conditions.
Original abstract

Accurate 3D reconstruction of objects with reflective, transparent, or low-texture surfaces remains notoriously challenging. Such materials often violate key assumptions in multi-view reconstruction pipelines, such as photometric consistency and the availability of distinct geometric texture cues. Existing datasets primarily focus on diffuse, textured objects, and therefore provide limited insight into performance under real-world material complexities. We introduce 3DReflecNet, a large-scale hybrid dataset exceeding 22 TB that is specifically designed to benchmark and advance 3D vision methods for these challenging materials. 3DReflecNet combines two types of data: over 120,000 synthetic instances generated via physically-based rendering of more than 12,000 shapes, and over 1,000 real-world objects captured using consumer devices. Together, these data consist of more than 7 million multi-view frames. The dataset spans diverse materials, complex lighting conditions, and a wide range of geometric forms, including shapes generated from both real and LLM-synthesized 2D images using diffusion-based pipelines. To support robust evaluation, we design benchmarks for five core tasks: image matching, structure-from-motion, novel view synthesis, reflection removal, and relighting. Extensive experiments demonstrate that state-of-the-art methods struggle to maintain accuracy across these settings, highlighting the need for more resilient 3D vision models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces 3DReflecNet, a hybrid dataset exceeding 22 TB with over 120,000 synthetic PBR instances from >12,000 shapes and >1,000 real objects captured via consumer devices, yielding >7 million multi-view frames. It targets 3D reconstruction challenges for reflective, transparent, and low-texture surfaces, providing benchmarks for image matching, SfM, novel view synthesis, reflection removal, and relighting, while claiming that SOTA methods struggle to maintain accuracy.

Significance. If the ground-truth protocols and evaluation splits are rigorously defined, this dataset could provide a valuable standardized benchmark for advancing 3D vision methods on materials that violate standard photometric and texture assumptions, complementing existing diffuse-object datasets.

major comments (2)
  1. [Dataset construction / real captures] Real-data section: the protocol for obtaining independent 3D ground truth on the >1,000 consumer-device captures is not described with sufficient detail (e.g., no mention of laser scanning, verified multi-view stereo, or calibration procedures). Without this, quantitative claims that SOTA methods 'struggle' on the real split cannot be separated from potential errors in the pseudo-ground-truth itself.
  2. [Experiments and benchmarks] Experiments section: the abstract asserts that 'extensive experiments demonstrate that state-of-the-art methods struggle,' yet no quantitative metrics, error tables, or benchmark protocols (e.g., specific accuracy drops on reflective vs. diffuse subsets) are referenced. This leaves the central empirical claim unsupported in the provided summary.
minor comments (2)
  1. [Abstract and introduction] Clarify the exact split between synthetic and real data volumes and how the hybrid design ensures that synthetic perfect GT does not substitute for real sensor noise in the reported benchmarks.
  2. [Benchmarks] Provide explicit definitions or references for the five benchmark tasks, including evaluation metrics and baseline implementations used.
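One conventional pair of definitions the second minor comment asks for, common in multi-view stereo benchmarks though not confirmed as this paper's choice, is accuracy and completeness computed from nearest-neighbor distances between reconstructed and ground-truth point clouds. A self-contained sketch with toy point clouds:

```python
# Hedged sketch of accuracy/completeness metrics for surface reconstruction.
# Brute-force nearest neighbors keep the example dependency-free; real
# benchmarks would use a KD-tree for large clouds.
import numpy as np

def nn_dists(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """For each point in src, the distance to its nearest neighbor in dst."""
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)
    return d.min(axis=1)

def accuracy(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean distance from reconstructed points to the ground-truth surface."""
    return float(nn_dists(pred, gt).mean())

def completeness(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean distance from ground-truth points to the reconstruction."""
    return float(nn_dists(gt, pred).mean())

gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
pred = np.array([[0.1, 0.0, 0.0], [1.1, 0.0, 0.0]])  # misses the point at x=2

print(accuracy(pred, gt))      # small: predicted points lie near GT
print(completeness(pred, gt))  # larger: the GT point at x=2 is unmatched
```

The asymmetry is the point: a reconstruction that hallucinates nothing but drops the reflective half of an object can score well on accuracy while completeness exposes the gap, which is exactly the failure mode at issue for these materials.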

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript introducing 3DReflecNet. We address each major comment below and will revise the paper to improve clarity on real-data protocols and experimental reporting while preserving the core contributions.

Point-by-point responses
  1. Referee: Real-data section: the protocol for obtaining independent 3D ground truth on the >1,000 consumer-device captures is not described with sufficient detail (e.g., no mention of laser scanning, verified multi-view stereo, or calibration procedures). Without this, quantitative claims that SOTA methods 'struggle' on the real split cannot be separated from potential errors in the pseudo-ground-truth itself.

    Authors: We agree that the real-data ground-truth protocol requires more explicit description to allow independent assessment of the quantitative results. In the revised manuscript we will add a dedicated subsection under Dataset Construction that details the consumer-device capture pipeline, intrinsic/extrinsic calibration procedures, and the independent 3D ground-truth acquisition method (including laser-scanner cross-validation and multi-view stereo verification steps). This addition will directly address the concern about separating method errors from potential pseudo-ground-truth inaccuracies. revision: yes

  2. Referee: Experiments section: the abstract asserts that 'extensive experiments demonstrate that state-of-the-art methods struggle,' yet no quantitative metrics, error tables, or benchmark protocols (e.g., specific accuracy drops on reflective vs. diffuse subsets) are referenced. This leaves the central empirical claim unsupported in the provided summary.

    Authors: The full manuscript contains a complete Experiments section (Section 4) with quantitative tables, error metrics, and benchmark protocols for all five tasks, explicitly comparing performance on reflective/transparent/low-texture subsets versus diffuse baselines. To make these results more immediately visible, we will revise the abstract to include a concise reference to key findings and add a summary table in the introduction that highlights representative accuracy drops. We will also expand the benchmark-protocol descriptions for clarity. revision: partial

Circularity Check

0 steps flagged

No circularity: dataset paper with no derivations, predictions, or fitted quantities

full rationale

The paper introduces a hybrid dataset (synthetic PBR + real consumer captures) and runs standard benchmarks on existing SOTA methods for tasks like SfM and NVS. No equations, no parameter fitting, no predictions derived from the data itself, and no self-citation chains that bear the central claim. The contribution is the data collection and curation protocol; evaluation uses off-the-shelf methods whose performance is measured against the provided ground truth, so the results are not reducible to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a dataset introduction paper, there are no free parameters fitted to data, no mathematical axioms invoked, and no new entities postulated; the work relies on standard rendering and capture techniques.

pith-pipeline@v0.9.0 · 5577 in / 1220 out tokens · 36589 ms · 2026-05-12T04:06:26.155352+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

116 extracted references · 116 canonical work pages · 2 internal anchors

  1. [1]

    Large-scale data for multiple-view stereopsis.International Journal of Computer Vision, 120:153–168, 2016

    Henrik Aanæs, Rasmus Ramsbøl Jensen, George V ogiatzis, Engin Tola, and Anders Bjorholm Dahl. Large-scale data for multiple-view stereopsis.International Journal of Computer Vision, 120:153–168, 2016. 3, 5

  2. [2]

    Surf: Speeded up robust features

    Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. Surf: Speeded up robust features. InComputer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, May 7-13, 2006. Proceedings, Part I 9, pages 404–

  3. [3]

    Texfusion: Synthesizing 3d textures with text-guided image diffusion models

    Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, and Kangxue Yin. Texfusion: Synthesizing 3d textures with text-guided image diffusion models. InProceedings of the IEEE/CVF international conference on computer vision, pages 4169–4181, 2023. 8

  4. [4]

    Alessandro Cefalu, Norbert Haala, and Dieter Fritsch. Hi- erarchical structure from motion combining global image orientation and structureless bundle adjustment.The Inter- national Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 42:535–542, 2017. 3

  5. [5]

    Efficient and robust large-scale rotation averaging

    Avishek Chatterjee and Venu Madhav Govindu. Efficient and robust large-scale rotation averaging. InProceedings of the IEEE international conference on computer vision, pages 521–528, 2013. 3

  6. [6]

    Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction.IEEE Transactions on Visualization and Computer Graphics, 31 (9):6100–6111, 2025

    Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, and Guofeng Zhang. Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction.IEEE Transactions on Visualization and Computer Graphics, 31 (9):6100–6111, 2025. 7, 18

  7. [7]

    Text2tex: Text-driven tex- ture synthesis via diffusion models

    Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, and Matthias Nießner. Text2tex: Text-driven tex- ture synthesis via diffusion models. InProceedings of the IEEE/CVF international conference on computer vision, pages 18558–18568, 2023. 8, 22

  8. [8]

    Aspanformer: Detector-free image matching with adaptive span transformer

    Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Ming- min Zhen, Tian Fang, David Mckinnon, Yanghai Tsin, and Long Quan. Aspanformer: Detector-free image matching with adaptive span transformer. InEuropean Conference on Computer Vision, pages 20–36. Springer, 2022. 2, 7

  9. [9]

    arXiv preprint arXiv:2410.02619 , year=

    Hongze Chen, Zehong Lin, and Jun Zhang. Gi-gs: Global illumination decomposition on gaussian splatting for inverse rendering.arXiv preprint arXiv:2410.02619, 2024. 18, 25

  10. [10]

    Jin et al. Christy. Openmaterial: A comprehensive dataset of complex materials for 3d reconstruction. https:// christy61.github.io/openmaterial.github. io/, 2023. 2

  11. [11]

    Abo: Dataset and benchmarks for real-world 3d object un- derstanding

    Jasmine Collins, Shubham Goel, Kenan Deng, Achlesh- war Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F Yago Vicente, Thomas Dideriksen, Himanshu Arora, et al. Abo: Dataset and benchmarks for real-world 3d object un- derstanding. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 21126– 21136, 2022. 3, 5

  12. [12]

    Cook and Kenneth E

    Robert L. Cook and Kenneth E. Torrance. A reflectance model for computer graphics.ACM Transactions on Graph- ics, 1(1):7–24, 1982. 20

  13. [13]

    Sfm with mrfs: Discrete-continuous optimization for large-scale structure from motion.IEEE transactions on pattern analysis and machine intelligence, 35(12):2841–2853, 2012

    David J Crandall, Andrew Owens, Noah Snavely, and Daniel P Huttenlocher. Sfm with mrfs: Discrete-continuous optimization for large-scale structure from motion.IEEE transactions on pattern analysis and machine intelligence, 35(12):2841–2853, 2012. 2

  14. [14]

    Hsfm: Hybrid structure-from-motion

    Hainan Cui, Xiang Gao, Shuhan Shen, and Zhanyi Hu. Hsfm: Hybrid structure-from-motion. InProceedings of the IEEE conference on computer vision and pattern recog- nition, pages 1212–1221, 2017. 2

  15. [15]

    Openmaterial: A comprehensive dataset of complex materi- als for 3d reconstruction.arXiv preprint arXiv:2406.08894,

    Zheng Dang, Jialu Huang, Fei Wang, and Mathieu Salzmann. Openmaterial: A comprehensive dataset of complex materi- als for 3d reconstruction.arXiv preprint arXiv:2406.08894,

  16. [16]

    Objaverse: A universe of annotated 3d objects

    Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects. InProceedings of the IEEE/CVF conference on computer vision and pattern recog- nition, pages 13142–13153, 2023. 3, 5

  17. [17]

    Superpoint: Self-supervised interest point detection and description

    Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabi- novich. Superpoint: Self-supervised interest point detection and description. InProceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 224–236, 2018. 2, 7, 16

  18. [18]

    Reloc3r: Large-scale training of relative camera pose regression for generaliz- able, fast, and accurate visual localization.arXiv preprint arXiv:2412.08376, 2024

    Siyan Dong, Shuzhe Wang, Shaohui Liu, Lulu Cai, Qingnan Fan, Juho Kannala, and Yanchao Yang. Reloc3r: Large-scale training of relative camera pose regression for generaliz- able, fast, and accurate visual localization.arXiv preprint arXiv:2412.08376, 2024. 4

  19. [19]

    Zheng Dong, Ke Xu, Yin Yang, Hujun Bao, Weiwei Xu, and Rynson W.H. Lau. Location-aware single image reflection removal. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5017–5026,

  20. [20]

    D2-net: A trainable cnn for joint description and detection of lo- cal features

    Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Polle- feys, Josef Sivic, Akihiko Torii, and Torsten Sattler. D2-net: A trainable cnn for joint description and detection of lo- cal features. InProceedings of the ieee/cvf conference on 9 computer vision and pattern recognition, pages 8092–8101,

  21. [21]

    Roma: Robust dense feature matching

    Johan Edstedt, Qiyu Sun, Georg B ¨okman, M ˚arten Wadenb¨ack, and Michael Felsberg. Roma: Robust dense feature matching. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition, pages 19790–19800, 2024. 2, 4, 7, 16

  22. [22]

    RealityScan

    Epic Games. RealityScan. https : / / www . realityscan.com, 2025. Accessed: 2025-10-06. 6

  23. [23]

    Plenoxels: Radiance fields without neural networks

    Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5501–5510, 2022. 2, 3

  24. [24]

    Towards high-quality specular highlight removal by leveraging large-scale synthetic data

    Gang Fu, Qing Zhang, Lei Zhu, Chunxia Xiao, and Ping Li. Towards high-quality specular highlight removal by leveraging large-scale synthetic data. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 12857–12865, 2023. 16, 24

  25. [25]

    Colmap-free 3d gaussian splat- ting

    Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A Efros, and Xiaolong Wang. Colmap-free 3d gaussian splat- ting. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 20796–20805,

  26. [26]

    Massively parallel multiview stereopsis by surface normal diffusion

    Silvano Galliani, Katrin Lasinger, and Konrad Schindler. Massively parallel multiview stereopsis by surface normal diffusion. InProceedings of the IEEE international confer- ence on computer vision, pages 873–881, 2015. 2, 3

  27. [27]

    Greivenkamp.Field Guide to Geometrical Optics

    John E. Greivenkamp.Field Guide to Geometrical Optics. SPIE Press, 2004. 20

  28. [28]

    Shapefit: Exact location recovery from corrupted pairwise directions.Communications on Pure and Applied Mathe- matics, 71(1):3–50, 2018

    Paul Hand, Choongbum Lee, and Vladislav V oroninski. Shapefit: Exact location recovery from corrupted pairwise directions.Communications on Pure and Applied Mathe- matics, 71(1):3–50, 2018. 3

  29. [29]

    Cambridge university press,

    Richard Hartley and Andrew Zisserman.Multiple view geometry in computer vision. Cambridge university press,

  30. [30]

    Understanding the masking–shadowing function in microfacet-based brdfs.Journal of Computer Graphics Techniques, 3(2):24–78, 2014

    Eric Heitz. Understanding the masking–shadowing function in microfacet-based brdfs.Journal of Computer Graphics Techniques, 3(2):24–78, 2014. 20

  31. [31]

    Single image reflection sep- aration via component synergy

    Qiming Hu and Xiaojie Guo. Single image reflection sep- aration via component synergy. InProceedings of the IEEE/CVF international conference on computer vision, pages 13138–13147, 2023. 18

  32. [32]

    Single image reflection separation via dual-stream interactive transformers

    Qiming Hu, Hainuo Wang, and Xiaojie Guo. Single image reflection separation via dual-stream interactive transformers. Advances in Neural Information Processing Systems, 37: 55228–55248, 2024. 18, 24

  33. [33]

    2d gaussian splatting for geometrically ac- curate radiance fields

    Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically ac- curate radiance fields. InACM SIGGRAPH 2024 conference papers, pages 1–11, 2024. 2, 3, 7, 18

  34. [34]

    Immel, Michael F

    David S. Immel, Michael F. Cohen, and Donald P. Green- berg. A radiosity method for non-diffuse environments. In Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques, page 133–142, New York, NY , USA, 1986. Association for Computing Machin- ery. 20

  35. [35]

    Jensen, Steve R

    Henrik W. Jensen, Steve R. Marschner, Marc Levoy, and Pat Hanrahan. A practical model for subsurface light transport. InProceedings of SIGGRAPH, pages 511–518, 2001. 21

  36. [36]

    Large scale multi-view stereopsis evalu- ation

    Rasmus Jensen, Anders Dahl, George V ogiatzis, Engin Tola, and Henrik Aanæs. Large scale multi-view stereopsis evalu- ation. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 406–413, 2014. 2

  37. [37]

    A global linear method for camera pose registration

    Nianjuan Jiang, Zhaopeng Cui, and Ping Tan. A global linear method for camera pose registration. InProceedings of the IEEE international conference on computer vision, pages 481–488, 2013. 2

  38. [38]

    Tensoir: Tensorial inverse rendering

    Haian Jin, Isabella Liu, Peijia Xu, Xiaoshuai Zhang, Song- fang Han, Sai Bi, Xiaowei Zhou, Zexiang Xu, and Hao Su. Tensoir: Tensorial inverse rendering. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 165–174, 2023. 18

  39. [39]

    Fresnel reflection of diffusely incident light

    Deane B Judd. Fresnel reflection of diffusely incident light

  40. [40]

    The rendering equation

    James T Kajiya. The rendering equation. InProceedings of the 13th annual conference on Computer graphics and interactive techniques, pages 143–150, 1986. 20

  41. [41]

    3d gaussian splatting for real-time radiance field rendering.ACM Trans

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139–1,

  42. [42]

    Tanks and temples: Benchmarking large-scale scene reconstruction.ACM Transactions on Graphics, 36(4), 2017

    Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction.ACM Transactions on Graphics, 36(4), 2017. 3, 5

  43. [43]

    Ein beitrag zur optik der farbanstriche.Zeitschrift f ¨ur Technische Physik, 12:593–601,

    Paul Kubelka and Franz Munk. Ein beitrag zur optik der farbanstriche.Zeitschrift f ¨ur Technische Physik, 12:593–601,

  44. [44]

    Robust reflection removal with reflection-free flash-only cues

    Chenyang Lei and Qifeng Chen. Robust reflection removal with reflection-free flash-only cues. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14811–14820, 2021. 5

  45. [45]

    Zone: Zero-shot instruction-guided local editing

    Shanglin Li, Bohan Zeng, Yutang Feng, Sicheng Gao, Xi- uhui Liu, Jiaming Liu, Lin Li, Xu Tang, Yao Hu, Jianzhuang Liu, et al. Zone: Zero-shot instruction-guided local editing. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6254–6263, 2024. 8, 23

  46. [46]

    Megadepth: Learning single- view depth prediction from internet photos

    Zhengqi Li and Noah Snavely. Megadepth: Learning single- view depth prediction from internet photos. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 2041–2050, 2018. 7

  47. [47]

    Luciddreamer: Towards high- fidelity text-to-3d generation via interval score matching

    Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiao- gang Xu, and Yingcong Chen. Luciddreamer: Towards high- fidelity text-to-3d generation via interval score matching. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6517–6526, 2024. 8

  48. [48]

    Gs-ir: 3d gaussian splatting for inverse rendering

    Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, and Kui Jia. Gs-ir: 3d gaussian splatting for inverse rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21644–21653, 2024. 18, 24

  49. [49]

    Lightglue: Local feature matching at light speed

    Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Polle- feys. Lightglue: Local feature matching at light speed. In 10 Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17627–17638, 2023. 7

  50. [50]

    One-2-3-45++: Fast single image to 3d objects with consistent multi-view generation and 3d dif- fusion

    Minghua Liu, Ruoxi Shi, Linghao Chen, Zhuoyang Zhang, Chao Xu, Xinyue Wei, Hansheng Chen, Chong Zeng, Ji- ayuan Gu, and Hao Su. One-2-3-45++: Fast single image to 3d objects with consistent multi-view generation and 3d dif- fusion. InProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, pages 10072–10083,

  [51] Yuan Liu, Peng Wang, Cheng Lin, Xiaoxiao Long, Jiepeng Wang, Lingjie Liu, Taku Komura, and Wenping Wang. Nero: Neural geometry and brdf reconstruction of reflective objects from multiview images. ACM Transactions on Graphics (ToG), 42(4):1–22, 2023. 3, 5, 16

  [52] David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004. 21

  [53] Alexander I Lvovsky. Fresnel equations. Encyclopedia of Optical Engineering, 27:1–6, 2013. 20

  [54] Zeyu Ma, Zachary Teed, and Jia Deng. Multiview stereo with cascaded epipolar raft. In European Conference on Computer Vision, pages 734–750. Springer, 2022. 2

  [55] Hidenobu Matsuki, Riku Murai, Paul HJ Kelly, and Andrew J Davison. Gaussian splatting slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18039–18048, 2024. 2, 3

  [56] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021. 2, 3, 24

  [57] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, 2022. 7

  [58] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG), 41(4):1–15, 2022. 2, 3, 17, 18

  [59] Jacob Munkberg, Jon Hasselgren, Tianchang Shen, Jun Gao, Wenzheng Chen, Alex Evans, Thomas Müller, and Sanja Fidler. Extracting triangular 3d models, materials, and lighting from images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8280–8290, 2022. 18, 24

  [60] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics, 41(4):1–15, 2022.

  [61] Hyelin Nam, Gihyun Kwon, Geon Yeong Park, and Jong Chul Ye. Contrastive denoising score for text-guided latent diffusion image editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9192–9201, 2024. 8, 23

  [62] Pauline C Ng and Steven Henikoff. Sift: Predicting amino acid changes that affect protein function. Nucleic Acids Research, 31(13):3812–3814, 2003. 2

  [63] Michael Oechsle, Songyou Peng, and Andreas Geiger. Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5589–5599, 2021. 2, 3, 16

  [64] Geoffrey Oxholm and Ko Nishino. Multiview shape and reflectance from natural illumination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2155–2162, 2014. 3, 5

  [65] Onur Ozyesil and Amit Singer. Robust camera location estimation by convex programming. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2674–2683, 2015. 3

  [66] Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022. 8, 22

  [67] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714, 2024. 6, 16

  [68] Jeremy Reizenstein et al. Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. In ICCV, 2021. 2

  [69] Elad Richardson, Gal Metzer, Yuval Alaluf, Raja Giryes, and Daniel Cohen-Or. Texture: Text-guided texturing of 3d shapes. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–11, 2023. 8, 22

  [70] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. Orb: An efficient alternative to sift or surf. In 2011 International Conference on Computer Vision, pages 2564–2571. IEEE, 2011. 2

  [71] Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4938–4947, 2020. 7

  [72] Johannes L Schönberger, Enliang Zheng, Jan-Michael Frahm, and Marc Pollefeys. Pixelwise view selection for unstructured multi-view stereo. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14, pages 501–518. Springer, 2016. 3

  [73] Jian Shi, Yue Dong, Hao Su, and Stella X Yu. Learning non-lambertian object intrinsics across shapenet categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1685–1694, 2017. 3, 5

  [74] Xincheng Shuai, Henghui Ding, Xingjun Ma, Rongcheng Tu, Yu-Gang Jiang, and Dacheng Tao. A survey of multimodal-guided image editing with text-to-image diffusion models. arXiv preprint arXiv:2406.14555, 2024. 8

  [75] Noah Snavely, Steven M Seitz, and Richard Szeliski. Photo tourism: Exploring photo collections in 3d. In ACM SIGGRAPH 2006 Papers, pages 835–846. 2006. 3, 16

  [76] Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5459–5469, 2022. 2, 3

  [77] Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, and Xiaowei Zhou. Loftr: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8922–8931, 2021. 2, 16

  [78] Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, and Xiaowei Zhou. Loftr: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8922–8931, 2021. 4, 7

  [79] Matthew Tancik, Ethan Weber, Evonne Ng, Ruilong Li, Brent Yi, Terrance Wang, Alexander Kristoffersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, et al. Nerfstudio: A modular framework for neural radiance field development. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–12, 2023. 7, 18

  [80] Qwen Team. Qwen3 technical report, 2025. 8, 22
