pith. machine review for the scientific record.

arxiv: 2604.14025 · v1 · submitted 2026-04-15 · 💻 cs.CV · cs.AI · cs.GR

Recognition: unknown

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 13:10 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.GR
keywords feed-forward 3D reconstruction · 3D scene modeling · survey · taxonomy · computer vision · generalizable 3D · model design

The pith

Feed-forward 3D reconstruction methods share common design patterns best captured by a taxonomy of five problems rather than output formats.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys methods that map 2D images to 3D scenes in a single forward pass. It notes that approaches using very different 3D output types still rely on similar image feature backbones, multi-view fusion steps, and geometry-aware choices. This observation motivates a new taxonomy built around five recurring problems that drive recent work: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling. The survey backs the taxonomy with organized reviews of benchmarks, datasets, and real-world applications while listing open issues such as scalability and standardized evaluation.
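
As an editorial illustration rather than anything taken from the paper, here is a minimal sketch of the shared pattern the survey identifies: a per-view feature backbone, a multi-view fusion step, and a geometry-aware output head, all run in one forward pass. The module sizes, the attention-based fusion, and the point-map output are assumptions made for the sketch.

```python
# Minimal sketch (illustrative, not from the paper): backbone -> multi-view
# fusion -> 3D-valued head, executed in a single forward pass.
import torch
import torch.nn as nn

class FeedForward3D(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        # Per-view image feature backbone (stand-in for a ViT/CNN encoder).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3), nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1), nn.GELU(),
        )
        # Multi-view information fusion: attention over tokens from all views.
        self.fusion = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Geometry head: per-pixel 3D point map, one of many possible outputs.
        self.head = nn.Conv2d(dim, 3, kernel_size=1)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (B, V, 3, H, W) -> point maps (B, V, 3, H/8, W/8).
        b, v, _, _, _ = views.shape
        feats = self.backbone(views.flatten(0, 1))        # (B*V, D, h, w)
        d, fh, fw = feats.shape[1:]
        tokens = feats.flatten(2).transpose(1, 2)         # (B*V, h*w, D)
        tokens = tokens.reshape(b, v * fh * fw, d)        # one sequence per scene
        fused, _ = self.fusion(tokens, tokens, tokens)    # cross-view attention
        fused = fused.reshape(b * v, fh, fw, d).permute(0, 3, 1, 2)
        return self.head(fused).reshape(b, v, 3, fh, fw)

points = FeedForward3D()(torch.randn(2, 3, 3, 64, 64))   # 2 scenes, 3 views each
print(points.shape)                                      # torch.Size([2, 3, 3, 8, 8])
```

The structural point: swapping the head to emit Gaussian parameters, occupancy logits, or triplane features would leave the backbone and fusion stages untouched, which is exactly the commonality the taxonomy leans on.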

Core claim

Despite diverse geometric output representations ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. The authors therefore abstract away from output differences and organize the literature by five key problems that shape model design: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. The taxonomy is further supported by comprehensive coverage of benchmarks, datasets, categorized applications, and discussion of future challenges.

What carries the argument

The proposed taxonomy centers on five key problems: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. This structure abstracts away from differences in output representation to group methods by shared architectural strategies and design choices.
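
As a hedged sketch of how this problem-driven grouping could be operationalized (the method names and tag assignments below are invented placeholders, not the survey's classifications):

```python
# Illustrative only: the five problem axes as tags on method records, grouped
# independently of output representation. All assignments here are invented.
from enum import Enum

class Problem(Enum):
    FEATURE_ENHANCEMENT = "feature enhancement"
    GEOMETRY_AWARENESS = "geometry awareness"
    MODEL_EFFICIENCY = "model efficiency"
    AUGMENTATION = "augmentation strategies"
    TEMPORAL = "temporal-aware models"

# Each record: (output representation, problems the method targets).
methods = {
    "method_a": ("implicit field", {Problem.FEATURE_ENHANCEMENT, Problem.GEOMETRY_AWARENESS}),
    "method_b": ("3D Gaussians",   {Problem.GEOMETRY_AWARENESS, Problem.MODEL_EFFICIENCY}),
    "method_c": ("point map",      {Problem.TEMPORAL, Problem.AUGMENTATION}),
}

# Group by problem rather than by representation: the survey's organizing move.
for p in Problem:
    hits = [name for name, (_, tags) in methods.items() if p in tags]
    print(f"{p.value}: {hits}")
```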

If this is right

  • Advances in feature enhancement improve robustness to varied input images across many output formats.
  • Geometry awareness mechanisms enforce multi-view consistency in reconstructed scenes.
  • Efficiency-focused designs enable practical deployment of feed-forward models.
  • Augmentation strategies support better generalization to new scenes and categories.
  • Temporal-aware extensions allow the same feed-forward approach to handle dynamic or video inputs (a minimal sketch follows this list).
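
Below is a minimal sketch (an editorial assumption, not a method from the paper) of one way such a temporal extension can work: run the feed-forward network frame by frame and fuse predictions through a lightweight recurrent state. It composes with the FeedForward3D sketch above.

```python
# Minimal temporal wrapper (illustrative): reuse a per-frame feed-forward
# model and blend successive predictions with a learned per-pixel gate.
import torch
import torch.nn as nn

class TemporalWrapper(nn.Module):
    def __init__(self, frame_model: nn.Module, dim: int = 3):
        super().__init__()
        self.frame_model = frame_model
        # Gate deciding how much of the new frame's prediction to keep.
        self.gate = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, T, V, 3, H, W) -> fused point maps (B, T, V, 3, h, w).
        state, outputs = None, []
        for i in range(video.shape[1]):
            pred = self.frame_model(video[:, i])      # (B, V, 3, h, w)
            flat = pred.flatten(0, 1)                 # (B*V, 3, h, w)
            if state is None:
                state = flat
            else:
                alpha = torch.sigmoid(self.gate(torch.cat([state, flat], dim=1)))
                state = alpha * flat + (1 - alpha) * state
            outputs.append(state.reshape_as(pred))
        return torch.stack(outputs, dim=1)

model = TemporalWrapper(FeedForward3D())      # frame model from the sketch above
out = model(torch.randn(1, 4, 2, 3, 64, 64))  # 1 scene, 4 frames, 2 views
print(out.shape)                              # torch.Size([1, 4, 2, 3, 8, 8])
```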

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The taxonomy could guide hybrid models that combine solutions from multiple problem areas without regard to final output type.
  • It implies that progress on any one problem may transfer across representation choices, accelerating overall field progress.
  • Researchers might test the taxonomy by checking whether new papers naturally cluster under one or more of the five problems.
  • Similar problem-driven groupings could be explored in adjacent tasks such as 4D reconstruction or neural rendering.

Load-bearing premise

Abstracting away from differences in geometric output representations yields a more useful organization of the literature than taxonomies based on those representations.

What would settle it

A detailed comparison showing that methods with different output representations require fundamentally incompatible architectural choices, ones the five problems fail to capture, would undermine the taxonomy.

Original abstract

Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, serving as a cornerstone for understanding and interacting with the physical world. While traditional methods achieve high fidelity, they are limited by slow per-scene optimization or category-specific training, which hinders their practical deployment and scalability. Hence, generalizable feed-forward 3D reconstruction has witnessed rapid development in recent years. By learning a model that maps images directly to 3D representations in a single forward pass, these methods enable efficient reconstruction and robust cross-scene generalization. Our survey is motivated by a critical observation: despite the diverse geometric output representations, ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns, such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. Consequently, we abstract away from these representation differences and instead focus on model design, proposing a novel taxonomy centered on model design strategies that are agnostic to the output format. Our proposed taxonomy organizes the research directions into five key problems that drive recent research development: feature enhancement, geometry awareness, model efficiency, augmentation strategies and temporal-aware models. To support this taxonomy with empirical grounding and standardized evaluation, we further comprehensively review related benchmarks and datasets, and extensively discuss and categorize real-world applications based on feed-forward 3D models. Finally, we outline future directions to address open challenges such as scalability, evaluation standards, and world modeling.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript surveys feed-forward 3D scene reconstruction methods that map 2D images to 3D representations in a single forward pass. It observes that approaches using diverse output formats (implicit fields to explicit primitives) nevertheless share high-level architectural patterns in image feature extraction, multi-view fusion, and geometry-aware design. From this observation the authors derive a representation-agnostic taxonomy organized around five driving problems: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. The survey additionally reviews benchmarks and datasets, categorizes real-world applications, and outlines open challenges in scalability and evaluation.

Significance. A well-substantiated taxonomy that successfully decouples design strategies from output representation could usefully reorganize the rapidly expanding feed-forward 3D literature and surface shared research directions. The promised empirical review of benchmarks and applications would further increase the manuscript's value as a reference for the computer-vision community.

major comments (1)
  1. [Abstract and §1] The central claim that the five-problem taxonomy is 'agnostic to the output format' and more useful than prior representation-centered surveys is load-bearing, yet the manuscript provides no side-by-side re-categorization of the same papers under both schemes, nor quantitative evidence (e.g., performance or efficiency trends grouped by the new axes) that the reorganization better predicts generalization behavior. Without such validation the taxonomy risks being a relabeling rather than an advance.
minor comments (2)
  1. [Taxonomy section] Ensure that every cited method is explicitly mapped to at least one of the five taxonomy categories so readers can verify coverage.
  2. [Taxonomy section] Clarify the exact criteria used to assign papers to 'augmentation strategies' versus 'model efficiency' when a method addresses both.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our taxonomy. We address the major comment point by point below.

Point-by-point responses
  1. Referee: [Abstract and §1] The central claim that the five-problem taxonomy is 'agnostic to the output format' and more useful than prior representation-centered surveys is load-bearing, yet the manuscript provides no side-by-side re-categorization of the same papers under both schemes, nor quantitative evidence (e.g., performance or efficiency trends grouped by the new axes) that the reorganization better predicts generalization behavior. Without such validation the taxonomy risks being a relabeling rather than an advance.

    Authors: We agree that a side-by-side comparison would help readers see the reorganization in action. In the revised version we will add a table in Section 1 that maps 12–15 representative papers (covering implicit, explicit, and hybrid outputs) to both the traditional representation-centered categories and our five-problem taxonomy. This table will illustrate how methods that differ in output format nevertheless share design choices for feature enhancement, geometry awareness, etc. We also plan to expand the discussion in §1 with concrete examples of cross-representation patterns already noted in the manuscript. Regarding quantitative evidence (performance or efficiency trends grouped by the new axes), we note that a rigorous meta-analysis would require standardized re-implementations and controlled re-evaluations of many methods, an undertaking that exceeds the scope of a survey. Our taxonomy is motivated by the observed architectural commonalities across the literature rather than by new empirical meta-results; the separate benchmark and dataset review in the paper is intended to facilitate such future quantitative studies. We believe the value of the taxonomy lies in its ability to surface shared research directions that representation-centric surveys obscure, even without new performance numbers.

    revision: partial
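
To make the promised mapping concrete, here is a hypothetical sketch of such a dual-scheme table; every method name and assignment below is a placeholder, not the authors' actual table.

```python
# Hypothetical dual mapping: each row pairs the traditional representation-
# centered label with the problem axes a method would fall under. All entries
# are invented placeholders standing in for the promised 12-15 paper table.
rows = [
    ("example_implicit_method", "implicit field", ["feature enhancement"]),
    ("example_splat_method",    "3D Gaussians",   ["geometry awareness", "model efficiency"]),
    ("example_pointmap_method", "point map",      ["geometry awareness", "temporal-aware models"]),
]
print(f"{'method':<26}{'representation':<18}problem axes")
for name, rep, axes in rows:
    print(f"{name:<26}{rep:<18}{', '.join(axes)}")
```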

Circularity Check

0 steps flagged

No circularity: survey taxonomy is observational synthesis, not derived from self-referential inputs

Full rationale

The manuscript is a literature survey whose central contribution is an observational claim that feed-forward 3D methods share high-level architectural patterns (feature backbones, multi-view fusion, geometry-aware designs) across output representations, followed by a proposed taxonomy of five design problems. This claim is grounded in review of external literature rather than any equation, fitted parameter, or self-citation chain that reduces to the paper's own inputs. No predictions, first-principles derivations, uniqueness theorems, or ansatzes are introduced; the taxonomy is presented as an organizational lens, not a result forced by construction. Self-citations, if present for specific methods, are not load-bearing for the taxonomy itself. The paper therefore contains no circular steps of the enumerated kinds.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no new free parameters, axioms, or invented entities; it is a review that relies on standard computer-vision assumptions already present in the cited literature.

pith-pipeline@v0.9.0 · 5614 in / 1155 out tokens · 61939 ms · 2026-05-10T13:10:41.396362+00:00 · methodology

