pith. machine review for the scientific record.

arxiv: 2604.14025 · v1 · submitted 2026-04-15 · 💻 cs.CV · cs.AI · cs.GR

Recognition: unknown

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 13:10 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.GR
keywords feed-forward 3D reconstruction · 3D scene modeling · survey · taxonomy · computer vision · generalizable 3D · model design

The pith

Feed-forward 3D reconstruction methods share common design patterns best captured by a taxonomy of five problems rather than output formats.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys methods that map 2D images to 3D scenes in a single forward pass. It notes that approaches using very different 3D output types still rely on similar image feature backbones, multi-view fusion steps, and geometry-aware choices. This observation motivates a new taxonomy built around five recurring problems that drive recent work: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling. The survey backs the taxonomy with organized reviews of benchmarks, datasets, and real-world applications while listing open issues such as scalability and standardized evaluation.
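
As an editorial illustration rather than anything taken from the paper, here is a minimal sketch of the shared pattern the survey identifies: a per-view feature backbone, a multi-view fusion step, and a geometry-aware output head, all run in one forward pass. The module sizes, the attention-based fusion, and the point-map output are assumptions made for the sketch.

```python
# Minimal sketch (illustrative, not from the paper): backbone -> multi-view
# fusion -> 3D-valued head, executed in a single forward pass.
import torch
import torch.nn as nn

class FeedForward3D(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        # Per-view image feature backbone (stand-in for a ViT/CNN encoder).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3), nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1), nn.GELU(),
        )
        # Multi-view information fusion: attention over tokens from all views.
        self.fusion = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Geometry head: per-pixel 3D point map, one of many possible outputs.
        self.head = nn.Conv2d(dim, 3, kernel_size=1)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (B, V, 3, H, W) -> point maps (B, V, 3, H/8, W/8).
        b, v, _, _, _ = views.shape
        feats = self.backbone(views.flatten(0, 1))        # (B*V, D, h, w)
        d, fh, fw = feats.shape[1:]
        tokens = feats.flatten(2).transpose(1, 2)         # (B*V, h*w, D)
        tokens = tokens.reshape(b, v * fh * fw, d)        # one sequence per scene
        fused, _ = self.fusion(tokens, tokens, tokens)    # cross-view attention
        fused = fused.reshape(b * v, fh, fw, d).permute(0, 3, 1, 2)
        return self.head(fused).reshape(b, v, 3, fh, fw)

points = FeedForward3D()(torch.randn(2, 3, 3, 64, 64))   # 2 scenes, 3 views each
print(points.shape)                                      # torch.Size([2, 3, 3, 8, 8])
```

The structural point: swapping the head to emit Gaussian parameters, occupancy logits, or triplane features would leave the backbone and fusion stages untouched, which is exactly the commonality the taxonomy leans on.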

Core claim

Despite diverse geometric output representations ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. The authors therefore abstract away from output differences and organize the literature by five key problems that shape model design: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. The taxonomy is further supported by comprehensive coverage of benchmarks, datasets, categorized applications, and discussion of future challenges.

What carries the argument

The proposed taxonomy centers on five key problems: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. This structure abstracts away from differences in output representation to group methods by shared architectural strategies and design choices.
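
As a hedged sketch of how this problem-driven grouping could be operationalized (the method names and tag assignments below are invented placeholders, not the survey's classifications):

```python
# Illustrative only: the five problem axes as tags on method records, grouped
# independently of output representation. All assignments here are invented.
from enum import Enum

class Problem(Enum):
    FEATURE_ENHANCEMENT = "feature enhancement"
    GEOMETRY_AWARENESS = "geometry awareness"
    MODEL_EFFICIENCY = "model efficiency"
    AUGMENTATION = "augmentation strategies"
    TEMPORAL = "temporal-aware models"

# Each record: (output representation, problems the method targets).
methods = {
    "method_a": ("implicit field", {Problem.FEATURE_ENHANCEMENT, Problem.GEOMETRY_AWARENESS}),
    "method_b": ("3D Gaussians",   {Problem.GEOMETRY_AWARENESS, Problem.MODEL_EFFICIENCY}),
    "method_c": ("point map",      {Problem.TEMPORAL, Problem.AUGMENTATION}),
}

# Group by problem rather than by representation: the survey's organizing move.
for p in Problem:
    hits = [name for name, (_, tags) in methods.items() if p in tags]
    print(f"{p.value}: {hits}")
```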

If this is right

  • Advances in feature enhancement improve robustness to varied input images across many output formats.
  • Geometry awareness mechanisms enforce multi-view consistency in reconstructed scenes.
  • Efficiency-focused designs enable practical deployment of feed-forward models.
  • Augmentation strategies support better generalization to new scenes and categories.
  • Temporal-aware extensions allow the same feed-forward approach to handle dynamic or video inputs (a minimal sketch follows this list).
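
Below is a minimal sketch (an editorial assumption, not a method from the paper) of one way such a temporal extension can work: run the feed-forward network frame by frame and fuse predictions through a lightweight recurrent state. It composes with the FeedForward3D sketch above.

```python
# Minimal temporal wrapper (illustrative): reuse a per-frame feed-forward
# model and blend successive predictions with a learned per-pixel gate.
import torch
import torch.nn as nn

class TemporalWrapper(nn.Module):
    def __init__(self, frame_model: nn.Module, dim: int = 3):
        super().__init__()
        self.frame_model = frame_model
        # Gate deciding how much of the new frame's prediction to keep.
        self.gate = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, T, V, 3, H, W) -> fused point maps (B, T, V, 3, h, w).
        state, outputs = None, []
        for i in range(video.shape[1]):
            pred = self.frame_model(video[:, i])      # (B, V, 3, h, w)
            flat = pred.flatten(0, 1)                 # (B*V, 3, h, w)
            if state is None:
                state = flat
            else:
                alpha = torch.sigmoid(self.gate(torch.cat([state, flat], dim=1)))
                state = alpha * flat + (1 - alpha) * state
            outputs.append(state.reshape_as(pred))
        return torch.stack(outputs, dim=1)

model = TemporalWrapper(FeedForward3D())      # frame model from the sketch above
out = model(torch.randn(1, 4, 2, 3, 64, 64))  # 1 scene, 4 frames, 2 views
print(out.shape)                              # torch.Size([1, 4, 2, 3, 8, 8])
```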

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The taxonomy could guide hybrid models that combine solutions from multiple problem areas without regard to final output type.
  • It implies that progress on any one problem may transfer across representation choices, accelerating overall field progress.
  • Researchers might test the taxonomy by checking whether new papers naturally cluster under one or more of the five problems.
  • Similar problem-driven groupings could be explored in adjacent tasks such as 4D reconstruction or neural rendering.

Load-bearing premise

Abstracting away from differences in geometric output representations yields a more useful organization of the literature than taxonomies based on those representations.

What would settle it

A detailed comparison showing that methods with different output representations require fundamentally incompatible architectural choices, ones the five problems fail to capture, would undermine the taxonomy.

Original abstract

Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, serving as a cornerstone for understanding and interacting with the physical world. While traditional methods achieve high fidelity, they are limited by slow per-scene optimization or category-specific training, which hinders their practical deployment and scalability. Hence, generalizable feed-forward 3D reconstruction has witnessed rapid development in recent years. By learning a model that maps images directly to 3D representations in a single forward pass, these methods enable efficient reconstruction and robust cross-scene generalization. Our survey is motivated by a critical observation: despite the diverse geometric output representations, ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns, such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. Consequently, we abstract away from these representation differences and instead focus on model design, proposing a novel taxonomy centered on model design strategies that are agnostic to the output format. Our proposed taxonomy organizes the research directions into five key problems that drive recent research development: feature enhancement, geometry awareness, model efficiency, augmentation strategies and temporal-aware models. To support this taxonomy with empirical grounding and standardized evaluation, we further comprehensively review related benchmarks and datasets, and extensively discuss and categorize real-world applications based on feed-forward 3D models. Finally, we outline future directions to address open challenges such as scalability, evaluation standards, and world modeling.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript surveys feed-forward 3D scene reconstruction methods that map 2D images to 3D representations in a single forward pass. It observes that approaches using diverse output formats (implicit fields to explicit primitives) nevertheless share high-level architectural patterns in image feature extraction, multi-view fusion, and geometry-aware design. From this observation the authors derive a representation-agnostic taxonomy organized around five driving problems: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. The survey additionally reviews benchmarks and datasets, categorizes real-world applications, and outlines open challenges in scalability and evaluation.

Significance. A well-substantiated taxonomy that successfully decouples design strategies from output representation could usefully reorganize the rapidly expanding feed-forward 3D literature and surface shared research directions. The promised empirical review of benchmarks and applications would further increase the manuscript's value as a reference for the computer-vision community.

major comments (1)
  1. [Abstract and §1] The central claim that the five-problem taxonomy is 'agnostic to the output format' and more useful than prior representation-centered surveys is load-bearing, yet the manuscript provides no side-by-side re-categorization of the same papers under both schemes, nor quantitative evidence (e.g., performance or efficiency trends grouped by the new axes) that the reorganization better predicts generalization behavior. Without such validation the taxonomy risks being a relabeling rather than an advance.
minor comments (2)
  1. [Taxonomy section] Ensure that every cited method is explicitly mapped to at least one of the five taxonomy categories so readers can verify coverage.
  2. [Taxonomy section] Clarify the exact criteria used to assign papers to 'augmentation strategies' versus 'model efficiency' when a method addresses both.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our taxonomy. We address the major comment point by point below.

Point-by-point responses
  1. Referee: [Abstract and §1] The central claim that the five-problem taxonomy is 'agnostic to the output format' and more useful than prior representation-centered surveys is load-bearing, yet the manuscript provides no side-by-side re-categorization of the same papers under both schemes, nor quantitative evidence (e.g., performance or efficiency trends grouped by the new axes) that the reorganization better predicts generalization behavior. Without such validation the taxonomy risks being a relabeling rather than an advance.

    Authors: We agree that a side-by-side comparison would help readers see the reorganization in action. In the revised version we will add a table in Section 1 that maps 12–15 representative papers (covering implicit, explicit, and hybrid outputs) to both the traditional representation-centered categories and our five-problem taxonomy. This table will illustrate how methods that differ in output format nevertheless share design choices for feature enhancement, geometry awareness, etc. We also plan to expand the discussion in §1 with concrete examples of cross-representation patterns already noted in the manuscript. Regarding quantitative evidence (performance or efficiency trends grouped by the new axes), we note that a rigorous meta-analysis would require standardized re-implementations and controlled re-evaluations of many methods, an undertaking that exceeds the scope of a survey. Our taxonomy is motivated by the observed architectural commonalities across the literature rather than by new empirical meta-results; the separate benchmark and dataset review in the paper is intended to facilitate such future quantitative studies. We believe the value of the taxonomy lies in its ability to surface shared research directions that representation-centric surveys obscure, even without new performance numbers.

    revision: partial
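
To make the promised mapping concrete, here is a hypothetical sketch of such a dual-scheme table; every method name and assignment below is a placeholder, not the authors' actual table.

```python
# Hypothetical dual mapping: each row pairs the traditional representation-
# centered label with the problem axes a method would fall under. All entries
# are invented placeholders standing in for the promised 12-15 paper table.
rows = [
    ("example_implicit_method", "implicit field", ["feature enhancement"]),
    ("example_splat_method",    "3D Gaussians",   ["geometry awareness", "model efficiency"]),
    ("example_pointmap_method", "point map",      ["geometry awareness", "temporal-aware models"]),
]
print(f"{'method':<26}{'representation':<18}problem axes")
for name, rep, axes in rows:
    print(f"{name:<26}{rep:<18}{', '.join(axes)}")
```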

Circularity Check

0 steps flagged

No circularity: survey taxonomy is observational synthesis, not derived from self-referential inputs

Full rationale

The manuscript is a literature survey whose central contribution is an observational claim that feed-forward 3D methods share high-level architectural patterns (feature backbones, multi-view fusion, geometry-aware designs) across output representations, followed by a proposed taxonomy of five design problems. This claim is grounded in review of external literature rather than any equation, fitted parameter, or self-citation chain that reduces to the paper's own inputs. No predictions, first-principles derivations, uniqueness theorems, or ansatzes are introduced; the taxonomy is presented as an organizational lens, not a result forced by construction. Self-citations, if present for specific methods, are not load-bearing for the taxonomy itself. The paper therefore contains no circular steps of the enumerated kinds.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no new free parameters, axioms, or invented entities; it is a review that relies on standard computer-vision assumptions already present in the cited literature.

pith-pipeline@v0.9.0 · 5614 in / 1155 out tokens · 61939 ms · 2026-05-10T13:10:41.396362+00:00 · methodology

