pith. machine review for the scientific record.

arxiv: 2604.01479 · v2 · submitted 2026-04-01 · 💻 cs.CV

Recognition: 2 theorem links

UniRecGen: Unifying Multi-View 3D Reconstruction and Generation

Cheng Lin, Chenyu Hu, Hanzhuo Huang, Jiahao Chen, Mengfei Li, Wenping Wang, Xin Li, Yuan Liu, Yuheng Liu, Zekai Gu, Zhengming Yu, Zhisheng Huang, Zibo Zhao

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 22:07 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D reconstruction · multi-view modeling · diffusion models · unified framework · sparse observations · cooperative learning · canonical space

The pith

UniRecGen unifies reconstruction and generation by sharing a canonical space so sparse views yield complete, consistent 3D models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to overcome the split where feed-forward reconstruction from few views stays faithful to the input but misses global structure, while diffusion-based generation adds rich detail yet fails to keep views consistent. It does so by forcing both modules into one shared canonical space and training them with disentangled cooperation that keeps each module stable during training while letting them exchange signals at inference. The reconstruction module supplies geometric anchors in that space; the diffusion module then conditions on latent-augmented features to refine and complete the shape. If the approach holds, sparse multi-view inputs could produce finished 3D objects without the usual loss of either fidelity or plausibility.
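To make that division of labor concrete, here is a minimal sketch of the two-stage inference flow as described above. The module names, signatures, and conditioning interface (`latent_dim`, `num_steps`, `denoise_step`) are illustrative assumptions, not the paper's code:

```python
# Hypothetical sketch of a UniRecGen-style inference flow; names are assumed.
import torch

def reconstruct_then_generate(views, recon_net, diffusion_net, decoder):
    """Sparse unposed views -> canonical geometric anchor -> completed 3D shape."""
    # Stage 1: the feed-forward reconstruction module predicts geometry and
    # maps it into the shared canonical space (the "geometric anchor").
    anchor_points, latent_feats = recon_net(views)

    # Stage 2: the diffusion generator denoises a shape latent conditioned on
    # the anchor plus latent-augmented view features, so completed regions
    # stay consistent with the observed geometry.
    z = torch.randn(1, diffusion_net.latent_dim)
    for t in reversed(range(diffusion_net.num_steps)):
        z = diffusion_net.denoise_step(z, t, cond=(anchor_points, latent_feats))

    return decoder(z)  # e.g., decode the refined latent into a mesh
```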

Core claim

UniRecGen integrates the reconstruction module and the diffusion generator into a single cooperative system by aligning them in a shared canonical space and applying disentangled cooperative learning to keep training stable. Within that space, the reconstruction module supplies canonical geometric anchors while the diffusion component uses latent-augmented conditioning to refine and complete the geometry, producing more complete and consistent 3D models from sparse observations than either paradigm achieves alone.

What carries the argument

The shared canonical space plus disentangled cooperative learning that aligns coordinate systems, representations, and objectives so the reconstruction module can anchor geometry while the diffusion generator refines and completes it.
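The paper's Figure 2 attributes the canonicalization to "branch repurposing and similarity alignment." A standard closed-form tool for the similarity-alignment half is the Umeyama estimator; the sketch below shows that mechanic under the assumption that corresponding point pairs are available, without claiming it matches the paper's exact procedure:

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Closed-form similarity transform (s, R, t) minimizing ||s*R@src + t - dst||^2.

    src, dst: (N, 3) arrays of corresponding points. Standard Umeyama (1991)
    estimator; a plausible stand-in for the paper's alignment step, not its code.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)                  # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:      # guard against reflections
        S[2, 2] = -1
    R = U @ S @ Vt                                    # optimal rotation
    var_s = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_s              # optimal isotropic scale
    t = mu_d - s * R @ mu_s
    return s, R, t

# Usage: map a per-view geometry prediction into the canonical frame.
# s, R, t = umeyama_similarity(pred_points, canonical_points)
# pred_canonical = (s * (R @ pred_points.T)).T + t
```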

If this is right

  • Sparse multi-view inputs produce finished 3D models with both input fidelity and global structural completeness.
  • Reconstruction supplies stable geometric anchors that guide the diffusion process without destabilizing training.
  • The same framework outperforms prior separate reconstruction and generation methods on fidelity and robustness metrics.
  • Inference becomes a single seamless collaboration rather than two disconnected stages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Real-time capture pipelines in robotics or mobile AR could adopt the approach to turn a few camera frames into usable 3D assets.
  • The same shared-space cooperation pattern might transfer to other domains where one module supplies local accuracy and another supplies global priors.

Load-bearing premise

That forcing reconstruction and diffusion models into one canonical space with disentangled learning resolves their conflicts in coordinates, representations, and goals without creating fresh inconsistencies or training instability.

What would settle it

A controlled comparison on standard sparse-view benchmarks. The claim would fall if UniRecGen outputs proved no more complete, no more multi-view consistent, or no higher in fidelity than the best separate reconstruction or diffusion baselines.
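Such a comparison would typically be scored with symmetric Chamfer distance between predicted and ground-truth surfaces on the benchmarks the paper's figures name (Toys4K, GSO). A minimal sketch of that metric; the surface-sampling step and any acceptance thresholds are assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pred_pts, gt_pts):
    """Symmetric Chamfer distance between two (N, 3) point sets.

    A standard completeness/fidelity metric for the controlled comparison above;
    how UniRecGen actually samples mesh surfaces is an assumption here.
    """
    d_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts)   # nearest GT point per prediction
    d_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts)   # nearest prediction per GT point
    return (d_pred_to_gt ** 2).mean() + (d_gt_to_pred ** 2).mean()

# A consistently lower score for UniRecGen than for the best separate baseline
# would support the claim; parity or worse would count against it.
```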

Figures

Figures reproduced from arXiv: 2604.01479 by Cheng Lin, Chenyu Hu, Hanzhuo Huang, Jiahao Chen, Mengfei Li, Wenping Wang, Xin Li, Yuan Liu, Yuheng Liu, Zekai Gu, Zhengming Yu, Zhisheng Huang, Zibo Zhao.

Figure 1. UniRecGen enables high-fidelity 3D object reconstruction from sparse, unposed images. It first establishes a deterministic point-cloud “geometric anchor” within a canonical coordinate system. Guided by this anchor, a generative model synthesizes a detailed mesh that preserves instance-specific structural features while producing plausible geometric completions for unobserved regions, ensuring high alignmen… view at source ↗

Figure 2. Overview of our method. Given N unposed input views, we first canonicalize feed-forward multi-view geometry predictions via branch repurposing and similarity alignment to obtain a canonical point cloud (top). We then train a controllable 3D generator conditioned on this point cloud together with multi-view image features, geometry latents, and camera embeddings to synthesize a high-fidelity mesh (bottom). view at source ↗

Figure 3. Qualitative Comparison of Canonical Alignment. Our strategy achieves better geometric quality while aligning the canonical object space. view at source ↗

Figure 4. Qualitative Comparison of Multi-view Condition. Our strategy (right) preserves dense image context more effectively than point-guided sampling (left), leading to improved input alignment. view at source ↗

Figure 5. Qualitative Comparison on Toys4K and GSO. Compared to state-of-the-art reconstruction and generative baselines, our method produces 3D meshes with higher structural fidelity and superior multi-view consistency from sparse inputs. view at source ↗

Figure 6. Generalization to Real-world Environments. Our framework demonstrates robustness and superior performance compared to SOTA methods. view at source ↗

Figure 7. Additional qualitative results. We show six multi-view shape modeling examples. For each of four input views, from left to right we show: the input image, our mesh rendering overlaid on the input image, our mesh rendering, and the ground-truth mesh rendering. view at source ↗
original abstract

Sparse-view 3D modeling represents a fundamental tension between reconstruction fidelity and generative plausibility. While feed-forward reconstruction excels in efficiency and input alignment, it often lacks the global priors needed for structural completeness. Conversely, diffusion-based generation provides rich geometric details but struggles with multi-view consistency. We present UniRecGen, a unified framework that integrates these two paradigms into a single cooperative system. To overcome inherent conflicts in coordinate spaces, 3D representations, and training objectives, we align both models within a shared canonical space. We employ disentangled cooperative learning, which maintains stable training while enabling seamless collaboration during inference. Specifically, the reconstruction module is adapted to provide canonical geometric anchors, while the diffusion generator leverages latent-augmented conditioning to refine and complete the geometric structure. Experimental results demonstrate that UniRecGen achieves superior fidelity and robustness, outperforming existing methods in creating complete and consistent 3D models from sparse observations. Code is available at https://github.com/zsh523/UniRecGen.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents UniRecGen, a unified framework integrating feed-forward 3D reconstruction and diffusion-based generation for sparse-view modeling. It aligns both components in a shared canonical space and uses disentangled cooperative learning to resolve conflicts in coordinate systems, representations, and objectives, with the reconstruction module providing geometric anchors and the diffusion module refining structure via latent-augmented conditioning. The central claim is that this yields superior fidelity, robustness, and consistency over existing methods.

Significance. If the empirical claims hold, the work would meaningfully bridge reconstruction efficiency with generative completeness for sparse inputs, offering a practical path to more reliable 3D models without separate pipelines. The availability of code is a positive factor for reproducibility.

major comments (2)
  1. Abstract: the claim that UniRecGen 'achieves superior fidelity and robustness, outperforming existing methods' is presented without any quantitative metrics, tables, ablation results, or error analysis in the provided text, rendering the central empirical claim unevaluable on its own terms.
  2. The description of disentangled cooperative learning (Abstract) is load-bearing for the unification argument yet supplies no loss formulations, training schedule details, or stability analysis; without these, it is unclear whether the shared canonical space actually eliminates the stated conflicts rather than introducing new inconsistencies.
minor comments (1)
  1. Abstract: the GitHub link is given but no statement of what is released (models, training code, evaluation scripts) appears.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below, indicating planned revisions where appropriate.

point-by-point responses
  1. Referee: Abstract: the claim that UniRecGen 'achieves superior fidelity and robustness, outperforming existing methods' is presented without any quantitative metrics, tables, ablation results, or error analysis in the provided text, rendering the central empirical claim unevaluable on its own terms.

    Authors: We agree that the abstract would benefit from greater specificity to support its claims. While the full manuscript contains the requested quantitative metrics, tables, ablation studies, and error analysis in Section 4, we will revise the abstract to incorporate brief quantitative highlights (e.g., key improvements in fidelity and consistency metrics) so that the central claim is more self-contained and evaluable from the abstract alone. revision: yes

  2. Referee: The description of disentangled cooperative learning (Abstract) is load-bearing for the unification argument yet supplies no loss formulations, training schedule details, or stability analysis; without these, it is unclear whether the shared canonical space actually eliminates the stated conflicts rather than introducing new inconsistencies.

    Authors: The abstract is a concise summary; the loss formulations for disentangled cooperative learning, the training schedule with alternating optimization, and stability analysis (including convergence behavior) are fully detailed in Section 3 of the manuscript, with experimental validation that the shared canonical space resolves the coordinate, representation, and objective conflicts. We will not alter the abstract but will review Section 3 to ensure the resolution of conflicts is emphasized even more explicitly. revision: no
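Since Section 3 is not reproduced on this page, the following is only a generic sketch of what alternating optimization with stopped gradients could look like for "disentangled cooperative learning"; the loss terms, the schedule, and the helpers (`recon_loss`, `diffusion_loss`, `net.add_noise`) are assumptions, not the authors' formulation:

```python
# Generic sketch of disentangled cooperative training via alternating updates.
# All loss definitions and helper names below are assumed for illustration.
import torch

def recon_loss(pred_points, gt_points):
    # Hypothetical stand-in: L1 on canonical geometry.
    return (pred_points - gt_points).abs().mean()

def diffusion_loss(net, x0, cond):
    # Standard epsilon-prediction objective; net.add_noise is an assumed
    # helper implementing the forward noising process.
    t = torch.randint(0, net.num_steps, (x0.shape[0],))
    noise = torch.randn_like(x0)
    x_t = net.add_noise(x0, noise, t)
    return ((net(x_t, t, cond) - noise) ** 2).mean()

def cooperative_step(batch, recon_net, diffusion_net, opt_recon, opt_diff, step):
    views, gt_geometry = batch
    if step % 2 == 0:
        # Reconstruction turn: fit canonical anchors to ground truth, with the
        # generator's objective kept out of this update (the disentanglement).
        anchors, _ = recon_net(views)
        loss, opt = recon_loss(anchors, gt_geometry), opt_recon
    else:
        # Generation turn: condition on anchors computed without gradients, so
        # the diffusion objective cannot destabilize the reconstruction module.
        with torch.no_grad():
            anchors, feats = recon_net(views)
        loss, opt = diffusion_loss(diffusion_net, gt_geometry,
                                   cond=(anchors, feats)), opt_diff
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```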

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper introduces UniRecGen as a novel unification of reconstruction and diffusion paradigms via a shared canonical space and disentangled cooperative learning. The abstract describes new alignment mechanisms, adaptation of the reconstruction module for geometric anchors, and latent-augmented conditioning for the generator, without any equations or claims that reduce predictions to fitted inputs, self-definitions, or self-citation chains by construction. No load-bearing step renames known results or imports uniqueness theorems from prior author work as external facts. The framework's central claims rest on the proposed integration and experimental outcomes rather than tautological reductions, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not introduce or rely on any explicit free parameters, axioms, or invented entities beyond the general description of the framework.

pith-pipeline@v0.9.0 · 5515 in / 1072 out tokens · 36518 ms · 2026-05-13T22:07:53.462838+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

111 extracted references · 111 canonical work pages · 6 internal anchors

  1. [1]

    Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Tremblay, Sameh Khamis, et al. 2022. Efficient geometry-aware 3D generative adversarial networks. In IEEE/CVF International Conference on Computer Vision

  2. [2]

    Jiahao Chang, Chongjie Ye, Yushuang Wu, Yuantao Chen, Yidan Zhang, Zhongjin Luo, Chenghong Li, Yihao Zhi, and Xiaoguang Han. 2025. ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation. arXiv preprint arXiv:2510.23306 (2025)

  3. [3]

    David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann

  4. [4]

    pixelSplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition. 19457–19467

  5. [5]

    Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, and Hao Su. 2023. Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision. 2416–2425

  6. [6]

    Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Billzb Wang, Jingyi Yu, Gang Yu, et al. 2024. Meshxl: Neural coordinate field for generative 3d foundation models. Advances in Neural Information Processing Systems (NeurIPS) (2024)

  7. [7]

    Xingyu Chen, Fu-Jen Chu, Pierre Gleize, Kevin J Liang, Alexander Sax, Hao Tang, Weiyao Wang, Michelle Guo, Thibaut Hardin, Xiang Li, et al. 2025. Sam 3d: 3dfy anything in images. arXiv preprint arXiv:2511.16624 (2025)

  8. [8]

    Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, and Chi Zhang. 2024. MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers. arXiv:2406.10163 [cs.CV] https://arxiv.org/abs/2406.10163

  9. [9]

    Yiwen Chen, Yikai Wang, Yihao Luo, Zhengyi Wang, Zilong Chen, Jun Zhu, Chi Zhang, and Guosheng Lin. 2024. MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization. arXiv:2408.02555 [cs.CV] https://arxiv.org/abs/2408.02555

  10. [10]

    Zilong Chen, Yikai Wang, Wenqiang Sun, Feng Wang, Yiwen Chen, and Huaping Liu. 2025. MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation. arXiv:2505.04656 [cs.GR] https://arxiv.org/abs/2505.04656

  11. [11]

    Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, and Huaping Liu

  12. [12]

    V3d: Video diffusion models are effective 3d generators. arXiv preprint arXiv:2403.06738 (2024)

  13. [13]

    Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. 2023. Objaverse-xl: A universe of 10m+ 3d objects. Advances in Neural Information Processing Systems 36 (2023), 35799–35813

  14. [14]

    Laura Downs, Anthony Francis, Nate Maggio, Brandon Cavalcanti, Gerard Tagliabue, Jake Varley, and Brian Ichter. 2022. Google scanned objects: A high-quality dataset of 3D scanned household items. In 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2553–2560

  15. [15]

    Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, et al. 2024. Instantsplat: Unbounded sparse-view pose-free gaussian splatting in 40 seconds. arXiv preprint arXiv:2403.20309 (2024)

  16. [16]

    Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A Efros, and Xiaolong Wang. 2023. COLMAP-Free 3D Gaussian Splatting. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), 20796–20805

  17. [17]

    Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. 2022. Get3d: A generative model of high quality 3d textured shapes learned from images. Advances In Neural Information Processing Systems 35 (2022), 31841–31854

  18. [18]

    Hyojun Go, Dominik Narnhofer, Goutam Bhat, Prune Truong, Federico Tombari, and Konrad Schindler. 2025. VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator. arXiv preprint arXiv:2510.13454 (2025)

  19. [19]

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems 27 (2014)

  20. [20]

    Xiaodong Gu, Zhiwen Fan, Siyu Zhu, Zuozhuo Dai, Feitong Tan, and Ping Tan

  21. [21]

    Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2495–2504

  22. [22]

    Zekun Hao, David W. Romero, Tsung-Yi Lin, and Ming-Yu Liu. 2024. Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale. arXiv:2412.09548 [cs.GR] https://arxiv.org/abs/2412.09548

  23. [23]

    Hao He, Yixun Liang, Luozhou Wang, Yuanhao Cai, Xinli Xu, Hao-Xiang Guo, Xiang Wen, and Ying-Cong Chen. 2024. Lucidfusion: Generating 3d gaussians with arbitrary unposed images. (2024)

  24. [24]

    Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, and Tong He. 2024. GVGEN: Text-to-3D Generation with Volumetric Representation. In European Conference on Computer Vision

  25. [25]

    Xianglong He, Zi-Xin Zou, Chia-Hao Chen, Yuan-Chen Guo, Ding Liang, Chun Yuan, Wanli Ouyang, Yan-Pei Cao, and Yangguang Li. 2025. SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling. arXiv:2503.21732 [cs.CV] https://arxiv.org/abs/2503.21732

  26. [26]

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020), 6840–6851

  27. [27]

    Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Tengfei Wang, Liang Pan, Dahua Lin, and Ziwei Liu. 2024. 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors. CoRR abs/2403.02234 (2024)

  28. [28]

    Jiaxin Huang, Yuanbo Yang, Bangbang Yang, Lin Ma, Yuewen Ma, and Yiyi Liao

  29. [29]

    Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction. arXiv preprint arXiv:2601.04090 (2026)

  30. [30]

    Ka-Hei Hui, Ruihui Li, Jingyu Hu, and Chi-Wing Fu. 2022. Neural wavelet-domain diffusion for 3d shape generation. In SIGGRAPH Asia 2022 Conference Papers. 1–9

  31. [31]

    Team Hunyuan3D, Shuhui Yang, Mingxin Yang, Yifei Feng, Xin Huang, Sheng Zhang, Zebin He, Di Luo, Haolin Liu, Yunfei Zhao, et al. 2025. Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material. arXiv preprint arXiv:2506.15442 (2025)

  32. [32]

    Team Hunyuan3D, Bowen Zhang, Chunchao Guo, Haolin Liu, Hongyu Yan, Huiwen Shi, Jingwei Huang, Junlin Yu, Kunhong Li, Penghao Wang, et al. 2025. Hunyuan3d-omni: A unified framework for controllable generation of 3d assets. arXiv preprint arXiv:2509.21245 (2025)

  33. [33]

    Shubhendu Jena, Shishir Reddy Vutukur, and Adnane Boukhayma. 2025. SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting. arXiv preprint arXiv:2505.02175 (2025)

  34. [34]

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis

  35. [35]

    3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics 42, 4 (2023), 139–1

  36. [36]

    Zeqiang Lai, Yunfei Zhao, Haolin Liu, Zibo Zhao, Qingxiang Lin, Huiwen Shi, Xianghui Yang, Mingxin Yang, Shuhui Yang, Yifei Feng, Sheng Zhang, Xin Huang, Di Luo, Fan Yang, Fang Yang, Lifu Wang, Sicong Liu, Yixuan Tang, Yulin Cai, Zebin He, Tian Liu, Yuhong Liu, Jie Jiang, Linus, Jingwei Huang, and Chunchao Guo. 2025. Hunyuan3D 2.5: Towards High-Fidelity 3...

  37. [37]

    Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, and Chen Change Loy. 2024. LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation. In ECCV

  38. [38]

    Vincent Leroy, Yohann Cabon, and Jérôme Revaud. 2024. Grounding image matching in 3d with MASt3R. In European Conference on Computer Vision. 71–91

  39. [39]

    Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, and Sai Bi. 2023. Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. arXiv preprint arXiv:2311.06214 (2023)

  40. [40]

    Weiyu Li, Jiarui Liu, Rui Chen, Yixun Liang, Xuelin Chen, Ping Tan, and Xiaoxiao Long. 2024. CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner. arXiv preprint arXiv:2405.14979 (2024)

  41. [41]

    Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, et al. 2025. TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models. arXiv preprint arXiv:2502.06608 (2025)

  42. [42]

    Zhihao Li, Yufei Wang, Heliang Zheng, Yihao Luo, and Bihan Wen. 2025. Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling. arXiv:2505.14521 [cs.CV] https://arxiv.org/abs/2505.14521

  43. [43]

    Hanwen Liang, Junli Cao, Vidit Goel, Guocheng Qian, Sergei Korolev, Demetri Terzopoulos, Konstantinos N Plataniotis, Sergey Tulyakov, and Jian Ren. 2025. Wonderland: Navigating 3d scenes from a single image. In IEEE/CVF Conference on Computer Vision and Pattern Recognition. 798–810

  44. [44]

    Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, and Yadong MU. 2025. DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation. In International Conference on Learning Representations

  45. [45]

    Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Simon Lucey. 2021. BARF: Bundle-Adjusting Neural Radiance Fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 5741–5751

  46. [46]

    Yuchen Lin, Chenguo Lin, Panwang Pan, Honglei Yan, Yiqiang Feng, Yadong Mu, and Katerina Fragkiadaki. 2025. PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers. arXiv:2506.05573 [cs.CV] https://arxiv.org/abs/2506.05573

  47. [47]

    Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. 2023. One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems 36 (2023), 22226–22246

  48. [48]

    Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. 2023. Zero-1-to-3: Zero-shot one image to 3d object. In Proceedings of the IEEE/CVF international conference on computer vision. 9298–9309

  49. [49]

    Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, et al. 2024. Wonder3d: Single image to 3d using cross-domain diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9970–9980

  50. [50]

    William E Lorensen and Harvey E Cline. 1987. Marching cubes: A high resolution 3D surface construction algorithm. ACM SIGGRAPH Computer Graphics 21, 4 (1987), 163–169

  51. [51]

    Shitong Luo and Wei Hu. 2021. Diffusion probabilistic models for 3d point cloud generation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2837–2845

  52. [52]

    Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In European Conference on Computer Vision (ECCV). 405–421

  53. [53]

    Norman Müller, Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulo, Peter Kontschieder, and Matthias Nießner. 2023. Diffrf: Rendering-guided 3d radiance field diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4328–4338

  54. [54]

    Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen

  55. [55]

    Point-e: A system for generating 3d point clouds from complex prompts. arXiv preprint arXiv:2212.08751 (2022)

  56. [56]

    Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Mehdi S. M. Sajjadi, Andreas Geiger, and Noha Radwan. 2022. RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, New Orleans, LA, USA. doi:10.1109/CVPR52688.2022.00540

  57. [57]

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. 2023. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)

  58. [58]

    Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. 2019. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 165–174

  59. [59]

    Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. 2023. DreamFusion: Text-to-3D using 2D Diffusion. In International Conference on Learning Representations

  60. [60]

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10684–10695

  61. [61]

    Johannes L Schönberger and Jan-Michael Frahm. 2016. Structure-from-motion revisited. In IEEE Conference on Computer Vision and Pattern Recognition. 4104–4113

  62. [62]

    Johannes L. Schönberger, Enliang Zheng, Marc Pollefeys, and Jan-Michael Frahm

  63. [63]

    Pixelwise View Selection for Unstructured Multi-View Stereo. In European Conference on Computer Vision (ECCV). 501–518

  64. [64]

    Katja Schwarz, Norman Mueller, and Peter Kontschieder. 2025. Generative Gaussian splatting: Generating 3D scenes with video diffusion priors. arXiv preprint arXiv:2503.13272 (2025)

  65. [65]

    Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, and Xiao Yang. 2024. MVDream: Multi-view Diffusion for 3D Generation. In International Conference on Learning Representations

  66. [66]

    Yawar Siddiqui, Antonio Alliegro, Alexey Artemov, Tatiana Tommasi, Daniele Sirigatti, Vladislav Rosov, Angela Dai, and Matthias Nießner. 2024. Meshgpt: Generating triangle meshes with decoder-only transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

  67. [67]

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli

  68. [68]

    Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning. 2256–2265

  69. [69]

    Stefan Stojanov, Anh Thai, and James M Rehg. 2021. Using shape to categorize: Low-shot learning with an explicit shape bias. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 1798–1808

  70. [70]

    Stanislaw Szymanowicz, Jason Y Zhang, Pratul Srinivasan, Ruiqi Gao, Arthur Brussee, Aleksander Holynski, Ricardo Martin-Brualla, Jonathan T Barron, and Philipp Henzler. 2025. Bolt3D: Generating 3D scenes in seconds. arXiv preprint arXiv:2503.14445 (2025)

  71. [71]

    Bin Tan, Nan Xue, Tianfu Wu, and Gui-Song Xia. 2023. NOPE-SAC: Neural One-Plane RANSAC for Sparse-View Planar 3D Reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2023). doi:10.1109/TPAMI.2023.3314745

  72. [72]

    Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. 2024. Lgm: Large multi-view gaussian model for high-resolution 3d content creation. In European Conference on Computer Vision. 1–18

  73. [73]

    Jiaxiang Tang, Zhaoshuo Li, Zekun Hao, Xian Liu, Gang Zeng, Ming-Yu Liu, and Qinsheng Zhang. 2024. EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation. arXiv:2409.18114 [cs.CV] https://arxiv.org/abs/2409.18114

  74. [74]

    Jiaxiang Tang, Ruijie Lu, Zhaoshuo Li, Zekun Hao, Xuan Li, Fangyin Wei, Shuran Song, Gang Zeng, Ming-Yu Liu, and Tsung-Yi Lin. 2025. Efficient Part-level 3D Object Generation via Dual Volume Packing. arXiv:2506.09980 [cs.CV] https://arxiv.org/abs/2506.09980

  75. [75]

    Junshu Tang, Tengfei Wang, Bo Zhang, Ting Zhang, Ran Yi, Lizhuang Ma, and Dong Chen. 2023. Make-it-3d: High-fidelity 3d creation from a single image with diffusion prior. In Proceedings of the IEEE/CVF international conference on computer vision. 22819–22829

  76. [76]

    Shengji Tang, Weicai Ye, Peng Ye, Weihao Lin, Yang Zhou, Tao Chen, and Wanli Ouyang. 2024. HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction. arXiv preprint arXiv:2410.06245 (2024)

  77. [77]

    Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis, et al. 2022. Lion: Latent point diffusion models for 3d shape generation. Advances in Neural Information Processing Systems 35 (2022), 10021–10039

  78. [78]

    Vikram Voleti, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitry Tochilkin, Christian Laforte, Robin Rombach, and Varun Jampani. 2024. Sv3d: Novel multi-view synthesis and 3d generation from a single image using latent video diffusion. In European Conference on Computer Vision. Springer, 439–457

  79. [79]

    Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Pablo Speciale, and Marc Pollefeys. 2021. PatchmatchNet: Learned Multi-View Patchmatch Stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 14194–14203

  80. [80]

    Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. 2025. VGGT: Visual geometry grounded transformer. In IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5294–5306
Showing first 80 references.