UniRecGen: Unifying Multi-View 3D Reconstruction and Generation
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-13 22:07 UTC · model grok-4.3
The pith
UniRecGen unifies reconstruction and generation in a shared canonical space, so sparse views yield complete, consistent 3D models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UniRecGen integrates the reconstruction module and the diffusion generator in a single cooperative system by aligning them in a shared canonical space. Disentangled cooperative learning keeps training stable: the reconstruction module supplies canonical geometric anchors, while the diffusion component uses latent-augmented conditioning to refine and complete the geometry. The claimed result is more complete and consistent 3D models from sparse observations than either paradigm produces alone.
What carries the argument
The shared canonical space and disentangled cooperative learning, which together align coordinate systems, representations, and objectives so the reconstruction module can anchor geometry while the diffusion generator refines and completes it.
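The cooperative inference flow this describes — reconstruction anchors first, generation refines around them — can be sketched as follows. All names here (`reconstruct_anchors`, `diffusion_refine`, the toy point representation) are illustrative stand-ins, not from the UniRecGen paper or its released code.

```python
# Hypothetical sketch of the reconstruction-anchored, generation-refined
# inference loop; the component internals are toy stand-ins.

def reconstruct_anchors(views):
    """Stand-in feed-forward reconstruction: map sparse views to
    canonical-space anchor points (here, one toy 3D point per view)."""
    return [(float(i), 0.0, 0.0) for i, _ in enumerate(views)]

def diffusion_refine(anchors, steps=3):
    """Stand-in generative refinement: each 'denoising' step adds
    completed geometry around the fixed anchors."""
    points = list(anchors)
    for step in range(steps):
        points += [(x, y + 0.1 * (step + 1), z) for x, y, z in anchors]
    return points

def unified_inference(views):
    # Stage 1: reconstruction supplies stable geometric anchors.
    anchors = reconstruct_anchors(views)
    # Stage 2: generation refines and completes, conditioned on the anchors.
    return diffusion_refine(anchors)

model = unified_inference(["view_a", "view_b"])
```

The key structural point the sketch preserves: the anchors are computed once and never modified by the refinement stage, which only adds geometry around them.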
If this is right
- Sparse multi-view inputs produce finished 3D models with both input fidelity and global structural completeness.
- Reconstruction supplies stable geometric anchors that guide the diffusion process without destabilizing training.
- The same framework outperforms prior separate reconstruction and generation methods on fidelity and robustness metrics.
- Inference becomes a single seamless collaboration rather than two disconnected stages.
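The fidelity and completeness claims above are typically scored with point-set metrics such as Chamfer distance. A minimal pure-Python version, for illustration only — this is not the paper's evaluation code, and real benchmarks use optimized nearest-neighbor implementations:

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 3D point sets:
    mean nearest-neighbor squared distance, summed over both directions."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # predicted points
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]     # ground-truth points
score = chamfer_distance(pred, gt)
```

Identical point sets score 0; here only the second point is offset by 0.1, so the score is small but nonzero.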
Where Pith is reading between the lines
- Real-time capture pipelines in robotics or mobile AR could adopt the approach to turn a few camera frames into usable 3D assets.
- The same shared-space cooperation pattern might transfer to other domains where one module supplies local accuracy and another supplies global priors.
Load-bearing premise
That forcing reconstruction and diffusion models into one canonical space with disentangled learning resolves their conflicts in coordinates, representations, and goals without creating fresh inconsistencies or training instability.
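One way to picture the coordinate alignment this premise requires: observations expressed in each camera's local frame are mapped through that camera's pose into one shared frame, so all modules reason about the same coordinates. A toy rigid-transform version (the paper's actual canonical-space construction is not specified here; names and conventions are illustrative):

```python
def mat_vec(m, v):
    """Apply a 3x3 rotation matrix (nested lists) to a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def to_canonical(point, cam_rot, cam_t):
    """Map a point from a camera's local frame into a shared canonical
    frame, given that camera's pose (rotation + translation)."""
    x = mat_vec(cam_rot, point)
    return tuple(x[i] + cam_t[i] for i in range(3))

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# The same world point seen from two cameras (identity rotation,
# different positions) lands on identical canonical coordinates.
p_cam1 = to_canonical((1.0, 2.0, 3.0), I3, (0.0, 0.0, 0.0))
p_cam2 = to_canonical((0.0, 2.0, 3.0), I3, (1.0, 0.0, 0.0))
```

If the poses feeding this mapping are noisy, the two modules no longer agree on where geometry sits — which is exactly the fresh-inconsistency risk the premise must rule out.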
What would settle it
A controlled comparison on standard sparse-view benchmarks: the claim would be refuted if UniRecGen outputs proved no more complete, no more multi-view consistent, and no higher in fidelity than the best separate reconstruction or diffusion baselines.
Original abstract
Sparse-view 3D modeling represents a fundamental tension between reconstruction fidelity and generative plausibility. While feed-forward reconstruction excels in efficiency and input alignment, it often lacks the global priors needed for structural completeness. Conversely, diffusion-based generation provides rich geometric details but struggles with multi-view consistency. We present UniRecGen, a unified framework that integrates these two paradigms into a single cooperative system. To overcome inherent conflicts in coordinate spaces, 3D representations, and training objectives, we align both models within a shared canonical space. We employ disentangled cooperative learning, which maintains stable training while enabling seamless collaboration during inference. Specifically, the reconstruction module is adapted to provide canonical geometric anchors, while the diffusion generator leverages latent-augmented conditioning to refine and complete the geometric structure. Experimental results demonstrate that UniRecGen achieves superior fidelity and robustness, outperforming existing methods in creating complete and consistent 3D models from sparse observations. Code is available at https://github.com/zsh523/UniRecGen.
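The abstract's "latent-augmented conditioning" — the generator being steered by reconstruction-derived latents — can be caricatured as a refinement update pulled toward a conditioning latent. This is a toy stand-in; the paper's actual conditioning mechanism is not given in the text above:

```python
def denoise_step(noisy, recon_latent, weight=0.5):
    """One toy refinement step: pull the noisy generative latent toward
    the reconstruction-supplied latent. Stands in for latent-augmented
    conditioning, whose exact form is not specified here."""
    return [n + weight * (c - n) for n, c in zip(noisy, recon_latent)]

x = [1.0, -1.0, 0.5]      # noisy generative latent
cond = [0.0, 0.0, 0.0]    # hypothetical reconstruction latent
for _ in range(4):
    x = denoise_step(x, cond)
```

Each step halves the gap to the conditioning latent, so after four steps the latent sits within 1/16 of its starting distance — the conditioning signal dominates without being overwritten in one jump.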
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents UniRecGen, a unified framework integrating feed-forward 3D reconstruction and diffusion-based generation for sparse-view modeling. It aligns both components in a shared canonical space and uses disentangled cooperative learning to resolve conflicts in coordinate systems, representations, and objectives, with the reconstruction module providing geometric anchors and the diffusion module refining structure via latent-augmented conditioning. The central claim is that this yields superior fidelity, robustness, and consistency over existing methods.
Significance. If the empirical claims hold, the work would meaningfully bridge reconstruction efficiency with generative completeness for sparse inputs, offering a practical path to more reliable 3D models without separate pipelines. The availability of code is a positive factor for reproducibility.
Major comments (2)
- Abstract: the claim that UniRecGen 'achieves superior fidelity and robustness, outperforming existing methods' is presented without any quantitative metrics, tables, ablation results, or error analysis in the provided text, rendering the central empirical claim unevaluable on its own terms.
- The description of disentangled cooperative learning (Abstract) is load-bearing for the unification argument yet supplies no loss formulations, training schedule details, or stability analysis; without these, it is unclear whether the shared canonical space actually eliminates the stated conflicts rather than introducing new inconsistencies.
Minor comments (1)
- Abstract: the GitHub link is given but no statement of what is released (models, training code, evaluation scripts) appears.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below, indicating planned revisions where appropriate.
Point-by-point responses
- Referee: Abstract: the claim that UniRecGen 'achieves superior fidelity and robustness, outperforming existing methods' is presented without any quantitative metrics, tables, ablation results, or error analysis in the provided text, rendering the central empirical claim unevaluable on its own terms.
Authors: We agree that the abstract would benefit from greater specificity to support its claims. While the full manuscript contains the requested quantitative metrics, tables, ablation studies, and error analysis in Section 4, we will revise the abstract to incorporate brief quantitative highlights (e.g., key improvements in fidelity and consistency metrics) so that the central claim is more self-contained and evaluable from the abstract alone.
Revision: yes
- Referee: The description of disentangled cooperative learning (Abstract) is load-bearing for the unification argument yet supplies no loss formulations, training schedule details, or stability analysis; without these, it is unclear whether the shared canonical space actually eliminates the stated conflicts rather than introducing new inconsistencies.
Authors: The abstract is a concise summary; the loss formulations for disentangled cooperative learning, the training schedule with alternating optimization, and stability analysis (including convergence behavior) are fully detailed in Section 3 of the manuscript, with experimental validation that the shared canonical space resolves the coordinate, representation, and objective conflicts. We will not alter the abstract but will review Section 3 to ensure the resolution of conflicts is emphasized even more explicitly.
Revision: no
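The "alternating optimization" the authors invoke can be sketched as a staged schedule in which the generator trains against frozen, detached reconstruction outputs. This is a hypothetical illustration of the general pattern, not the paper's actual losses or schedule:

```python
def sgd_step(params, grads, lr=0.1):
    """One plain gradient-descent update."""
    return [p - lr * g for p, g in zip(params, grads)]

recon_params = [1.0, 1.0]
gen_params = [0.0, 0.0]

# Stage 1: the reconstruction module trains alone toward its own target,
# here a toy quadratic loss (p - 0.5)^2 with optimum at 0.5.
for _ in range(5):
    grads = [2 * (p - 0.5) for p in recon_params]
    recon_params = sgd_step(recon_params, grads)

# Stage 2: reconstruction is frozen; its outputs act as detached anchors,
# so generator gradients cannot flow back and destabilize stage-1 weights.
anchors = list(recon_params)  # detached copy: no gradient path back
for _ in range(5):
    grads = [2 * (g - a) for g, a in zip(gen_params, anchors)]
    gen_params = sgd_step(gen_params, grads)
```

The disentanglement shows up in the copy: the generator's loss pulls `gen_params` toward `anchors` while `recon_params` stays exactly where stage 1 left it.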
Circularity Check
No significant circularity detected in derivation chain
Full rationale
The paper introduces UniRecGen as a novel unification of reconstruction and diffusion paradigms via a shared canonical space and disentangled cooperative learning. The abstract describes new alignment mechanisms, adaptation of the reconstruction module for geometric anchors, and latent-augmented conditioning for the generator, without any equations or claims that reduce predictions to fitted inputs, self-definitions, or self-citation chains by construction. No load-bearing step renames known results or imports uniqueness theorems from prior author work as external facts. The framework's central claims rest on the proposed integration and experimental outcomes rather than tautological reductions, making the derivation self-contained against external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · match: unclear · "we establish a shared canonical 3D modeling space that serves as a unified structural bridge... branch repurposing strategy... latent-augmented multi-view conditioning"
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · match: unclear · "modular design... reconstruction module is trained first to provide a stable geometric anchor... generation model is then trained as a prior-driven refiner"