pith. machine review for the scientific record.

arxiv: 2603.16869 · v3 · submitted 2026-03-17 · 💻 cs.CV

Recognition: no theorem link

SegviGen: Repurposing 3D Generative Model for Part Segmentation


Pith reviewed 2026-05-15 09:25 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D part segmentation · generative model priors · voxel colorization · interactive segmentation · few-shot 3D learning · geometry-aligned reconstruction · minimal supervision

The pith

Pretrained 3D generative models can be repurposed for 3D part segmentation by predicting part-specific colors on geometry-aligned voxels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

SegviGen repurposes a pretrained 3D generative model to segment object parts without building a new discriminative network from scratch. It encodes the 3D asset and predicts distinctive colors for different parts directly on the active voxels of a geometry-aligned reconstruction. This colorization step accesses the structured priors already learned by the generative model. The same pipeline handles interactive part segmentation, full segmentation, and full segmentation guided by 2D masks. Experiments show large accuracy gains while requiring only 0.32 percent of the usual labeled training data.
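
To see why colorization can stand in for segmentation, consider the decoding end: if each part is supervised with a distinctive reference color, a predicted voxel color becomes a part label by nearest-palette lookup. A minimal sketch, with a made-up three-color palette and L2 matching; the paper does not disclose its palette or its assignment rule.

```python
import numpy as np

# Hypothetical three-part palette, for illustration only; the paper does not
# specify its color assignment scheme.
PALETTE = np.array([
    [1.0, 0.0, 0.0],  # part 0: red
    [0.0, 1.0, 0.0],  # part 1: green
    [0.0, 0.0, 1.0],  # part 2: blue
])

def colors_to_part_labels(pred_colors: np.ndarray, palette: np.ndarray = PALETTE) -> np.ndarray:
    """Map each active voxel's predicted RGB to the nearest palette color (L2 distance)."""
    dists = np.linalg.norm(pred_colors[:, None, :] - palette[None, :, :], axis=-1)  # (N, K)
    return dists.argmin(axis=1)  # (N,) part label per active voxel

# Noisy color predictions snap to their nearest part color.
pred = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.05, 0.1, 0.95]])
print(colors_to_part_labels(pred))  # -> [0 1 2]
```

Any injective palette with well-separated colors would serve here; what matters for the paper's claim is that the generative prior places color boundaries at part boundaries.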

Core claim

SegviGen encodes a 3D asset and predicts part-indicative colors on active voxels of a geometry-aligned reconstruction, inducing segmentation from the structured priors already present in a pretrained 3D generative model; the same framework supports interactive part segmentation, full segmentation, and full segmentation with 2D guidance.

What carries the argument

Part-indicative colorization prediction performed on geometry-aligned voxels, which extracts segmentation boundaries from the generative model's existing priors.

If this is right

  • Interactive part segmentation accuracy rises 40 percent over prior state-of-the-art methods.
  • Full segmentation accuracy rises 15 percent over prior state-of-the-art methods.
  • The entire pipeline operates with only 0.32 percent of the labeled training data required by previous approaches.
  • One model handles interactive segmentation, full segmentation, and 2D-guided segmentation without separate training runs.
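
To put the data-efficiency bullet in concrete terms, a back-of-envelope calculation, where the baseline budget is a hypothetical number chosen only for illustration:

```python
baseline_budget = 100_000  # hypothetical label budget of a prior discriminative pipeline
fraction = 0.0032          # the 0.32% of labeled data reported for SegviGen
print(round(baseline_budget * fraction))  # -> 320 labeled shapes
```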

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The approach could be tested on generative models trained on different object categories to measure how much domain match between pretraining and target shapes affects boundary quality.
  • Annotation budgets in robotics or augmented-reality pipelines could drop substantially if only a few dozen labeled examples suffice for new part vocabularies.
  • Extending the voxel colorization step to time-varying 3D data might allow part tracking in dynamic scenes with the same low-label regime.

Load-bearing premise

The structured priors already learned by the 3D generative model can be aligned to actual part boundaries simply by training the model to predict colors on geometry-aligned voxels.

What would settle it

A controlled experiment in which the colorization prediction head is replaced by a random color assignment or by a non-pretrained encoder, with all other components held fixed, would show whether the performance gains disappear.
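
One sketch of such an ablation harness, assuming stand-in segmenters and a shared mean-IoU score; the interfaces, variants, and toy data below are illustrative and none of them come from the paper:

```python
import numpy as np

def miou(pred: np.ndarray, gt: np.ndarray, num_parts: int) -> float:
    """Mean intersection-over-union over part labels, skipping labels with empty unions."""
    ious = []
    for k in range(num_parts):
        inter = np.logical_and(pred == k, gt == k).sum()
        union = np.logical_or(pred == k, gt == k).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

def run_ablation(segmenters: dict, dataset: list, num_parts: int) -> dict:
    """Score each variant on the same shapes; only the prediction head differs."""
    scores = {name: [] for name in segmenters}
    for features, gt_labels in dataset:
        for name, segment in segmenters.items():
            scores[name].append(miou(segment(features), gt_labels, num_parts))
    return {name: float(np.mean(s)) for name, s in scores.items()}

# Stand-in variants: a placeholder "pretrained" head and the random-color control
# proposed above. Both are toys; neither is the paper's implementation.
rng = np.random.default_rng(0)
variants = {
    "pretrained_colorization": lambda f: (f.sum(axis=1) > 1.5).astype(int),
    "random_color_control": lambda f: rng.integers(0, 2, size=len(f)),
}
toy_dataset = [(rng.random((64, 3)), rng.integers(0, 2, size=64)) for _ in range(4)]
print(run_ablation(variants, toy_dataset, num_parts=2))
```

If the random-color control or a from-scratch encoder closed most of the gap, the pretrained-prior explanation would be in trouble.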

Figures

Figures reproduced from arXiv: 2603.16869 by Buyu Li, Haohua Chen, Haoran Feng, Keqing Fan, Lin Li, Lu Sheng, Pan Hu, Shaohua Hou, Sheng Wang, Wenbo Nie, Zehuan Huang.

Figure 1
Figure 1: SegviGen enables diverse and accurate 3D part segmentation by leveraging priors from large-scale 3D generative models. With substantially less training data, it produces high-fidelity segmentation results with sharp part boundaries and strong generalization across object categories.
Figure 2
Figure 2: Pipeline of SegviGen. We reformulate 3D part segmentation as a conditional colorization task. During training, given a 3D mesh and its part-color ground truth, we encode both with a pretrained 3D VAE, add noise to the ground-truth latent, and then concatenate the geometry latent, noisy color latent, and point-condition tokens to form the final latent input. Conditioned on the sampled timestep and a task em… (a sketch of this latent assembly follows the figure list)
Figure 3
Figure 3: Interactive part-segmentation results. We compare …
Figure 4
Figure 4: Full segmentation results. We compare SegviGen against a broad set of prior methods, where different colors indicate different segmented parts. From the results, SegviGen achieves high-accuracy full segmentation with sharp part boundaries using only 3D input. With 2D guidance, it further enables explicit control over granularity and labels for ultra-fine-grained 3D part parsing.
Figure 5
Figure 5: More qualitative segmentation results of our …
Figure 6
Figure 6: Interactive 3D editing results with SegviGen and VoxHammer [27]. SegviGen provides precise part segmentation to facilitate downstream editing models. The target region to be edited is indicated by green points. It demonstrates the practical utility of SegviGen in downstream 3D editing pipelines.
Figure 7
Figure 7: Interactive demo. Users specify clicks on the 3D asset to perform interactive part segmentation.
Figure 8
Figure 8: More qualitative comparisons for full segmentation.
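
The Figure 2 caption is the most algorithmic description on this page: encode the mesh and its part-color ground truth with a pretrained 3D VAE, add noise to the color latent at a sampled timestep, and concatenate geometry latent, noisy color latent, and point-condition tokens. A minimal sketch of that assembly, assuming token-axis concatenation and a plain DDPM-style noising rule, and omitting the timestep and task embeddings; the actual latent shapes and noise schedule are not specified by the caption:

```python
import torch

def assemble_latent_input(
    geo_latent: torch.Tensor,    # (N, D) geometry latent from the pretrained 3D VAE
    color_latent: torch.Tensor,  # (N, D) latent of the part-color ground truth
    point_tokens: torch.Tensor,  # (P, D) tokens encoding click/point conditions
    alpha_bar_t: float,          # cumulative noise level at the sampled timestep
) -> torch.Tensor:
    """Concatenate the three inputs named in the Figure 2 caption."""
    noise = torch.randn_like(color_latent)
    # Plain DDPM forward noising; an assumption, since the paper's schedule
    # (possibly rectified flow) is not given here.
    noisy_color = (alpha_bar_t ** 0.5) * color_latent + ((1.0 - alpha_bar_t) ** 0.5) * noise
    return torch.cat([geo_latent, noisy_color, point_tokens], dim=0)

# Toy sizes only: 128 latent tokens of width 64 plus 4 point-condition tokens.
x = assemble_latent_input(
    torch.randn(128, 64), torch.randn(128, 64), torch.randn(4, 64), alpha_bar_t=0.7
)
print(x.shape)  # torch.Size([260, 64])
```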
read the original abstract

We introduce SegviGen, a framework that repurposes native 3D generative models for 3D part segmentation. Existing pipelines either lift strong 2D priors into 3D via distillation or multi-view mask aggregation, often suffering from cross-view inconsistency and blurred boundaries, or explore native 3D discriminative segmentation, which typically requires large-scale annotated 3D data and substantial training resources. In contrast, SegviGen leverages the structured priors encoded in pretrained 3D generative model to induce segmentation through distinctive part colorization, establishing a novel and efficient framework for part segmentation. Specifically, SegviGen encodes a 3D asset and predicts part-indicative colors on active voxels of a geometry-aligned reconstruction. It supports interactive part segmentation, full segmentation, and full segmentation with 2D guidance in a unified framework. Extensive experiments show that SegviGen improves over the prior state of the art by 40% on interactive part segmentation and by 15% on full segmentation, while using only 0.32% of the labeled training data. It demonstrates that pretrained 3D generative priors transfer effectively to 3D part segmentation, enabling strong performance with limited supervision. See our project page at https://fenghora.github.io/SegviGen-Page/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SegviGen, a framework that repurposes pretrained 3D generative models for 3D part segmentation by encoding a 3D asset and predicting part-indicative colors on active voxels of a geometry-aligned reconstruction. It unifies support for interactive part segmentation, full segmentation, and full segmentation with 2D guidance, claiming improvements of 40% over prior SOTA on interactive segmentation and 15% on full segmentation while using only 0.32% of labeled training data.

Significance. If the results hold under fair comparisons, the work demonstrates that structured priors from 3D generative models can transfer effectively to discriminative part segmentation with minimal supervision, offering a data-efficient alternative to distillation-based or large-scale annotation approaches in 3D vision.

major comments (2)
  1. [Method] The central mechanism (encoding followed by color prediction on active voxels of a geometry-aligned reconstruction) lacks any specification of voxel resolution, active-voxel selection criteria, or reconstruction artifacts. This detail is load-bearing for the claim that generative priors map to true part boundaries rather than voxel-grid geometry, as discretization could blur semantic edges and undermine the 40%/15% gains.
  2. [Experiments] No ablation is described that isolates the contribution of the colorization signal from underlying geometric cues in the reconstruction. Without this, the reported improvements with 0.32% labels cannot be confidently attributed to transfer of generative priors rather than mesh or voxel structure.
minor comments (1)
  1. [Abstract] The abstract references a project page for additional details, but the manuscript text provides no explicit statement on code or model release to support reproducibility of the quantitative claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their thorough review and valuable suggestions. We will revise the manuscript to address the concerns raised regarding methodological details and experimental ablations. Our point-by-point responses are as follows.

read point-by-point responses
  1. Referee: [Method] The central mechanism (encoding followed by color prediction on active voxels of a geometry-aligned reconstruction) lacks any specification of voxel resolution, active-voxel selection criteria, or reconstruction artifacts. This detail is load-bearing for the claim that generative priors map to true part boundaries rather than voxel-grid geometry, as discretization could blur semantic edges and undermine the 40%/15% gains.

    Authors: We agree with the referee that these implementation details are crucial for understanding the method and validating the claims. In the revised manuscript, we will expand the method section to include the voxel resolution of the reconstruction grid, the criteria for identifying active voxels (based on the generative model's occupancy output), and an analysis of how reconstruction artifacts are handled. This addition will clarify that the part boundaries are determined by the colorization signal rather than solely by the voxel discretization (see the sketch after these responses for one hypothetical form of the criterion). revision: yes

  2. Referee: [Experiments] No ablation is described that isolates the contribution of the colorization signal from underlying geometric cues in the reconstruction. Without this, the reported improvements with 0.32% labels cannot be confidently attributed to transfer of generative priors rather than mesh or voxel structure.

    Authors: This is a fair criticism, and we will address it by adding a new ablation study in the experiments section. The ablation will evaluate a geometry-only baseline that relies solely on the reconstruction without the generative color prediction. By comparing this to the full SegviGen model, we aim to demonstrate the specific benefit of repurposing the 3D generative priors for part segmentation. revision: yes
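
On the first response's promised detail: if active-voxel selection reduces to thresholding the generative model's occupancy output, it might look like the sketch below. The 0.5 cutoff and the dense occupancy-grid interface are assumptions for illustration, not the authors' specification.

```python
import numpy as np

def active_voxels(occupancy: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return (M, 3) integer coordinates of voxels the reconstruction marks occupied.

    occupancy: (R, R, R) grid of occupancy probabilities; the 0.5 threshold is
    a placeholder, not a value taken from the paper.
    """
    return np.argwhere(occupancy > threshold)

# Toy grid: a 4x4x4 volume with a single occupied corner block.
grid = np.zeros((4, 4, 4))
grid[:2, :2, :2] = 0.9
print(active_voxels(grid).shape)  # (8, 3): the eight corner voxels
```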

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper presents SegviGen as an empirical framework that repurposes an external pretrained 3D generative model via colorization prediction on geometry-aligned voxels. No equations, derivations, or self-citations are shown that reduce the claimed 40%/15% gains or 0.32% data efficiency to quantities fitted inside the paper by construction. The approach is described as leveraging independent structured priors from the pretrained model, with performance validated through experiments rather than self-referential definitions or fitted inputs renamed as predictions. This is a standard non-circular empirical repurposing result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that pretrained generative models already encode part-level structure that can be surfaced via color prediction; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption Pretrained 3D generative models encode structured part priors that transfer to segmentation via colorization on geometry-aligned voxels.
    This premise is required for the method to work with minimal labeled data and is stated as the key advantage over prior pipelines.

pith-pipeline@v0.9.0 · 5561 in / 1267 out tokens · 33576 ms · 2026-05-15T09:25:19.905332+00:00 · methodology


Reference graph

Works this paper leans on

92 extracted references · 92 canonical work pages · 6 internal anchors

  1. [1] SAM 3: Segment Anything with Concepts
      Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, et al. arXiv preprint arXiv:2511.16719, 2025.

  2. [2] Emerging Properties in Self-Supervised Vision Transformers
      Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. In ICCV, 2021.

  3. [3] MeshXL: Neural coordinate field for generative 3D foundation models
      Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, and Tao Chen.

  4. [4] A benchmark for 3D mesh segmentation
      Xiaobai Chen, Aleksey Golovinskiy, and Thomas Funkhouser. In SIGGRAPH, 2009.

  5. [5] MeshAnything: Artist-created mesh generation with autoregressive transformers
      Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, and Chi Zhang.

  6. [6] Ultra3D: Efficient and high-fidelity 3D generation with part attention
      Yiwen Chen, Zhihao Li, Yikai Wang, Hu Zhang, Qin Li, Chi Zhang, and Guosheng Lin. 2025.

  7. [7] ScanNet: Richly-annotated 3D reconstructions of indoor scenes
      Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. In arXiv, 2017.

  8. [8] Objaverse: A universe of annotated 3D objects
      Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023.

  9. [9] Objaverse-XL: A universe of 10M+ 3D objects
      Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. Advances in Neural Information Processing Systems, 36, 2024.

  10. [10] GeoSAM2: Unleashing the power of SAM2 for 3D part segmentation
      Ken Deng, Yunhan Yang, Jingxiang Sun, Xihui Liu, Yebin Liu, Ding Liang, and Yan-Pei Cao. In arXiv, 2025.

  11. [11] TELA: Text to layer-wise 3D clothed human generation
      Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, and Bo Dai. In European Conference on Computer Vision, pages 19–36. Springer, 2025.

  12. [12] From one to more: Contextual part latents for 3D generation
      Shaocong Dong, Lihe Ding, Xiao Chen, Yaokun Li, Yuxin Wang, Yucheng Wang, Qi Wang, Jaehyeok Kim, Chenjian Gao, Zhanpeng Huang, Zibin Wang, Tianfan Xue, and Dan Xu. 2025.

  13. [13] From one to more: Contextual part latents for 3D generation
      Shaocong Dong, Lihe Ding, Xiao Chen, Yaokun Li, Yuxin Wang, Yucheng Wang, Qi Wang, Jaehyeok Kim, Chenjian Gao, Zhanpeng Huang, Zibin Wang, Tianfan Xue, and Dan Xu. 2025.

  14. [14] MeshArt: Generating articulated meshes with structure-guided transformers
      Daoyi Gao, Yawar Siddiqui, Lei Li, and Angela Dai. 2025.

  15. [15] 3D part segmentation via geometric aggregation of 2D visual features
      Marco Garosi, Riccardo Tedoldi, Davide Boscaini, Massimiliano Mancini, Nicu Sebe, and Fabio Poiesi. In arXiv, 2025.

  16. [16] MeshCNN: A network with an edge
      Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. In ACM, 2019.

  17. [17] Meshtron: High-fidelity, artist-like 3D mesh generation at scale
      Zekun Hao, David W. Romero, Tsung-Yi Lin, and Ming-Yu Liu. 2024.

  18. [18] Neural LightRig: Unlocking accurate object normal and material estimation with multi-light diffusion
      Zexin He, Tengfei Wang, Xin Huang, Xingang Pan, and Ziwei Liu. 2024.

  19. [19] Denoising diffusion probabilistic models
      Jonathan Ho, Ajay Jain, and Pieter Abbeel. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.

  20. [20] LRM: Large Reconstruction Model for Single Image to 3D
      Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. arXiv preprint arXiv:2311.04400, 2023.

  21. [21] Segment3D: Learning fine-grained class-agnostic 3D segmentation without manual labels
      Rui Huang, Songyou Peng, Ayca Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang, and Francis Engelmann. In European Conference on Computer Vision, pages 278–295. Springer, 2024.

  22. [22] Stereo-GS: Multi-view stereo vision model for generalizable 3D Gaussian splatting reconstruction
      Xiufeng Huang, Ka Chun Cheung, Runmin Cong, Simon See, and Renjie Wan.

  23. [23] MV-Adapter: Multi-view consistent image generation made easy
      Zehuan Huang, Yuan-Chen Guo, Haoran Wang, Ran Yi, Lizhuang Ma, Yan-Pei Cao, and Lu Sheng. arXiv preprint arXiv:2412.03632, 2024.

  24. [24] EpiDiff: Enhancing multi-view synthesis via localized epipolar-constrained diffusion
      Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, et al. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9784–9794, 2024.

  25. [25] Auto-Encoding Variational Bayes
      Diederik P. Kingma. arXiv preprint arXiv:1312.6114, 2013.

  26. [26] Segment Anything
      Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. In arXiv, 2023.

  27. [27] VoxHammer: Training-free precise and coherent 3D editing in native 3D space
      Lin Li, Zehuan Huang, Haoran Feng, Gengxiong Zhuang, Rui Chen, Chunchao Guo, and Lu Sheng. arXiv preprint arXiv:2508.19247, 2025.

  28. [28] Grounded language-image pre-training
      Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, and Jianfeng Gao. In arXiv, 2022.

  29. [29] CraftsMan: High-fidelity mesh generation with 3D native generation and interactive geometry refiner
      Weiyu Li, Jiarui Liu, Rui Chen, Yixun Liang, Xuelin Chen, Ping Tan, and Xiaoxiao Long. arXiv preprint arXiv:2405.14979, 2024.

  30. [30] CraftsMan3D: High-fidelity mesh generation with 3D native generation and interactive geometry refiner
      Weiyu Li, Jiarui Liu, Hongyu Yan, Rui Chen, Yixun Liang, Xuelin Chen, Ping Tan, and Xiaoxiao Long. 2025.

  31. [31] Step1X-3D: Towards high-fidelity and controllable generation of textured 3D assets
      Weiyu Li, Xuanyang Zhang, Zheng Sun, Di Qi, Hao Li, Wei Cheng, Weiwei Cai, Shihao Wu, Jiarui Liu, Zihao Wang, Xiao Chen, Feipeng Tian, Jianxiong Pan, Zeming Li, Gang Yu, Xiangyu Zhang, Daxin Jiang, and Ping Tan. 2025.

  32. [32] TripoSG: High-fidelity 3D shape synthesis using large-scale rectified flow models
      Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, and Yan-Pei Cao. 2025.

  33. [33] TripoSG: High-fidelity 3D shape synthesis using large-scale rectified flow models
      Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, et al. arXiv preprint arXiv:2502.06608, 2025.

  34. [34] End-to-end human pose and mesh reconstruction with transformers
      Kevin Lin, Lijuan Wang, and Zicheng Liu. In CVPR, 2021.

  35. [35] PartCrafter: Structured 3D mesh generation via compositional latent diffusion transformers
      Yuchen Lin, Chenguo Lin, Panwang Pan, Honglei Yan, Yiqiang Feng, Yadong Mu, and Katerina Fragkiadaki. 2025.

  36. [36] Part123: Part-aware 3D reconstruction from a single-view image
      Anran Liu, Cheng Lin, Yuan Liu, Xiaoxiao Long, Zhiyang Dou, Hao-Xiang Guo, Ping Luo, and Wenping Wang. In ACM SIGGRAPH 2024 Conference Papers, pages 1–12, 2024.

  37. [37] One-2-3-45++: Fast single image to 3D objects with consistent multi-view generation and 3D diffusion
      Minghua Liu, Ruoxi Shi, Linghao Chen, Zhuoyang Zhang, Chao Xu, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, and Hao Su. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10072–10083, 2024.

  38. [38] One-2-3-45: Any single image to 3D mesh in 45 seconds without per-shape optimization
      Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. Advances in Neural Information Processing Systems, 36, 2024.

  39. [39] PartField: Learning 3D feature fields for part segmentation and beyond
      Minghua Liu, Mikaela Angelina Uy, Donglai Xiang, Hao Su, Sanja Fidler, Nicholas Sharp, and Jun Gao. In ICCV, 2025.

  40. [40] SyncDreamer: Generating multiview-consistent images from a single-view image
      Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, and Wenping Wang. arXiv preprint arXiv:2309.03453, 2023.

  41. [41] Wonder3D: Single image to 3D using cross-domain diffusion
      Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, et al. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9970–9980, 2024.

  42. [42] Decoupled weight decay regularization
      Ilya Loshchilov and Frank Hutter. 2019.

  43. [43] P3-SAM: Native 3D part segmentation
      Changfeng Ma, Yang Li, Xinhao Yan, Jiachen Xu, Yunhan Yang, Chunshi Wang, Zibo Zhao, Yanwen Guo, Zhuo Chen, and Chunchao Guo. In arXiv, 2025.

  44. [44] Find any part in 3D
      Ziqi Ma, Yisong Yue, and Georgia Gkioxari. In arXiv, 2025.

  45. [45] LT3SD: Latent trees for 3D scene diffusion
      Quan Meng, Lei Li, Matthias Nießner, and Angela Dai. arXiv preprint arXiv:2409.08215, 2024.

  46. [46] PartNet: A large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding
      Kaichun Mo, Shilin Zhu, Angel X. Chang, Li Yi, Subarna Tripathi, Leonidas J. Guibas, and Hao Su. In CVPR, 2019.

  47. [47] Maxime Oquab, Timothée Darcet, Theo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Russell Howes, Po-Yao Huang, Hu Xu, Vasu Sharma, Shang-Wen Li, Wojciech Galuba, Mike Rabbat, Mido Assran, Nicolas Ballas, Gabriel Synnaeve, Ishan Misra, Hervé Jégou, Julien Mairal, Patri...

  48. [48] Scalable diffusion models with transformers
      William Peebles and Saining Xie. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205.

  49. [49] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
      Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. arXiv preprint arXiv:1612.00593.

  50. [50] DeOcc-1-to-3: 3D de-occlusion from a single image via self-supervised multi-view diffusion
      Yansong Qu, Shaohui Dai, Xinyang Li, Yuze Wang, You Shen, Liujuan Cao, and Rongrong Ji. 2025.

  51. [51] Learning transferable visual models from natural language supervision
      Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. In arXiv, 2021.

  52. [52] SAM 2: Segment Anything in Images and Videos
      Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. In arXiv, 2024.

  53. [53] L3DG: Latent 3D Gaussian diffusion
      Barbara Roessle, Norman Müller, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder, Angela Dai, and Matthias Nießner. arXiv preprint arXiv:2410.13530, 2024.

  54. [54] Denoising Diffusion Implicit Models
      Jiaming Song, Chenlin Meng, and Stefano Ermon. arXiv preprint arXiv:2010.02502, 2020.

  55. [55] Segment Any Mesh
      George Tang, William Zhao, Logan Ford, David Benhaim, and Paul Zhang. In arXiv, 2025.

  56. [56] LGM: Large multi-view Gaussian model for high-resolution 3D content creation
      Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. In European Conference on Computer Vision, pages 1–18. Springer, 2025.

  57. [57] Efficient part-level 3D object generation via dual volume packing
      Jiaxiang Tang, Ruijie Lu, Zhaoshuo Li, Zekun Hao, Xuan Li, Fangyin Wei, Shuran Song, Gang Zeng, Ming-Yu Liu, and Tsung-Yi Lin. arXiv preprint arXiv:2506.09980.

  58. [58] Efficient part-level 3D object generation via dual volume packing
      Jiaxiang Tang, Ruijie Lu, Zhaoshuo Li, Zekun Hao, Xuan Li, Fangyin Wei, Shuran Song, Gang Zeng, Ming-Yu Liu, and Tsung-Yi Lin. 2025.

  59. [59] Hunyuan3D 2.1: From images to high-fidelity 3D assets with production-ready PBR material
      Tencent Hunyuan3D Team.

  60. [60] PartDistill: 3D shape part segmentation by vision-language model distillation
      Ardian Umam, Cheng-Kun Yang, Min-Hung Chen, Jen-Hui Chuang, and Yen-Yu Lin. In arXiv, 2024.

  61. [61] SV3D: Novel multi-view synthesis and 3D generation from a single image using latent video diffusion
      Vikram Voleti, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitry Tochilkin, Christian Laforte, Robin Rombach, and Varun Jampani. In European Conference on Computer Vision, pages 439–457. Springer, 2025.

  62. [62] PartNext: A next-generation dataset for fine-grained and hierarchical 3D part understanding
      Penghao Wang, Yiyang He, Xin Lv, Yukai Zhou, Lan Xu, Jingyi Yu, and Jiayuan Gu. 2025.

  63. [63] LLaMA-Mesh: Unifying 3D mesh generation with language models
      Zhengyi Wang, Jonathan Lorraine, Yikai Wang, Hang Su, Jun Zhu, Sanja Fidler, and Xiaohui Zeng. 2024.

  64. [64] CRM: Single image to 3D textured mesh with convolutional reconstruction model
      Zhengyi Wang, Yikai Wang, Yifei Chen, Chendong Xiang, Shuo Chen, Dajiang Yu, Chongxuan Li, Hang Su, and Jun Zhu. arXiv preprint arXiv:2403.05034, 2024.

  65. [65] OctGPT: Octree-based multi-scale autoregressive models for 3D shape generation
      Si-Tong Wei, Rui-Huan Wang, Chuan-Zhi Zhou, Baoquan Chen, and Peng-Shuai Wang. 2025.

  66. [66] Ouroboros3D: Image-to-3D generation via 3D-aware recursive diffusion
      Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Yu Qiao, and Lu Sheng. arXiv preprint arXiv:2406.03184, 2024.

  67. [67] Unique3D: High-quality and efficient 3D mesh generation from a single image
      Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu, Yueqi Duan, and Kaisheng Ma. arXiv preprint arXiv:2405.20343, 2024.

  68. [68] DIPO: Dual-state images controlled articulated object generation powered by diverse data
      Ruiqi Wu, Xinjie Wang, Liu Liu, Chunle Guo, Jiaxiong Qiu, Chongyi Li, Lichao Huang, Zhizhong Su, and Ming-Ming Cheng. 2025.

  69. [69] Direct3D: Scalable image-to-3D generation via 3D latent diffusion transformer
      Shuang Wu, Youtian Lin, Feihu Zhang, Yifei Zeng, Jingxi Xu, Philip Torr, Xun Cao, and Yao Yao. arXiv preprint arXiv:2405.14832, 2024.

  70. [70] Direct3D-S2: Gigascale 3D generation made easy with spatial sparse attention
      Shuang Wu, Youtian Lin, Feihu Zhang, Yifei Zeng, Yikang Yang, Yajie Bao, Jiachen Qian, Siyu Zhu, Xun Cao, Philip Torr, and Yao Yao. 2025.

  71. [71] Point Transformer V2: Grouped vector attention and partition-based pooling
      Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, and Hengshuang Zhao. In NeurIPS, 2022.

  72. [72] Point Transformer V3: Simpler, faster, stronger
      Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, and Hengshuang Zhao. In CVPR, 2024.

  73. [73] Towards large-scale 3D representation learning with multi-dataset point prompt training
      Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, and Hengshuang Zhao. In CVPR, 2024.

  74. [74] BlockFusion: Expandable 3D scene generation using latent tri-plane extrapolation
      Zhennan Wu, Yang Li, Han Yan, Taizhang Shang, Weixuan Sun, Senbo Wang, Ruikai Cui, Weizhe Liu, Hiroyuki Sato, Hongdong Li, et al. ACM Transactions on Graphics (TOG), 43(4):1–17, 2024.

  75. [75] Native and compact structured latents for 3D generation
      Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, et al. In arXiv, 2025.

  76. [76] Structured 3D latents for scalable and versatile 3D generation
      Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21469–21480, 2025.

  77. [77] InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
      Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, and Ying Shan. arXiv preprint arXiv:2404.07191.

  78. [78] SAMPro3D: Locating SAM prompts in 3D for zero-shot instance segmentation
      Mutian Xu, Xingyilang Yin, Lingteng Qiu, Yang Liu, Xin Tong, and Xiaoguang Han. In arXiv.

  79. [79] ZeroPS: High-quality cross-modal knowledge transfer for zero-shot 3D part segmentation
      Yuheng Xue, Nenglun Chen, Jun Liu, and Wenyun Sun. In arXiv, 2025.

  80. [80] SAM3D: Segment anything in 3D scenes
      Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, and Xihui Liu. In arXiv, 2023.

Showing first 80 references.