Art3D: Training-Free 3D Generation from Flat-Colored Illustration

Jiayi Shen; Rao Fu; Srinath Sridhar; Tao Lu; Xiaoyan Cong; Zekun Li

arxiv: 2504.10466 · v2 · submitted 2025-04-14 · 💻 cs.CV

Art3D: Training-Free 3D Generation from Flat-Colored Illustration

Xiaoyan Cong , Jiayi Shen , Zekun Li , Rao Fu , Tao Lu , Srinath Sridhar This is my paper

Pith reviewed 2026-05-22 19:36 UTC · model grok-4.3

classification 💻 cs.CV

keywords training-free 3D generationflat-colored illustrationimage-to-3D3D illusion enhancementVLM evaluationFlat-2D datasetart content creation

0 comments

The pith

Art3D adds 3D illusion cues to flat-colored illustrations using features from pre-trained 2D models and VLM checks so existing generators can produce 3D assets from drawings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Art3D, a training-free technique that improves how image-to-3D models handle flat-colored inputs such as hand drawings. It extracts structural and semantic features from existing 2D image generation models and applies a vision-language model to evaluate and refine realism. This step creates stronger depth perception in the reference image before it enters any 3D generator. The authors also release the Flat-2D dataset of over 100 samples to test generalization on non-photorealistic art. If the approach works as described, creators gain a simple way to move from 2D sketches to 3D without style-specific retraining.

Core claim

Art3D lifts flat-colored 2D designs into 3D by enhancing the three-dimensional illusion in reference images. It does so through structural and semantic features drawn from pre-trained 2D image generation models together with a VLM-based realism evaluation. The method simplifies 3D generation from 2D inputs and adapts across a wide range of painting styles. The authors introduce the Flat-2D dataset to measure how current image-to-3D models perform on inputs that lack depth cues.

What carries the argument

Extraction of structural and semantic features from pre-trained 2D image generation models combined with VLM-based realism evaluation to insert plausible 3D cues into flat-colored references.

If this is right

Existing image-to-3D models produce higher-quality assets from artistic flat inputs after the enhancement step.
The same pipeline works on many painting styles without any additional training or fine-tuning.
Artists and designers can convert 2D illustrations to 3D more directly in their existing workflows.
The Flat-2D dataset provides a standard test set for measuring generalization on non-photorealistic references.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same cue-insertion idea could serve as a preprocessing step for other 2D-to-3D tasks such as depth estimation or novel-view synthesis from sketches.
Design pipelines might adopt the enhancement as a default first stage, letting users stay in 2D tools longer before committing to 3D.
Extending the feature extraction to abstract or highly stylized art could test whether the method scales beyond the styles tested in the paper.

Load-bearing premise

Structural and semantic features from pre-trained 2D models plus VLM realism checks can add believable 3D cues to flat inputs without creating artifacts that harm later 3D generation.

What would settle it

Feeding Art3D-processed flat-colored images into multiple image-to-3D models and observing no gain or a drop in final 3D asset quality relative to the original flat images would disprove the central claim.

Figures

Figures reproduced from arXiv: 2504.10466 by Jiayi Shen, Rao Fu, Srinath Sridhar, Tao Lu, Xiaoyan Cong, Zekun Li.

**Figure 1.** Figure 1: Our Art3D creates high-quality 3D assets from a single flat-colored illustration and can be adapted to various drawing styles. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 2.** Figure 2: We compare the generation results from In [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Pipeline. Art3D adds 3D illusion to the flat-colored image through ControlNet [42] based on the structure features, e.g., canny edge or depth map. The mesh is then synthesized based on the proxy image and textured by baking information from the input image. ask ’Which image do you think is the most realistic and shows the most 3D feeling?’. ˆI = VQA({Ri} N i=1). (2) Numerous proxy conditions are available,… view at source ↗

**Figure 4.** Figure 4: Qualitative comparisons. We combine our augmentation module with InstantMesh [40] and perform comparisons with stateof-the-art pre-trained image-to-3D methods Shap-E [14], LN3Diff [16], InstantMesh [40], 3DTopia-XL [5], LGM [31] and Trellis [38]. All the flat-shaped input images are from our curated dataset Flat-2D. Our augmentation module can significantly improve the geometry quality of generated 3D ass… view at source ↗

**Figure 1.** Figure 1: Image Gallery of Flat-2D. 1 arXiv:2504.10466v1 [cs.CV] 14 Apr 2025 [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗

**Figure 2.** Figure 2: Image Gallery of Flat-2D [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Image Gallery of Flat-2D. 2 [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

Large-scale pre-trained image-to-3D generative models have exhibited remarkable capabilities in diverse shape generations. However, most of them struggle to synthesize plausible 3D assets when the reference image is flat-colored like hand drawings due to the lack of 3D illusion, which are often the most user-friendly input modalities in art content creation. To this end, we propose Art3D, a training-free method that can lift flat-colored 2D designs into 3D. By leveraging structural and semantic features with pre-trained 2D image generation models and a VLM-based realism evaluation, Art3D successfully enhances the three-dimensional illusion in reference images, thus simplifying the process of generating 3D from 2D, and proves adaptable to a wide range of painting styles. To benchmark the generalization performance of existing image-to-3D models on flat-colored images without 3D feeling, we collect a new dataset, Flat-2D, with over 100 samples. Experimental results demonstrate the performance and robustness of Art3D, exhibiting superior generalizable capacity and promising practical applicability. Our source code and dataset will be publicly available on our project page: https://joy-jy11.github.io/ .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Art3D gives a training-free way to boost 3D cues in flat illustrations using 2D features and VLM scoring plus a new Flat-2D set, but the abstract leaves the actual gains unproven.

read the letter

Hey, checked out the Art3D paper from arXiv. The key takeaway is that they have a training-free approach to make flat-colored illustrations better suited for image-to-3D models by pulling features from 2D generators and looping in a VLM for realism checks. They also introduce the Flat-2D dataset to evaluate this. This stands out because most image-to-3D work assumes inputs with some depth already, and flat art like drawings is a common but tricky case in creative fields. Using off-the-shelf models without retraining is a sensible way to keep it accessible, and putting out the dataset helps others test their own methods on this input type. On the positive side, it directly targets a usability issue in art pipelines where users want to go from sketches to 3D without extra work. The soft spots are more noticeable though. The abstract talks about successful enhancement and robustness but gives no numbers, no ablation results, and minimal implementation details on how the feature extraction or VLM loop actually runs. That leaves open whether the improvements are real or just superficial. The stress-test note about VLM scoring potentially missing geometric inconsistencies that hurt later 3D steps looks like a valid concern here. Since there's no mention of any 3D-specific validation or consistency losses, it's possible the method produces images that fool the VLM but still cause problems in the 3D generator. Overall, this seems aimed at applied researchers or developers building tools for 3D from 2D art. Someone looking for quick practical hacks might find the pipeline ideas useful, even if the evaluation needs strengthening. I think it should go to peer review so the community can see the full experiments and suggest fixes for the evaluation gaps.

Referee Report

2 major / 1 minor

Summary. The paper introduces Art3D, a training-free method to lift flat-colored 2D designs into 3D by enhancing the three-dimensional illusion using structural and semantic features from pre-trained 2D image generation models and a VLM-based realism evaluation. It collects a new Flat-2D dataset with over 100 samples to benchmark image-to-3D models on such inputs and reports experimental results showing superior generalizable capacity and adaptability to various painting styles.

Significance. If validated, this approach could significantly simplify 3D asset creation from user-friendly artistic inputs like hand drawings, with broad applicability in art content creation. The public release of code and the Flat-2D dataset would be valuable contributions for reproducibility and further benchmarking in the field.

major comments (2)

[Abstract] The abstract asserts successful enhancement and robustness yet supplies no quantitative metrics, ablation studies, or concrete description of how the feature extraction and VLM loop are implemented, so it is impossible to judge whether the reported outcomes actually support the stated claims.
[Method and Experiments] The reliance on VLM for realism evaluation lacks safeguards against subtle geometric inconsistencies (e.g., depth ordering errors or perspective-violating shading) that may not be detected by the VLM but would break downstream image-to-3D models; no explicit 3D consistency enforcement or mesh-level validation is described.

minor comments (1)

[Abstract] Consider adding a brief mention of the specific pre-trained models used for feature extraction to improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the opportunity to clarify our contributions. We address each major comment point-by-point below, indicating planned revisions where appropriate. Our responses focus on substance and aim to strengthen the manuscript without overstating current results.

read point-by-point responses

Referee: [Abstract] The abstract asserts successful enhancement and robustness yet supplies no quantitative metrics, ablation studies, or concrete description of how the feature extraction and VLM loop are implemented, so it is impossible to judge whether the reported outcomes actually support the stated claims.

Authors: We agree that the abstract, as a concise summary, does not include specific metrics or implementation details, which are instead provided in Sections 3 and 4. Section 3.1 describes the structural and semantic feature extraction from pre-trained 2D models, while Section 3.2 details the iterative VLM-based realism evaluation loop. Experimental results in Section 4 include quantitative comparisons on the Flat-2D dataset and qualitative ablations. To address the concern, we will revise the abstract to briefly reference the key performance gains (e.g., improved generalization across painting styles) and outline the core pipeline components, while maintaining brevity. revision: yes
Referee: [Method and Experiments] The reliance on VLM for realism evaluation lacks safeguards against subtle geometric inconsistencies (e.g., depth ordering errors or perspective-violating shading) that may not be detected by the VLM but would break downstream image-to-3D models; no explicit 3D consistency enforcement or mesh-level validation is described.

Authors: This is a fair observation. Our VLM evaluation targets perceptual 3D illusion and realism in the enhanced 2D image, which serves as input to existing image-to-3D models; however, it does not include explicit geometric checks for issues such as depth ordering or shading consistency, nor mesh-level validation, as the method is designed to be training-free and operate in image space. We will add a dedicated limitations paragraph in the revised manuscript discussing these potential failure modes and their impact on downstream 3D generation. We will also include additional qualitative examples illustrating cases where subtle inconsistencies may persist. revision: partial

Circularity Check

0 steps flagged

No circularity: training-free composition of external pre-trained models

full rationale

The paper describes a training-free pipeline that applies structural and semantic features from off-the-shelf pre-trained 2D image generation models together with an external VLM-based realism evaluator to enhance 3D illusion in flat-colored inputs. No equations, fitted parameters, or internal predictions are defined within the paper; the Flat-2D dataset serves only for external benchmarking. The central claims rest on the independent capabilities of cited pre-trained models rather than any self-definitional reduction, self-citation load-bearing step, or ansatz smuggled via prior work by the same authors. The derivation chain is therefore self-contained against external benchmarks and receives a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on the untested premise that existing 2D pre-trained models already encode the right cues for flat art and that a VLM can serve as a reliable proxy for 3D plausibility; no free parameters or new entities are introduced in the abstract.

axioms (2)

domain assumption Pre-trained 2D image generation models supply structural and semantic features that can be repurposed to add 3D illusion to flat-colored inputs.
The enhancement step depends on this transfer of features without additional training.
domain assumption A vision-language model can accurately evaluate and guide the realism of the enhanced 2D image for subsequent 3D generation.
The VLM-based realism evaluation is presented as the key control mechanism.

pith-pipeline@v0.9.0 · 5761 in / 1458 out tokens · 80962 ms · 2026-05-22T19:36:17.458905+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Art3D first leverages the strong priors of a pre-trained flow-based ControlNet to synthesize multiple reference candidates... Then, we utilize VLM to select the geometry proxy image... Finally, we use Hunyuan2.0 to bake the texture
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We attribute this failure mode to the misalignment of data distribution between flat-colored inputs and those used for training image-to-3D models

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement
cs.CV 2026-04 unverdicted novelty 6.0

REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · cited by 1 Pith paper · 5 internal anchors

[1]

https : //openai.com/index/gpt- 4v- system- card/ ,

OpenAI: Gpt 4v(ision) system card (2023). https : //openai.com/index/gpt- 4v- system- card/ ,

work page 2023
[2]

Polydiff: Generating 3d polygonal meshes with diffusion models

Antonio Alliegro, Yawar Siddiqui, Tatiana Tommasi, and Matthias Nießner. Polydiff: Generating 3d polygonal meshes with diffusion models. arXiv preprint arXiv:2312.11417 ,

work page arXiv
[3]

A computational approach to edge detection

John Canny. A computational approach to edge detection. IEEE Transactions on pattern analysis and machine intelli- gence, (6):679–698, 1986. 3

work page 1986
[4]

Fan- tasia3d: Disentangling geometry and appearance for high- quality text-to-3d content creation

Rui Chen, Yongwei Chen, Ningxin Jiao, and Kui Jia. Fan- tasia3d: Disentangling geometry and appearance for high- quality text-to-3d content creation. In Proceedings of the IEEE/CVF international conference on computer vision , pages 22246–22256, 2023. 2

work page 2023
[5]

3dtopia-xl: Scaling high-quality 3d asset generation via primitive diffusion, 2024

Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong, Ziang Cao, Fangzhou Hong, Yushi Lan, Tengfei Wang, Haozhe Xie, Tong Wu, Shunsuke Saito, Liang Pan, Dahua Lin, and Ziwei Liu. 3dtopia-xl: Scaling high-quality 3d asset generation via primitive diffusion, 2024. 1, 3, 4

work page 2024
[6]

Sdfusion: Multimodal 3d shape completion, reconstruction, and generation

Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexan- der G Schwing, and Liang-Yan Gui. Sdfusion: Multimodal 3d shape completion, reconstruction, and generation. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4456–4465, 2023. 2

work page 2023
[7]

Diffusion-sdf: Conditional generative modeling of signed distance func- tions

Gene Chou, Yuval Bahat, and Felix Heide. Diffusion-sdf: Conditional generative modeling of signed distance func- tions. In Proceedings of the IEEE/CVF international con- ference on computer vision, pages 2262–2272, 2023. 2

work page 2023
[8]

Automatic controllable colorization via imagination

Xiaoyan Cong, Yue Wu, Qifeng Chen, and Chenyang Lei. Automatic controllable colorization via imagination. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2609–2619, 2024. 1

work page 2024
[9]

4drecons: 4d neural implicit deformable objects reconstruction from a single rgb- d camera with geometrical and topological regularizations,

Xiaoyan Cong, Haitao Yang, Liyan Chen, Kaifeng Zhang, Li Yi, Chandrajit Bajaj, and Qixing Huang. 4drecons: 4d neural implicit deformable objects reconstruction from a single rgb- d camera with geometrical and topological regularizations,

work page
[10]

Objaverse: A universe of annotated 3d objects, 2022

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects, 2022. 2

work page 2022
[11]

Diffusion models beat gans on image synthesis, 2021

Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis, 2021. 1

work page 2021
[12]

3dgen: Triplane latent diffusion for textured mesh generation

Anchit Gupta, Wenhan Xiong, Yixin Nie, Ian Jones, and Bar- las O˘guz. 3dgen: Triplane latent diffusion for textured mesh generation. arXiv preprint arXiv:2303.05371, 2023. 2

work page arXiv 2023
[13]

Denoising diffu- sion probabilistic models, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffu- sion probabilistic models, 2020. 1

work page 2020
[14]

Shap-e: Generating condi- tional 3d implicit functions, 2023

Heewoo Jun and Alex Nichol. Shap-e: Generating condi- tional 3d implicit functions, 2023. 2, 3, 4

work page 2023
[15]

Black Forest Labs. Flux. https://github.com/ black-forest-labs/flux, 2024. 2

work page 2024
[16]

Ln3diff: Scalable latent neural fields diffusion for speedy 3d generation

Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, and Chen Change Loy. Ln3diff: Scalable latent neural fields diffusion for speedy 3d generation. In ECCV, 2024. 1, 3, 4

work page 2024
[17]

Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models

Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In In- ternational conference on machine learning , pages 19730– 19742. PMLR, 2023. 2

work page 2023
[18]

Magic3d: High-resolution text-to-3d content creation

Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 300–309, 2023. 2

work page 2023
[19]

One-2-3-45: Any single image to 3d mesh in 45 seconds without per- shape optimization, 2023

Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. One-2-3-45: Any single image to 3d mesh in 45 seconds without per- shape optimization, 2023. 2

work page 2023
[20]

Meshdif- fusion: Score-based generative 3d mesh modeling

Zhen Liu, Yao Feng, Michael J Black, Derek Nowrouzezahrai, Liam Paull, and Weiyang Liu. Meshdif- fusion: Score-based generative 3d mesh modeling. arXiv preprint arXiv:2303.08133, 2023. 2

work page arXiv 2023
[21]

Wonder3d: Sin- gle image to 3d using cross-domain diffusion.arXiv preprint arXiv:2310.15008, 2023

Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, et al. Wonder3d: Sin- gle image to 3d using cross-domain diffusion.arXiv preprint arXiv:2310.15008, 2023. 2

work page arXiv 2023
[22]

Pc2: Projection-conditioned point cloud diffu- sion for single-image 3d reconstruction

Luke Melas-Kyriazi, Christian Rupprecht, and Andrea Vedaldi. Pc2: Projection-conditioned point cloud diffu- sion for single-image 3d reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12923–12932, 2023. 2

work page 2023
[23]

Diffrf: Rendering-guided 3d radiance field diffusion

Norman M ¨uller, Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulo, Peter Kontschieder, and Matthias Nießner. Diffrf: Rendering-guided 3d radiance field diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 4328–4338, 2023. 2

work page 2023
[24]

Point-E: A System for Generating 3D Point Clouds from Complex Prompts

Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen. Point-e: A system for generat- ing 3d point clouds from complex prompts. arXiv preprint arXiv:2212.08751, 2022. 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2022
[25]

DreamFusion: Text-to-3D using 2D Diffusion

Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Milden- hall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022. 2

work page internal anchor Pith review Pith/arXiv arXiv 2022
[26]

Magic123: One image to high-quality 3d object generation using both 2d and 3d diffusion priors, 2023

Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Sko- rokhodov, Peter Wonka, Sergey Tulyakov, and Bernard Ghanem. Magic123: One image to high-quality 3d object generation using both 2d and 3d diffusion priors, 2023. 2 5

work page 2023
[27]

Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer, 2020

Ren ´e Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer, 2020. 3

work page 2020
[28]

Com- mon objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction

Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Com- mon objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10901–10911, 2021. 2

work page 2021
[29]

Diffusion-based signed distance fields for 3d shape gener- ation

Jaehyeok Shim, Changwoo Kang, and Kyungdon Joo. Diffusion-based signed distance fields for 3d shape gener- ation. In Proceedings of the IEEE/CVF conference on com- puter vision and pattern recognition , pages 20887–20897,

work page
[30]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Ab- hishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equa- tions. arXiv preprint arXiv:2011.13456, 2020. 1

work page internal anchor Pith review Pith/arXiv arXiv 2011
[31]

Lgm: Large multi-view gaussian model for high-resolution 3d content creation.arXiv preprint arXiv:2402.05054, 2024

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. Lgm: Large multi-view gaussian model for high-resolution 3d content creation.arXiv preprint arXiv:2402.05054, 2024. 1, 2, 3, 4

work page arXiv 2024
[32]

Gecco: Geometrically-conditioned point diffusion models

Michał J Tyszkiewicz, Pascal Fua, and Eduard Trulls. Gecco: Geometrically-conditioned point diffusion models. In Pro- ceedings of the IEEE/CVF International Conference on Computer Vision, pages 2128–2138, 2023. 2

work page 2023
[33]

Yeh, and Greg Shakhnarovich

Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, and Greg Shakhnarovich. Score jacobian chaining: Lifting pretrained 2d diffusion models for 3d generation. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12619–12629, 2023. 2

work page 2023
[34]

Rodin: A generative model for sculpting 3d digital avatars using diffusion

Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, et al. Rodin: A generative model for sculpting 3d digital avatars using diffusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4563–4573, 2023. 2

work page 2023
[35]

Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distilla- tion, 2023

Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distilla- tion, 2023. 2

work page 2023
[36]

Omniobject3d: Large-vocabulary 3d object dataset for realistic perception, reconstruction and generation

Tong Wu, Jiarui Zhang, Xiao Fu, Yuxin Wang, Liang Pan Jiawei Ren, Wayne Wu, Lei Yang, Jiaqi Wang, Chen Qian, Dahua Lin, and Ziwei Liu. Omniobject3d: Large-vocabulary 3d object dataset for realistic perception, reconstruction and generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 2

work page 2023
[37]

Sketch and text guided diffusion model for colored point cloud generation

Zijie Wu, Yaonan Wang, Mingtao Feng, He Xie, and Ajmal Mian. Sketch and text guided diffusion model for colored point cloud generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8929– 8939, 2023. 2

work page 2023
[38]

Structured 3D Latents for Scalable and Versatile 3D Generation

Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3d latents for scalable and versatile 3d gen- eration. arXiv preprint arXiv:2412.01506, 2024. 1, 2, 3, 4

work page internal anchor Pith review Pith/arXiv arXiv 2024
[39]

Holistically-nested edge de- tection

Saining Xie and Zhuowen Tu. Holistically-nested edge de- tection. In Proceedings of the IEEE international conference on computer vision, pages 1395–1403, 2015. 3

work page 2015
[40]

Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models, 2024

Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, and Ying Shan. Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models, 2024. 1, 2, 3, 4

work page 2024
[41]

3dshape2vecset: A 3d shape representation for neu- ral fields and generative diffusion models.ACM Transactions on Graphics (TOG), 42(4):1–16, 2023

Biao Zhang, Jiapeng Tang, Matthias Niessner, and Peter Wonka. 3dshape2vecset: A 3d shape representation for neu- ral fields and generative diffusion models.ACM Transactions on Graphics (TOG), 42(4):1–16, 2023. 2

work page 2023
[42]

Adding conditional control to text-to-image diffusion models, 2023

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models, 2023. 2, 3

work page 2023
[43]

Hunyuan3d 2.0: Scal- ing diffusion models for high resolution textured 3d assets generation, 2025

Zibo Zhao, Zeqiang Lai, Qingxiang Lin, Yunfei Zhao, Haolin Liu, Shuhui Yang, Yifei Feng, Mingxin Yang, Sheng Zhang, Xianghui Yang, Huiwen Shi, Sicong Liu, Junta Wu, Yihang Lian, Fan Yang, Ruining Tang, Zebin He, Xinzhou Wang, Jian Liu, Xuhui Zuo, Zhuo Chen, Biwen Lei, Hao- han Weng, Jing Xu, Yiling Zhu, Xinhai Liu, Lixin Xu, Changrong Hu, Shaoxiong Yang, ...

work page 2025
[44]

Locally attentional sdf diffusion for controllable 3d shape generation

Xin-Yang Zheng, Hao Pan, Peng-Shuai Wang, Xin Tong, Yang Liu, and Heung-Yeung Shum. Locally attentional sdf diffusion for controllable 3d shape generation. ACM Trans- actions on Graphics (ToG), 42(4):1–13, 2023. 2

work page 2023
[45]

Oscilla- tion inversion: Understand the structure of large flow model through the lens of inversion method, 2024

Yan Zheng, Zhenxiao Liang, Xiaoyan Cong, Lanqing guo, Yuehao Wang, Peihao Wang, and Zhangyang Wang. Oscilla- tion inversion: Understand the structure of large flow model through the lens of inversion method, 2024. 1

work page 2024
[46]

3d shape generation and completion through point-voxel diffusion

Linqi Zhou, Yilun Du, and Jiajun Wu. 3d shape generation and completion through point-voxel diffusion. In Proceed- ings of the IEEE/CVF international conference on computer vision, pages 5826–5835, 2021. 2 6 Art3D: Training-Free 3D Generation from Flat-Colored Illustration Supplementary Material Supplemental materials provide more implementation details o...

work page 2021
[47]

Implementation Details In experiments, we use as reference conditions N1 = 2 canny edge maps and N2 = 2 depth maps. It is noticeable that Art3D can present superior and robust performance with only one canny edge map as the reference condition, making Art3D more efficient and convenient to be applied to downstream tasks

work page
[48]

Art3D: Training-Free 3D Generation from Flat-Colored Illustration

Dataset We show some examples of our newly collected dataset Flat-2D in Figure 2, Figure 2, and Figure 3. Figure 1. Image Gallery of Flat-2D. 1 arXiv:2504.10466v1 [cs.CV] 14 Apr 2025 Figure 2. Image Gallery of Flat-2D. Figure 3. Image Gallery of Flat-2D. 2

work page internal anchor Pith review Pith/arXiv arXiv 2025

[1] [1]

https : //openai.com/index/gpt- 4v- system- card/ ,

OpenAI: Gpt 4v(ision) system card (2023). https : //openai.com/index/gpt- 4v- system- card/ ,

work page 2023

[2] [2]

Polydiff: Generating 3d polygonal meshes with diffusion models

Antonio Alliegro, Yawar Siddiqui, Tatiana Tommasi, and Matthias Nießner. Polydiff: Generating 3d polygonal meshes with diffusion models. arXiv preprint arXiv:2312.11417 ,

work page arXiv

[3] [3]

A computational approach to edge detection

John Canny. A computational approach to edge detection. IEEE Transactions on pattern analysis and machine intelli- gence, (6):679–698, 1986. 3

work page 1986

[4] [4]

Fan- tasia3d: Disentangling geometry and appearance for high- quality text-to-3d content creation

Rui Chen, Yongwei Chen, Ningxin Jiao, and Kui Jia. Fan- tasia3d: Disentangling geometry and appearance for high- quality text-to-3d content creation. In Proceedings of the IEEE/CVF international conference on computer vision , pages 22246–22256, 2023. 2

work page 2023

[5] [5]

3dtopia-xl: Scaling high-quality 3d asset generation via primitive diffusion, 2024

Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong, Ziang Cao, Fangzhou Hong, Yushi Lan, Tengfei Wang, Haozhe Xie, Tong Wu, Shunsuke Saito, Liang Pan, Dahua Lin, and Ziwei Liu. 3dtopia-xl: Scaling high-quality 3d asset generation via primitive diffusion, 2024. 1, 3, 4

work page 2024

[6] [6]

Sdfusion: Multimodal 3d shape completion, reconstruction, and generation

Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexan- der G Schwing, and Liang-Yan Gui. Sdfusion: Multimodal 3d shape completion, reconstruction, and generation. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4456–4465, 2023. 2

work page 2023

[7] [7]

Diffusion-sdf: Conditional generative modeling of signed distance func- tions

Gene Chou, Yuval Bahat, and Felix Heide. Diffusion-sdf: Conditional generative modeling of signed distance func- tions. In Proceedings of the IEEE/CVF international con- ference on computer vision, pages 2262–2272, 2023. 2

work page 2023

[8] [8]

Automatic controllable colorization via imagination

Xiaoyan Cong, Yue Wu, Qifeng Chen, and Chenyang Lei. Automatic controllable colorization via imagination. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2609–2619, 2024. 1

work page 2024

[9] [9]

4drecons: 4d neural implicit deformable objects reconstruction from a single rgb- d camera with geometrical and topological regularizations,

Xiaoyan Cong, Haitao Yang, Liyan Chen, Kaifeng Zhang, Li Yi, Chandrajit Bajaj, and Qixing Huang. 4drecons: 4d neural implicit deformable objects reconstruction from a single rgb- d camera with geometrical and topological regularizations,

work page

[10] [10]

Objaverse: A universe of annotated 3d objects, 2022

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects, 2022. 2

work page 2022

[11] [11]

Diffusion models beat gans on image synthesis, 2021

Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis, 2021. 1

work page 2021

[12] [12]

3dgen: Triplane latent diffusion for textured mesh generation

Anchit Gupta, Wenhan Xiong, Yixin Nie, Ian Jones, and Bar- las O˘guz. 3dgen: Triplane latent diffusion for textured mesh generation. arXiv preprint arXiv:2303.05371, 2023. 2

work page arXiv 2023

[13] [13]

Denoising diffu- sion probabilistic models, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffu- sion probabilistic models, 2020. 1

work page 2020

[14] [14]

Shap-e: Generating condi- tional 3d implicit functions, 2023

Heewoo Jun and Alex Nichol. Shap-e: Generating condi- tional 3d implicit functions, 2023. 2, 3, 4

work page 2023

[15] [15]

Black Forest Labs. Flux. https://github.com/ black-forest-labs/flux, 2024. 2

work page 2024

[16] [16]

Ln3diff: Scalable latent neural fields diffusion for speedy 3d generation

Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, and Chen Change Loy. Ln3diff: Scalable latent neural fields diffusion for speedy 3d generation. In ECCV, 2024. 1, 3, 4

work page 2024

[17] [17]

Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models

Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In In- ternational conference on machine learning , pages 19730– 19742. PMLR, 2023. 2

work page 2023

[18] [18]

Magic3d: High-resolution text-to-3d content creation

Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 300–309, 2023. 2

work page 2023

[19] [19]

One-2-3-45: Any single image to 3d mesh in 45 seconds without per- shape optimization, 2023

Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. One-2-3-45: Any single image to 3d mesh in 45 seconds without per- shape optimization, 2023. 2

work page 2023

[20] [20]

Meshdif- fusion: Score-based generative 3d mesh modeling

Zhen Liu, Yao Feng, Michael J Black, Derek Nowrouzezahrai, Liam Paull, and Weiyang Liu. Meshdif- fusion: Score-based generative 3d mesh modeling. arXiv preprint arXiv:2303.08133, 2023. 2

work page arXiv 2023

[21] [21]

Wonder3d: Sin- gle image to 3d using cross-domain diffusion.arXiv preprint arXiv:2310.15008, 2023

Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, et al. Wonder3d: Sin- gle image to 3d using cross-domain diffusion.arXiv preprint arXiv:2310.15008, 2023. 2

work page arXiv 2023

[22] [22]

Pc2: Projection-conditioned point cloud diffu- sion for single-image 3d reconstruction

Luke Melas-Kyriazi, Christian Rupprecht, and Andrea Vedaldi. Pc2: Projection-conditioned point cloud diffu- sion for single-image 3d reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12923–12932, 2023. 2

work page 2023

[23] [23]

Diffrf: Rendering-guided 3d radiance field diffusion

Norman M ¨uller, Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulo, Peter Kontschieder, and Matthias Nießner. Diffrf: Rendering-guided 3d radiance field diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 4328–4338, 2023. 2

work page 2023

[24] [24]

Point-E: A System for Generating 3D Point Clouds from Complex Prompts

Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen. Point-e: A system for generat- ing 3d point clouds from complex prompts. arXiv preprint arXiv:2212.08751, 2022. 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2022

[25] [25]

DreamFusion: Text-to-3D using 2D Diffusion

Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Milden- hall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022. 2

work page internal anchor Pith review Pith/arXiv arXiv 2022

[26] [26]

Magic123: One image to high-quality 3d object generation using both 2d and 3d diffusion priors, 2023

Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Sko- rokhodov, Peter Wonka, Sergey Tulyakov, and Bernard Ghanem. Magic123: One image to high-quality 3d object generation using both 2d and 3d diffusion priors, 2023. 2 5

work page 2023

[27] [27]

Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer, 2020

Ren ´e Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer, 2020. 3

work page 2020

[28] [28]

Com- mon objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction

Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Com- mon objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10901–10911, 2021. 2

work page 2021

[29] [29]

Diffusion-based signed distance fields for 3d shape gener- ation

Jaehyeok Shim, Changwoo Kang, and Kyungdon Joo. Diffusion-based signed distance fields for 3d shape gener- ation. In Proceedings of the IEEE/CVF conference on com- puter vision and pattern recognition , pages 20887–20897,

work page

[30] [30]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Ab- hishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equa- tions. arXiv preprint arXiv:2011.13456, 2020. 1

work page internal anchor Pith review Pith/arXiv arXiv 2011

[31] [31]

Lgm: Large multi-view gaussian model for high-resolution 3d content creation.arXiv preprint arXiv:2402.05054, 2024

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. Lgm: Large multi-view gaussian model for high-resolution 3d content creation.arXiv preprint arXiv:2402.05054, 2024. 1, 2, 3, 4

work page arXiv 2024

[32] [32]

Gecco: Geometrically-conditioned point diffusion models

Michał J Tyszkiewicz, Pascal Fua, and Eduard Trulls. Gecco: Geometrically-conditioned point diffusion models. In Pro- ceedings of the IEEE/CVF International Conference on Computer Vision, pages 2128–2138, 2023. 2

work page 2023

[33] [33]

Yeh, and Greg Shakhnarovich

Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, and Greg Shakhnarovich. Score jacobian chaining: Lifting pretrained 2d diffusion models for 3d generation. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12619–12629, 2023. 2

work page 2023

[34] [34]

Rodin: A generative model for sculpting 3d digital avatars using diffusion

Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, et al. Rodin: A generative model for sculpting 3d digital avatars using diffusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4563–4573, 2023. 2

work page 2023

[35] [35]

Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distilla- tion, 2023

Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distilla- tion, 2023. 2

work page 2023

[36] [36]

Omniobject3d: Large-vocabulary 3d object dataset for realistic perception, reconstruction and generation

Tong Wu, Jiarui Zhang, Xiao Fu, Yuxin Wang, Liang Pan Jiawei Ren, Wayne Wu, Lei Yang, Jiaqi Wang, Chen Qian, Dahua Lin, and Ziwei Liu. Omniobject3d: Large-vocabulary 3d object dataset for realistic perception, reconstruction and generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 2

work page 2023

[37] [37]

Sketch and text guided diffusion model for colored point cloud generation

Zijie Wu, Yaonan Wang, Mingtao Feng, He Xie, and Ajmal Mian. Sketch and text guided diffusion model for colored point cloud generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8929– 8939, 2023. 2

work page 2023

[38] [38]

Structured 3D Latents for Scalable and Versatile 3D Generation

Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3d latents for scalable and versatile 3d gen- eration. arXiv preprint arXiv:2412.01506, 2024. 1, 2, 3, 4

work page internal anchor Pith review Pith/arXiv arXiv 2024

[39] [39]

Holistically-nested edge de- tection

Saining Xie and Zhuowen Tu. Holistically-nested edge de- tection. In Proceedings of the IEEE international conference on computer vision, pages 1395–1403, 2015. 3

work page 2015

[40] [40]

Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models, 2024

Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, and Ying Shan. Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models, 2024. 1, 2, 3, 4

work page 2024

[41] [41]

3dshape2vecset: A 3d shape representation for neu- ral fields and generative diffusion models.ACM Transactions on Graphics (TOG), 42(4):1–16, 2023

Biao Zhang, Jiapeng Tang, Matthias Niessner, and Peter Wonka. 3dshape2vecset: A 3d shape representation for neu- ral fields and generative diffusion models.ACM Transactions on Graphics (TOG), 42(4):1–16, 2023. 2

work page 2023

[42] [42]

Adding conditional control to text-to-image diffusion models, 2023

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models, 2023. 2, 3

work page 2023

[43] [43]

Hunyuan3d 2.0: Scal- ing diffusion models for high resolution textured 3d assets generation, 2025

Zibo Zhao, Zeqiang Lai, Qingxiang Lin, Yunfei Zhao, Haolin Liu, Shuhui Yang, Yifei Feng, Mingxin Yang, Sheng Zhang, Xianghui Yang, Huiwen Shi, Sicong Liu, Junta Wu, Yihang Lian, Fan Yang, Ruining Tang, Zebin He, Xinzhou Wang, Jian Liu, Xuhui Zuo, Zhuo Chen, Biwen Lei, Hao- han Weng, Jing Xu, Yiling Zhu, Xinhai Liu, Lixin Xu, Changrong Hu, Shaoxiong Yang, ...

work page 2025

[44] [44]

Locally attentional sdf diffusion for controllable 3d shape generation

Xin-Yang Zheng, Hao Pan, Peng-Shuai Wang, Xin Tong, Yang Liu, and Heung-Yeung Shum. Locally attentional sdf diffusion for controllable 3d shape generation. ACM Trans- actions on Graphics (ToG), 42(4):1–13, 2023. 2

work page 2023

[45] [45]

Oscilla- tion inversion: Understand the structure of large flow model through the lens of inversion method, 2024

Yan Zheng, Zhenxiao Liang, Xiaoyan Cong, Lanqing guo, Yuehao Wang, Peihao Wang, and Zhangyang Wang. Oscilla- tion inversion: Understand the structure of large flow model through the lens of inversion method, 2024. 1

work page 2024

[46] [46]

3d shape generation and completion through point-voxel diffusion

Linqi Zhou, Yilun Du, and Jiajun Wu. 3d shape generation and completion through point-voxel diffusion. In Proceed- ings of the IEEE/CVF international conference on computer vision, pages 5826–5835, 2021. 2 6 Art3D: Training-Free 3D Generation from Flat-Colored Illustration Supplementary Material Supplemental materials provide more implementation details o...

work page 2021

[47] [47]

Implementation Details In experiments, we use as reference conditions N1 = 2 canny edge maps and N2 = 2 depth maps. It is noticeable that Art3D can present superior and robust performance with only one canny edge map as the reference condition, making Art3D more efficient and convenient to be applied to downstream tasks

work page

[48] [48]

Art3D: Training-Free 3D Generation from Flat-Colored Illustration

Dataset We show some examples of our newly collected dataset Flat-2D in Figure 2, Figure 2, and Figure 3. Figure 1. Image Gallery of Flat-2D. 1 arXiv:2504.10466v1 [cs.CV] 14 Apr 2025 Figure 2. Image Gallery of Flat-2D. Figure 3. Image Gallery of Flat-2D. 2

work page internal anchor Pith review Pith/arXiv arXiv 2025