pith. machine review for the scientific record.

arxiv: 2604.07795 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.GR

Recognition: unknown

Image-Guided Geometric Stylization of 3D Meshes

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:27 UTC · model grok-4.3

classification 💻 cs.CV · cs.GR
keywords geometric stylization · 3D mesh deformation · image-guided style · diffusion models · VAE encoder · coarse-to-fine pipeline · topology preservation · artistic 3D models

The pith

A coarse-to-fine pipeline deforms 3D meshes to express the geometric style of an image while retaining the original topology and part semantics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a geometric stylization framework for 3D meshes that takes a reference image as input. Pre-trained diffusion models first extract an abstract representation of the image style. A coarse-to-fine deformation pipeline then applies this style to drastically alter the mesh geometry. An approximate VAE encoder supplies efficient gradients from rendered views to guide the optimization. The result supports bold artistic 3D variations that extend beyond typical data distributions.
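
To make the moving parts concrete, below is a minimal PyTorch sketch of the render-encode-compare loop this reading describes. It is not the paper's implementation: `render_views`, `approx_vae_encode`, and `style_target` are hypothetical stand-ins for the differentiable renderer, the approximate VAE encoder, and the diffusion-derived style representation, and the coarse-to-fine schedule is collapsed into a single per-vertex stage.

```python
# Hypothetical sketch of the optimization loop described above; the callables
# passed in stand for components the abstract names but does not specify
# (differentiable renderer, approximate VAE encoder) and the diffusion-derived
# style target.
import torch

def stylize(vertices, faces, style_target, render_views, approx_vae_encode,
            steps=500, lr=1e-3):
    # Optimize a per-vertex displacement field; the face list never changes,
    # so the input topology is preserved by construction.
    offsets = torch.zeros_like(vertices, requires_grad=True)
    opt = torch.optim.Adam([offsets], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        deformed = vertices + offsets
        images = render_views(deformed, faces)   # differentiable renderings
        latents = approx_vae_encode(images)      # cheap, approximate gradients
        loss = torch.nn.functional.mse_loss(latents, style_target)
        loss.backward()
        opt.step()
    return (vertices + offsets).detach()
```

Keeping the face list fixed is what makes "retaining the valid topology" hold by construction in this sketch; the paper's coarse stage, which Figure 8 suggests uses regularized cage coefficients rather than raw per-vertex offsets, is a stronger constraint than shown here.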

Core claim

We propose a geometric stylization framework that deforms a 3D mesh, allowing it to express the style of an image. While style is inherently ambiguous, we utilize pre-trained diffusion models to extract an abstract representation of the provided image. Our coarse-to-fine stylization pipeline can drastically deform the input 3D model to express a diverse range of geometric variations while retaining the valid topology of the original mesh and part-level semantics. We also propose an approximate VAE encoder that provides efficient and reliable gradients from mesh renderings. Extensive experiments demonstrate that our method can create stylized 3D meshes that reflect unique geometric features.

What carries the argument

The coarse-to-fine stylization pipeline, which consumes abstract representations extracted from diffusion models and is driven by gradients from an approximate VAE encoder applied to mesh renderings.

If this is right

  • Stylized 3D meshes can reflect unique geometric features of the pictured assets, such as expressive poses and silhouettes.
  • The method retains the valid topology of the original mesh and its part-level semantics after deformation.
  • Bold geometric distortions that go beyond existing data distributions become possible.
  • The approach supports the creation of distinctive artistic 3D models from image references.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This could allow artists to generate varied 3D asset libraries quickly from 2D reference sketches or photos.
  • The method might extend naturally to consistent stylization across multiple viewpoints or short animation sequences.
  • One could test generalization by applying the pipeline to scanned real-world meshes with noisy geometry.
  • Integration with existing 3D editing tools might give users finer control over which parts receive the strongest style influence.

Load-bearing premise

The abstract representation extracted from the image by pre-trained diffusion models corresponds to transferable geometric features for 3D mesh deformation, and the approximate VAE encoder provides reliable gradients from renderings.

What would settle it

Render the stylized output mesh and check whether its silhouette and pose fail to match the reference image, whether the mesh topology becomes invalid, or whether part semantics are lost.
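
A rough version of that check can be scripted. The sketch below assumes the trimesh library and that deformation moves vertices without editing faces; it illustrates the proposed test rather than reproducing the paper's evaluation protocol.

```python
# Minimal validity probe for a stylized output mesh, assuming trimesh.
# Identical face arrays certify topology preservation when only vertices
# move; watertightness, winding consistency, and the Euler number are
# cheap proxies for a valid mesh.
import numpy as np
import trimesh

def check_deformation(original_path, stylized_path):
    a = trimesh.load(original_path, process=False)
    b = trimesh.load(stylized_path, process=False)
    return {
        "same_topology": np.array_equal(a.faces, b.faces),
        "watertight": b.is_watertight,
        "winding_consistent": b.is_winding_consistent,
        "euler_preserved": a.euler_number == b.euler_number,
    }
```

Silhouette and pose agreement with the reference image would still need a rendered-view comparison, and part-semantics preservation a segmentation model; neither is captured by these mesh-level checks.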

Figures

Figures reproduced from arXiv: 2604.07795 by Changwoon Choi, Clément Jambon, Hyunsoo Lee, Yael Vinker, Young Min Kim.

Figure 1: Our method enables stylization of 3D meshes driven by image style references. The target stylized meshes retain the coarse […]
Figure 2: Text-based deformation often struggles to capture and […]
Figure 3: Method overview. We first extract geometric style from […]
Figure 4: Effect of model choices on text-guided mesh deformation […]
Figure 5: Qualitative comparison of the proposed method against baselines […]
Figure 6: User study results. We examine each method using three […]
Figure 7: Qualitative geometric stylization results using diverse source mesh and style references. Our method effectively transfers the […]
Figure 8: Ablation study results. We analyze the effects of the approximated VAE encoder (1st row), cage-coefficient regularization (2nd) […]
Figure 10: Our approach can transfer geometric style only to user […]
Figure 11: Additional qualitative comparison of our method with the baselines […]
Figure 12: Additional qualitative results of the proposed method.
Original abstract

Recent generative models can create visually plausible 3D representations of objects. However, the generation process often allows for implicit control signals, such as contextual descriptions, and rarely supports bold geometric distortions beyond existing data distributions. We propose a geometric stylization framework that deforms a 3D mesh, allowing it to express the style of an image. While style is inherently ambiguous, we utilize pre-trained diffusion models to extract an abstract representation of the provided image. Our coarse-to-fine stylization pipeline can drastically deform the input 3D model to express a diverse range of geometric variations while retaining the valid topology of the original mesh and part-level semantics. We also propose an approximate VAE encoder that provides efficient and reliable gradients from mesh renderings. Extensive experiments demonstrate that our method can create stylized 3D meshes that reflect unique geometric features of the pictured assets, such as expressive poses and silhouettes, thereby supporting the creation of distinctive artistic 3D creations. Project page: https://changwoonchoi.github.io/GeoStyle

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an image-guided geometric stylization method for 3D meshes. It extracts an abstract representation from a guiding image via pre-trained diffusion models, then applies a coarse-to-fine optimization pipeline to deform an input mesh so that it expresses the image's geometric style (e.g., expressive poses and silhouettes). An approximate VAE encoder is introduced to supply efficient gradients from differentiable renderings. The method claims to produce large deformations while preserving mesh topology, manifold validity, and part-level semantics, with extensive experiments demonstrating distinctive artistic 3D outputs.

Significance. If the central claims hold, the work would offer a practical bridge between 2D generative models and controllable 3D geometry editing, enabling new artistic workflows. The coarse-to-fine strategy and approximate VAE gradient mechanism address real computational challenges in mesh optimization. However, the significance is tempered by the unproven assumption that diffusion latents supply disentangled geometric signals transferable to 3D vertex displacements; without strong evidence against appearance/viewpoint entanglement or gradient instability, the practical utility remains uncertain.

major comments (2)
  1. [Abstract / Method Overview] The central claim (abstract) that diffusion-derived representations enable 'drastic' yet topology-preserving deformations rests on the untested premise that these latents encode transferable 3D geometric features (poses, silhouettes) rather than entangled 2D appearance and viewpoint cues. The manuscript should add an explicit analysis or ablation (e.g., in the method or experiments section) quantifying how much of the deformation is driven by shape versus texture signals, together with metrics for self-intersection rate and semantic part consistency under large deformations.
  2. [Approximate VAE Encoder / Optimization] The approximate VAE encoder (abstract) is presented as providing 'efficient and reliable gradients from mesh renderings,' yet any bias or variance in this approximation could compound across the coarse stage and produce invalid meshes precisely when deformations are largest. The paper must report quantitative validation of gradient accuracy (e.g., comparison to exact rendering gradients or error bounds) and demonstrate stability on examples with extreme pose changes.
minor comments (2)
  1. [Experiments] The abstract appeals to extensive experiments, but the manuscript would benefit from clearer reporting of failure cases, quantitative baselines, and ablation tables that isolate the contribution of the coarse-to-fine schedule versus the VAE approximation.
  2. [Method] Notation for the diffusion latent extraction and the VAE approximation should be introduced with explicit equations early in the method section to improve readability for readers unfamiliar with the specific pre-trained models used.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional analyses and validations as suggested.

Point-by-point responses
  1. Referee: [Abstract / Method Overview] The central claim (abstract) that diffusion-derived representations enable 'drastic' yet topology-preserving deformations rests on the untested premise that these latents encode transferable 3D geometric features (poses, silhouettes) rather than entangled 2D appearance and viewpoint cues. The manuscript should add an explicit analysis or ablation (e.g., in the method or experiments section) quantifying how much of the deformation is driven by shape versus texture signals, together with metrics for self-intersection rate and semantic part consistency under large deformations.

    Authors: We agree that an explicit ablation would strengthen the central claim. Our optimization objective is formulated to align rendered geometric properties (silhouettes and poses) with diffusion-derived features, which prior work has shown to capture structural information. To directly address potential entanglement, we have added an ablation in the revised experiments section comparing stylization results when using diffusion features from the original image versus texture-ablated inputs (grayscale and edge-extracted versions). We also report quantitative metrics: self-intersection rates computed via standard mesh intersection algorithms and semantic part consistency measured using a pre-trained segmentation network, specifically for large-deformation cases. These results indicate that geometric cues dominate the deformations. revision: yes

  2. Referee: [Approximate VAE Encoder / Optimization] The approximate VAE encoder (abstract) is presented as providing 'efficient and reliable gradients from mesh renderings,' yet any bias or variance in this approximation could compound across the coarse stage and produce invalid meshes precisely when deformations are largest. The paper must report quantitative validation of gradient accuracy (e.g., comparison to exact rendering gradients or error bounds) and demonstrate stability on examples with extreme pose changes.

    Authors: We acknowledge the importance of validating the approximate VAE encoder's gradient reliability, particularly for large deformations. In the revised manuscript, we have added a quantitative comparison in the method and experiments sections: we compute the mean squared error between approximate gradients and exact differentiable rendering gradients over a set of test meshes with varying deformation magnitudes, providing explicit error bounds. We further demonstrate stability by including additional examples with extreme pose changes and report aggregate statistics on mesh validity (self-intersection rates and manifold checks) across the full experimental suite, showing no increase in invalid outputs during the coarse stage. revision: yes
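
For concreteness, the gradient comparison described in response 2 could be implemented along the following lines, where `exact_encoder` and `approx_encoder` are hypothetical callables standing in for the frozen VAE encoder and its approximation; the paper's actual procedure is not specified in the abstract.

```python
# Sketch of a gradient-accuracy check: mean squared error between input
# gradients obtained through an exact encoder and through its cheap
# approximation, for the same rendering and target latent.
import torch

def gradient_mse(image, target, exact_encoder, approx_encoder):
    grads = []
    for encoder in (exact_encoder, approx_encoder):
        x = image.detach().clone().requires_grad_(True)
        loss = torch.nn.functional.mse_loss(encoder(x), target)
        (grad_x,) = torch.autograd.grad(loss, x)
        grads.append(grad_x)
    return torch.mean((grads[0] - grads[1]) ** 2).item()
```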

Circularity Check

0 steps flagged

No circularity; derivation relies on external pre-trained models

Full rationale

The paper proposes a coarse-to-fine stylization pipeline that deforms input 3D meshes using abstract representations extracted from images via pre-trained diffusion models, plus a new approximate VAE encoder to obtain gradients from renderings. No load-bearing steps reduce by construction to self-definitions, fitted inputs renamed as predictions, or self-citation chains. The central claims of drastic geometric variation while preserving topology and semantics are supported by the proposed components and external models without circular reduction to the paper's own inputs or fitted parameters.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the central claim rests on the assumption that diffusion models yield usable geometric style representations and that the proposed VAE provides stable gradients; no explicit free parameters or new entities are detailed.

axioms (1)
  • domain assumption: Pre-trained diffusion models can extract an abstract representation of the provided image suitable for geometric stylization of 3D meshes.
    Invoked as the basis for style extraction in the proposed framework.

pith-pipeline@v0.9.0 · 5492 in / 1249 out tokens · 68079 ms · 2026-05-10T17:27:01.726454+00:00 · methodology

