3DMorph: Single-Image-Guided Local 3D Shape Editing and Morphing
Pith reviewed 2026-06-27 22:18 UTC · model grok-4.3
The pith
A training-free method turns an edited 2D image into precise local changes on a 3D shape.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
3DMorph is a training-free framework for single-image-guided local 3D shape editing and morphing. Given an edited image showing a desired shape modification, the method automatically localizes the relevant 3D region and transfers 2D modifications to 3D while preserving unmodified areas. 3DMorph also enables intermediate shape generation between the original and edited objects, facilitating design exploration. Experimental results show that 3DMorph translates intuitive 2D edits into 3D, outperforming state-of-the-art generative and editing methods on the introduced Delta3D benchmark with paired ground-truth edits.
What carries the argument
Automatic localization of the relevant 3D region from the 2D edit together with transfer of the geometric modification to the mesh while leaving other regions intact.
If this is right
- Local 3D geometric edits become possible from a single edited 2D image without retraining or extra inputs.
- Unmodified regions of the 3D mesh remain exactly as they were after the transfer step.
- Intermediate shapes can be generated between the original and edited versions to support design exploration.
- Results exceed those of existing generative and editing methods when measured on the Delta3D benchmark of paired ground-truth local edits.
Where Pith is reading between the lines
- Users comfortable with 2D image tools could perform targeted 3D geometric adjustments without specialized modeling interfaces.
- The single-image transfer idea might extend to editing sequences for animation by applying the localization across frames.
- Similar localization-plus-transfer logic could be tested on other 3D representations such as point clouds if the 2D-to-3D mapping holds.
Load-bearing premise
The edited 2D image supplies enough geometric information to correctly localize the affected 3D region and transfer the modification without additional user guidance or domain-specific assumptions about object type.
What would settle it
A test case in which the same 2D edit is consistent with modifications in more than one 3D region, causing the localization step to select the wrong area or produce inconsistent 3D output.
Figures
read the original abstract
Despite recent progress in 3D generation, intuitive editing of existing shapes remains limited. Unlike images, which benefit from well-established inpainting tools, general 3D objects such as meshes still lack simple and effective methods for local shape editing. Existing approaches are often global, domain-specific, require complex user interaction, or focus on appearance (color and texture) rather than geometry. We introduce 3DMorph, a training-free framework for single-image-guided local 3D shape editing and morphing. Given an edited image showing a desired shape modification, our method automatically localizes the relevant 3D region and transfers 2D modifications to 3D while preserving unmodified areas. 3DMorph also enables intermediate shape generation between the original and edited objects, facilitating design exploration. To benchmark editing quality, we introduce Delta3D, an image-guided local 3D editing benchmark with paired ground-truth edits. Experimental results show that 3DMorph translates intuitive 2D edits into 3D, outperforming state-of-the-art generative and editing methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces 3DMorph, a training-free framework that takes an original 3D mesh and a single edited 2D image depicting a desired local modification, automatically localizes the affected 3D region, transfers the geometric change to the mesh while preserving unmodified areas, and supports generation of intermediate morphed shapes. It also presents the Delta3D benchmark consisting of paired ground-truth local edits and reports that 3DMorph outperforms existing generative and editing methods on this benchmark.
Significance. If the localization and transfer steps prove robust, the work would offer a practical advance for intuitive 3D editing that leverages existing 2D image tools without training or domain-specific priors. The Delta3D benchmark is a useful contribution for standardized evaluation. The significance is tempered by the need to verify that the single-image input supplies sufficient constraints for general meshes.
major comments (2)
- [Method] Method section (localization procedure): the claim that the edited 2D image alone suffices for automatic 3D region localization is load-bearing for the central contribution, yet the description provides no explicit mechanism or test for disambiguating cases where projection is many-to-one (symmetric parts, occlusions, or non-rigid deformations).
- [Experiments] Experiments / Delta3D results: the abstract asserts outperformance over SOTA methods, but the manuscript supplies no quantitative tables, error bars, or failure-case analysis on the new benchmark, preventing assessment of whether the reported superiority holds under the under-constrained localization assumption.
minor comments (2)
- [Abstract] The abstract and introduction use the term "automatically localizes" without a forward reference to the precise algorithmic step that performs the localization.
- [Figures] Figure captions for qualitative results should explicitly state the input mesh, the 2D edit, and the output 3D mesh for each example.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We respond point-by-point to the major comments below, indicating where revisions will be made.
read point-by-point responses
-
Referee: [Method] Method section (localization procedure): the claim that the edited 2D image alone suffices for automatic 3D region localization is load-bearing for the central contribution, yet the description provides no explicit mechanism or test for disambiguating cases where projection is many-to-one (symmetric parts, occlusions, or non-rigid deformations).
Authors: We agree that the current description of the localization procedure lacks sufficient explicit detail on handling projection ambiguities. We will revise the method section to provide a clearer algorithmic description of the localization mechanism along with targeted tests for symmetric parts, occlusions, and non-rigid cases. revision: yes
-
Referee: [Experiments] Experiments / Delta3D results: the abstract asserts outperformance over SOTA methods, but the manuscript supplies no quantitative tables, error bars, or failure-case analysis on the new benchmark, preventing assessment of whether the reported superiority holds under the under-constrained localization assumption.
Authors: We agree that quantitative tables, error bars, and failure-case analysis are needed to properly support the outperformance claims on Delta3D. We will add these elements, including numerical results with standard deviations and discussion of failure modes, to the revised experimental section. revision: yes
Circularity Check
No circularity; empirical framework with external benchmark
full rationale
The paper presents a training-free algorithmic framework for local 3D editing from a single edited image, with performance claims resting on experimental comparisons against SOTA methods on the newly introduced Delta3D benchmark. No equations, derivations, fitted parameters, or self-citation chains appear in the abstract or description that reduce any central claim to its own inputs by construction. The localization and transfer steps are described as automatic but are validated externally rather than defined circularly.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
ShapeTalk: A language dataset and framework for 3d shape edits and deformations
Panos Achlioptas et al. “ShapeTalk: A language dataset and framework for 3d shape edits and deformations”. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023, pp. 12685–12694
2023
-
[2]
Doodle your 3d: From abstract freehand sketches to precise 3d shapes
Hmrishav Bandyopadhyay et al. “Doodle your 3d: From abstract freehand sketches to precise 3d shapes”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024, pp. 9795–9805
2024
-
[3]
SENS: Part-Aware Sketch- based Implicit Neural Shape Modeling
Alexandre Binninger et al. “SENS: Part-Aware Sketch- based Implicit Neural Shape Modeling”. In:Computer Graphics Forum. V ol. 43. 2. Wiley Online Library. 2024, e15015
2024
-
[4]
Geometric flows of curves in shape space for processing motion of deformable objects
Christopher Brandt, Christoph von Tycowicz, and Klaus Hildebrandt. “Geometric flows of curves in shape space for processing motion of deformable objects”. In:Com- puter Graphics Forum. V ol. 35. 2. Wiley Online Library. 2016, pp. 295–305
2016
-
[5]
Native 3D Editing with Full Atten- tion
Weiwei Cai et al. “Native 3D Editing with Full Atten- tion”. In:arXiv preprint arXiv:2511.17501(2025)
arXiv 2025
-
[6]
Partgen: Part-level 3d generation and reconstruction with multi-view diffusion models
Minghao Chen et al. “Partgen: Part-level 3d generation and reconstruction with multi-view diffusion models”. In:Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, pp. 5881–5892
2025
-
[7]
Shap-editor: Instruction-guided latent 3d editing in seconds
Minghao Chen et al. “Shap-editor: Instruction-guided latent 3d editing in seconds”. In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2024, pp. 26456–26466
2024
-
[8]
Gheorghe Comanici et al. “Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities”. In: arXiv preprint arXiv:2507.06261(2025)
Pith/arXiv arXiv 2025
-
[9]
Hamiltonian dynamics for real-world shape interpolation
Marvin Eisenberger and Daniel Cremers. “Hamiltonian dynamics for real-world shape interpolation”. In:Eu- ropean conference on computer vision. Springer. 2020, pp. 179–196
2020
-
[10]
Neuromorph: Unsupervised shape interpolation and correspondence in one go
Marvin Eisenberger et al. “Neuromorph: Unsupervised shape interpolation and correspondence in one go”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, pp. 7473–7483
2021
-
[11]
Will Gao et al.3D Mesh Editing using Masked LRMs
- [12]
-
[13]
Textdeformer: Geometry manipu- lation using text guidance
William Gao et al. “Textdeformer: Geometry manipu- lation using text guidance”. In:ACM SIGGRAPH 2023 conference proceedings. 2023, pp. 1–11
2023
-
[14]
Splines in the space of shells
Behrend Heeren et al. “Splines in the space of shells”. In:Computer Graphics Forum. V ol. 35. 5. Wiley Online Library. 2016, pp. 111–120
2016
-
[15]
Spaghetti: Editing implicit shapes through part aware generation
Amir Hertz et al. “Spaghetti: Editing implicit shapes through part aware generation”. In:ACM Transactions on Graphics (TOG)41.4 (2022), pp. 1–20
2022
-
[16]
LADIS: Language disentangle- ment for 3D shape editing
Ian Huang et al. “LADIS: Language disentangle- ment for 3D shape editing”. In:arXiv preprint arXiv:2212.05011(2022)
arXiv 2022
-
[17]
Salad: Part-level latent diffusion for 3d shape generation and manipulation
Juil Koo et al. “Salad: Part-level latent diffusion for 3d shape generation and manipulation”. In:Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023, pp. 14441–14451
2023
-
[18]
https://github.com/black- forest-labs/flux
Black Forest Labs.FLUX. https://github.com/black- forest-labs/flux. 2024
2024
-
[19]
MeshPad: Interactive Sketch- Conditioned Artist-Designed Mesh Generation and Editing
Haoxuan Li et al. “MeshPad: Interactive Sketch- Conditioned Artist-Designed Mesh Generation and Editing”. In:arXiv preprint arXiv:2503.01425(2025)
Pith/arXiv arXiv 2025
-
[20]
V oxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
Lin Li et al. “V oxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space”. In:arXiv preprint arXiv:2508.19247(2025)
arXiv 2025
-
[21]
CMD: Controllable Multiview Diffu- sion for 3D Editing and Progressive Generation
Peng Li et al. “CMD: Controllable Multiview Diffu- sion for 3D Editing and Progressive Generation”. In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Con- ference Papers. 2025, pp. 1–10
2025
-
[22]
Triposg: High-fidelity 3d shape synthesis using large-scale rectified flow models
Yangguang Li et al. “Triposg: High-fidelity 3d shape synthesis using large-scale rectified flow models”. In: arXiv preprint arXiv:2502.06608(2025)
Pith/arXiv arXiv 2025
-
[23]
Focaldreamer: Text-driven 3d editing via focal-fusion assembly
Yuhan Li et al. “Focaldreamer: Text-driven 3d editing via focal-fusion assembly”. In:Proceedings of the AAAI conference on artificial intelligence. V ol. 38. 4. 2024, pp. 3279–3287
2024
-
[24]
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Xiaoxiao Long et al. “Wonder3D: Single Image to 3D using Cross-Domain Diffusion”. In:arXiv preprint arXiv:2310.15008(2023)
arXiv 2023
-
[25]
Repaint: Inpainting using de- noising diffusion probabilistic models
Andreas Lugmayr et al. “Repaint: Inpainting using de- noising diffusion probabilistic models”. In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022, pp. 11461–11471
2022
-
[26]
GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation
Phillip Mueller et al. “GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation”. In:Proceedings of the IEEE/CVF International Conference on Computer Vision. 2025, pp. 6374–6384
2025
-
[27]
Why Are You Wrong? Coun- terfactual Explanations for Language Grounding with 3D Objects
Tobias Preintner et al. “Why Are You Wrong? Coun- terfactual Explanations for Language Grounding with 3D Objects”. In:2025 International Joint Conference on Neural Networks (IJCNN). 2025.DOI: 10 . 1109 / IJCNN64981.2025.11227256
arXiv 2025
-
[28]
High-Resolution Image Synthe- sis With Latent Diffusion Models
Robin Rombach et al. “High-Resolution Image Synthe- sis With Latent Diffusion Models”. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR). June 2022, pp. 10684–10695
2022
-
[29]
Palette: Image-to-image diffu- sion models
Chitwan Saharia et al. “Palette: Image-to-image diffu- sion models”. In:ACM SIGGRAPH 2022 conference proceedings. 2022, pp. 1–10
2022
-
[30]
4Deform: Neural Surface Deformation for Robust Shape Interpolation
Lu Sang et al. “4Deform: Neural Surface Deformation for Robust Shape Interpolation”. In:Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, pp. 6542–6551
2025
-
[31]
V ox-e: Text-guided voxel editing of 3d objects
Etai Sella et al. “V ox-e: Text-guided voxel editing of 3d objects”. In:Proceedings of the IEEE/CVF international conference on computer vision. 2023, pp. 430–440
2023
-
[32]
Shapewalk: Compositional shape editing through language-guided chains
Habib Slim and Mohamed Elhoseiny. “Shapewalk: Compositional shape editing through language-guided chains”. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024, pp. 22574–22583
2024
-
[33]
3DFaceSculptor: A Common Framework for Image-Guided 3D Face Deformation
Hao Su et al. “ 3DFaceSculptor: A Common Framework for Image-Guided 3D Face Deformation ”. In:IEEE Transactions on Visualization & Computer Graphics 01 (Aug. 5555), pp. 1–18.ISSN: 1941-0506.DOI: 10 . 1109 / TVCG . 2025 . 3596482.URL: https : / / doi . ieeecomputersociety.org/10.1109/TVCG.2025.3596482
-
[34]
Srif: Semantic shape registration empowered by diffusion-based image morphing and flow estimation
Mingze Sun et al. “Srif: Semantic shape registration empowered by diffusion-based image morphing and flow estimation”. In:SIGGRAPH Asia 2024 Conference Papers. 2024, pp. 1–11
2024
-
[35]
Tencent Hunyuan3D Team.Hunyuan3D 2.1: From Im- ages to High-Fidelity 3D Assets with Production-Ready PBR Material. 2025. arXiv: 2506.15442[cs.CV]
Pith/arXiv arXiv 2025
-
[36]
Joinable: Learning bottom-up assembly of parametric cad joints
Karl DD Willis et al. “Joinable: Learning bottom-up assembly of parametric cad joints”. In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022, pp. 15849–15860
2022
-
[37]
A continuum mechanical ap- proach to geodesics in shape space
Benedikt Wirth et al. “A continuum mechanical ap- proach to geodesics in shape space”. In:International Journal of Computer Vision93.3 (2011), pp. 293–318
2011
-
[38]
Amodal3r: Amodal 3d reconstruc- tion from occluded 2d images
Tianhao Wu et al. “Amodal3r: Amodal 3d reconstruc- tion from occluded 2d images”. In:arXiv preprint arXiv:2503.13439(2025)
arXiv 2025
-
[39]
Towards Scal- able and Consistent 3D Editing
Ruihao Xia, Yang Tang, and Pan Zhou. “Towards Scal- able and Consistent 3D Editing”. In:arXiv preprint arXiv:2510.02994(2025)
arXiv 2025
-
[40]
Native and Compact Struc- tured Latents for 3D Generation
Jianfeng Xiang et al. “Native and Compact Struc- tured Latents for 3D Generation”. In:arXiv preprint arXiv:2512.14692(2025)
Pith/arXiv arXiv 2025
-
[41]
Structured 3d latents for scalable and versatile 3d generation
Jianfeng Xiang et al. “Structured 3d latents for scalable and versatile 3d generation”. In:Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, pp. 21469–21480
2025
-
[42]
Paint by example: Exemplar-based image editing with diffusion models
Binxin Yang et al. “Paint by example: Exemplar-based image editing with diffusion models”. In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023, pp. 18381–18391
2023
-
[43]
NANO3D: A Training-Free Ap- proach for Efficient 3D Editing Without Masks
Junliang Ye et al. “NANO3D: A Training-Free Ap- proach for Efficient 3D Editing Without Masks”. In: arXiv preprint arXiv:2510.15019(2025)
arXiv 2025
-
[44]
GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
Taoran Yi et al. “GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models”. In:CVPR. 2024
2024
-
[45]
Adding conditional control to text-to-image diffusion models
Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. “Adding conditional control to text-to-image diffusion models”. In:Proceedings of the IEEE/CVF interna- tional conference on computer vision. 2023, pp. 3836– 3847
2023
-
[46]
AnchorFlow: Training-Free 3D Editing via Latent Anchor-Aligned Flows
Zhenglin Zhou et al. “AnchorFlow: Training-Free 3D Editing via Latent Anchor-Aligned Flows”. In:arXiv preprint arXiv:2511.22357(2025). A. EXTENDEDMETHODOLOGYDETAILS This section provides additional methodological details on the bounding box prediction module introduced in Sec. III-D and the local morphing method described in Sec. III-E. A. Bounding Box Pr...
arXiv 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.