MiXR: Harvesting and Recomposing Geometry from Real-World Objects for In-Situ 3D Design
Pith reviewed 2026-05-21 08:34 UTC · model grok-4.3
The pith
MiXR lets users harvest real-world geometry segments, assemble them manually in XR to define spatial structure, and delegate refinement to generative AI.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a hybrid workflow of harvesting geometry segments from captured real objects, letting users compose them via direct manipulation in XR, and applying generative AI to produce a unified model enables explicit specification of spatial intent that is difficult to convey through verbal prompts alone, with measurable gains in design accuracy, control, and reduced workload shown in the user study.
What carries the argument
The hybrid workflow of harvesting real-world geometry segments, user-driven direct 3D assembly in XR, and generative AI synthesis to create a coherent final model.
If this is right
- Users can produce 3D designs that more closely match their intended spatial structure.
- Participants experience a stronger sense of control over the modeling process.
- Cognitive workload drops for tasks that require precise geometric composition.
- The approach is suited to in-situ design where real objects supply starting geometry.
Where Pith is reading between the lines
- Designers could begin from familiar physical surroundings rather than abstract prompts, lowering the barrier for non-experts.
- The system might support iterative on-site customization, such as modifying objects within the actual space they will occupy.
- Collaborative versions could let multiple users harvest segments from a shared environment for joint model creation.
Load-bearing premise
Generative AI can reliably turn a user's manually assembled collection of real geometry segments into a coherent, high-quality 3D model without major artifacts or loss of the intended spatial relationships.
What would settle it
A test case in which the AI output shows major visual artifacts, breaks the spatial relationships set by the user's assembly, or produces a model that participants judge as farther from the target than the baseline method.
Figures
read the original abstract
Recent developments in 3D generative AI enable users to create bespoke 3D models from text or image prompts. However, these approaches provide limited control over spatial structure, making them ill suited for tasks requiring precise geometric composition. We present MiXR, an XR system for in-situ compositional modeling that enables users to create new 3D models by harvesting geometry from their environment. Users extract segments from captured objects and assemble new artifacts through direct 3D manipulation, while generative AI synthesizes a coherent model from the user-defined composition. This hybrid workflow allows users to define spatial structure explicitly while delegating geometric refinement to generative models, enabling them to specify spatial intent that is difficult to express through verbal prompts alone. In a controlled user study ($N=12$), participants using MiXR rated their designs as significantly closer to the target, felt more in control, and experienced lower cognitive workload compared to a generative composition baseline.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents MiXR, an XR system for in-situ compositional 3D modeling. Users capture real-world objects, harvest geometric segments, assemble them via direct 3D manipulation to explicitly define spatial structure, and then invoke generative AI to synthesize a coherent final model. The central claim is that this hybrid manual-assembly-plus-generative-refinement workflow enables specification of spatial intent that is difficult to convey via text or image prompts alone. A within-subjects user study (N=12) reports that MiXR produces designs rated significantly closer to target, with higher perceived control and lower cognitive workload, relative to a generative-composition baseline.
Significance. If the results hold, the work demonstrates a practical way to combine human spatial reasoning with current 3D generative models, which could influence the design of future AR/VR authoring tools. The in-situ harvesting of real geometry and the empirical comparison against a prompt-only baseline are positive aspects. The small sample and within-subjects design limit generalizability but still provide useful initial evidence for the hybrid approach.
major comments (2)
- [§4] §4 (User Study), baseline condition: the manuscript must specify the exact prompt formulation and input encoding used for the generative baseline so that readers can judge whether the comparison isolates the benefit of explicit spatial assembly versus differences in prompt quality or model conditioning.
- [§3.3] §3.3 (Generative Synthesis): the description of how the user-assembled composition (including spatial relationships among harvested segments) is encoded and passed to the generative model is insufficient to evaluate whether intended geometry and topology are reliably preserved, which directly bears on the central claim that the hybrid workflow succeeds where pure generative methods fail.
minor comments (2)
- [Figures] Figure 3 and 4 captions should explicitly state the viewpoint, scale, and what visual elements (e.g., harvested segments vs. final output) are highlighted so that readers can interpret the system screenshots without ambiguity.
- [Abstract and §1] The abstract and §1 should briefly note the specific generative model or API employed, as this affects reproducibility and the scope of the claimed advantages.
Simulated Author's Rebuttal
We thank the referee for their positive evaluation and constructive comments, which help clarify key aspects of our hybrid workflow. We address each major comment below and have revised the manuscript to incorporate the requested details.
read point-by-point responses
-
Referee: [§4] §4 (User Study), baseline condition: the manuscript must specify the exact prompt formulation and input encoding used for the generative baseline so that readers can judge whether the comparison isolates the benefit of explicit spatial assembly versus differences in prompt quality or model conditioning.
Authors: We agree that explicit specification of the baseline prompts is necessary for readers to assess the fairness of the comparison. In the revised Section 4, we now include the precise prompt template used for the generative-composition baseline: 'Generate a coherent 3D model that combines the following objects into a single artifact: [list of object names and brief descriptions].' The input is encoded as a single concatenated text prompt with no additional spatial or geometric conditioning beyond the object list; no scene graph or positional data is provided. This formulation matches the information available in the prompt-only condition and isolates the contribution of explicit 3D assembly. revision: yes
-
Referee: [§3.3] §3.3 (Generative Synthesis): the description of how the user-assembled composition (including spatial relationships among harvested segments) is encoded and passed to the generative model is insufficient to evaluate whether intended geometry and topology are reliably preserved, which directly bears on the central claim that the hybrid workflow succeeds where pure generative methods fail.
Authors: We appreciate this observation and have expanded Section 3.3 with a detailed account of the encoding pipeline. The user-assembled composition is represented as a 3D scene graph in which each harvested segment retains its original mesh and is annotated with its world-space pose and adjacency relations derived from direct manipulation. This structure is serialized into a structured prompt that includes (1) a point-cloud sampling of the assembled geometry, (2) textual descriptors of pairwise spatial relationships (e.g., 'segment A is attached to the top face of segment B at a 30-degree angle'), and (3) a topology-preserving mask passed to the generative model via its spatial-conditioning interface. We have added pseudocode and a new figure illustrating the conversion from assembly to model input. These additions make explicit how spatial intent is conveyed and preserved, directly supporting the central claim. revision: yes
Circularity Check
No significant circularity
full rationale
The paper describes an XR system (MiXR) for harvesting real-world geometry segments, manual 3D assembly to define spatial structure, and generative AI for refinement. Central claims rest on a within-subjects user study (N=12) reporting higher target alignment, control, and lower workload versus a generative baseline. No equations, fitted parameters, derivations, or self-citation chains appear in the manuscript; the evaluation is empirical and externally falsifiable via the reported study protocol rather than reducing to any input by construction.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
hybrid workflow allows users to define spatial structure explicitly while delegating geometric refinement to generative models
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Shm Garanganao Almeda, JD Zamfirescu-Pereira, Kyu Won Kim, Pradeep Mani Rathnam, and Bjoern Hartmann. 2024. Prompting for discovery: Flex- ible sense-making for ai art-making with dreamsheets. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–17
work page 2024
-
[2]
Tyler Angert, Miroslav Suzara, Jenny Han, Christopher Pondoc, and Hariharan Subramonyam. 2023. Spellburst: A node-based interface for exploratory creative coding with natural language prompts. InProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–22
work page 2023
-
[3]
Oğuz Arslan, Artun Akdoğan, and Mustafa Doga Dogan. 2025. TinkerXR: In-Situ, Reality-Aware CAD and 3D Printing Interface for Novices. InProceedings of the ACM Symposium on Computational Fabrication. 1–19
work page 2025
-
[4]
Karim Benharrak and Amy Pavel. 2025. HistoryPalette: Supporting Exploration and Reuse of Past Alternatives in Image Generation and Editing.arXiv preprint arXiv:2501.04163(2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
- [5]
-
[6]
Stephen Brade, Bryan Wang, Mauricio Sousa, Sageev Oore, and Tovi Gross- man. 2023. Promptify: Text-to-image generation through interactive prompt exploration with large language models. InProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–14
work page 2023
-
[7]
Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative research in psychology3, 2 (2006), 77–101
work page 2006
-
[8]
Siddhartha Chaudhuri, Evangelos Kalogerakis, Stephen Giguere, and Thomas Funkhouser. 2013. Attribit: Content Creation with Semantic Attributes. InPro- ceedings of the 26th Annual ACM Symposium on User Interface Software and Tech- nology(St. Andrews, Scotland, United Kingdom)(UIST ’13). Association for Com- puting Machinery, New York, NY, USA, 193–202. doi...
-
[9]
Xingyu Chen, Fu-Jen Chu, Pierre Gleize, Kevin J Liang, Alexander Sax, Hao Tang, Weiyao Wang, Michelle Guo, Thibaut Hardin, Xiang Li, et al. 2025. Sam 3d: 3dfy anything in images.arXiv preprint arXiv:2511.16624(2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[10]
Gene Chou, Yuval Bahat, and Felix Heide. 2023. Diffusion-sdf: Conditional generative modeling of signed distance functions. InProceedings of the IEEE/CVF international conference on computer vision. 2262–2272
work page 2023
-
[11]
John Joon Young Chung and Eytan Adar. 2023. Promptpaint: Steering text-to- image generation through paint medium-like interactions. InProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–17
work page 2023
-
[12]
Hai Dang, Frederik Brudy, George Fitzmaurice, and Fraser Anderson. 2023. World- smith: Iterative and expressive prompting for world building with a generative ai. InProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–17
work page 2023
-
[13]
Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi
-
[14]
InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Objaverse: A Universe of Annotated 3D Objects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 13142–13153
-
[15]
Mustafa Doga Dogan, Patrick Baudisch, Hrvoje Benko, Michael Nebeling, Huaishu Peng, Valkyrie Savage, and Stefanie Mueller. 2022. Fabricate It or Render It? Digital Fabrication vs. Virtual Reality for Creating Objects Instantly. InExtended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, 5. do...
-
[16]
Mustafa Doga Dogan, Eric J Gonzalez, Karan Ahuja, Ruofei Du, Andrea Colaço, Johnny Lee, Mar Gonzalez-Franco, and David Kim. 2024. Augmented object intelligence with xr-objects. InProceedings of the 37th Annual ACM Symposium on User Interface Software and Technology. 1–15
work page 2024
-
[17]
Faraz Faruqi, Amira Abdel-Rahman, Leandra Tejedor, Martin Nisser, Jiaji Li, Vrushank Phadnis, Varun Jampani, Neil Gershenfeld, Megan Hofmann, and Stefanie Mueller. 2025. MechStyle: Augmenting Generative AI with Mechanical Simulation to Create Stylized and Structurally Viable 3D Models. InProceedings of the ACM Symposium on Computational Fabrication. 1–15
work page 2025
-
[18]
Faraz Faruqi, Ahmed Katary, Tarik Hasic, Amira Abdel-Rahman, Nayeemur Rahman, Leandra Tejedor, Mackenzie Leake, Megan Hofmann, and Stefanie Mueller. 2023. Style2Fab: Functionality-Aware Segmentation for Fabricating Personalized 3D Models with Generative AI. InProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–13
work page 2023
-
[19]
Faraz Faruqi, Maxine Perroni-Scharf, Jaskaran Singh Walia, Yunyi Zhu, Shuyue Feng, Donald Degraen, and Stefanie Mueller. 2025. TactStyle: Generating Tactile Textures with Generative AI for Digital Fabrication. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–16
work page 2025
- [20]
-
[21]
Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. 2022. GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images. InAdvances In Neural Information Processing Systems
work page 2022
-
[22]
2025.Gemini 3: A Family of Highly Capable Multimodal Models
Gemini Team, Google. 2025.Gemini 3: A Family of Highly Capable Multimodal Models. Technical Report. Google DeepMind. https://storage.googleapis.com/ deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf
work page 2025
-
[23]
Hyojun Go, Byeongjun Park, Jiho Jang, Jin-Young Kim, Soonwoo Kwon, and Changick Kim. 2025. SplatFlow: Multi-View Rectified Flow Model for 3D Gauss- ian Splatting Synthesis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 21524–21536
work page 2025
-
[24]
Megan Hofmann, Gabriella Hann, Scott E. Hudson, and Jennifer Mankoff. 2018. Greater than the Sum of Its PARTs: Expressing and Reusing Design Intent in 3D Models. InProceedings of the 2018 CHI Conference on Human Factors in Computing Systems(Montreal QC, Canada)(CHI ’18). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/3173574.3173875
-
[25]
Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. 2023. Lrm: Large reconstruction model for single image to 3d.arXiv preprint arXiv:2311.04400(2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[26]
Erzhen Hu, Mingyi Li, Jungtaek Hong, Xun Qian, Alex Olwal, David Kim, Seongkook Heo, and Ruofei Du. 2025. Thing2Reality: Enabling Spontaneous Creation of 3D Objects from 2D Content using Generative AI in XR Meetings. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology. 1–16
work page 2025
-
[27]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al
-
[28]
InProceedings of the IEEE/CVF international conference on computer vision
Segment anything. InProceedings of the IEEE/CVF international conference on computer vision. 4015–4026
-
[29]
Roman Klokov, Edmond Boyer, and Jakob Verbeek. 2020. Discrete point flow net- works for efficient point cloud generation. InEuropean Conference on Computer Vision. Springer, 694–710
work page 2020
- [30]
-
[31]
Brian Lee, Savil Srivastava, Ranjitha Kumar, Ronen Brafman, and Scott R Klem- mer. 2010. Designing with interactive example galleries. InProceedings of the SIGCHI conference on human factors in computing systems. 2257–2266
work page 2010
- [32]
-
[33]
Zisu Li, Jiawei Li, Zeyu Xiong, Shumeng Zhang, Faraz Faruqi, Stefanie Mueller, Chen Liang, Xiaojuan Ma, and Mingming Fan. 2025. InteRecon: Towards Re- constructing Interactivity of Personal Memorable Items in Mixed Reality. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–19
work page 2025
-
[34]
Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. 2023. Magic3d: High-resolution text-to-3d content creation. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition. 300–309
work page 2023
-
[35]
Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, and Ziwei Liu. 2025. MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo. InComputer Vision – ECCV 2024, Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, and Gül Varol (Eds.). Springer Nature Switz...
work page 2025
-
[36]
Damien Masson, Sylvain Malacria, Géry Casiez, and Daniel Vogel. 2024. Direct- gpt: A direct manipulation interface to interact with large language models. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–16
work page 2024
-
[37]
Yuxuan Mu, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofeng Wu, Song- cen Xu, Peng Dai, Youliang Yan, and Li Cheng. 2025. GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction. InComputer Vision – ECCV 2024, Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, and Gül Varol (Eds.). Springer Nature Switzerland, Ch...
work page 2025
-
[38]
Savvas Petridis, Nicholas Diakopoulos, Kevin Crowston, Mark Hansen, Keren Henderson, Stan Jastrzebski, Jeffrey V Nickerson, and Lydia B Chilton. 2023. Anglekindling: Supporting journalistic angle ideation with large language models. InProceedings of the 2023 CHI conference on human factors in computing systems. 1–16
work page 2023
-
[39]
Thijs Jan Roumen, Willi Müller, and Patrick Baudisch. 2018. Grafter: Remixing 3D- Printed Machines. InProceedings of the 2018 CHI Conference on Human Factors in Computing Systems(Montreal QC, Canada)(CHI ’18). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/3173574.3173637
-
[40]
Ryan Schmidt and Karan Singh. 2010. Meshmixer: An Interface for Rapid Mesh Composition. InACM SIGGRAPH 2010 Talks. Association for Computing Ma- chinery, New York, NY, USA. doi:10.1145/1837026.1837034
-
[41]
Adriana Schulz, Ariel Shamir, David I. W. Levin, Pitchaya Sitthi-amorn, and Wojciech Matusik. 2014. Design and Fabrication by Example.ACM Trans. Graph. 33, 4, Article 62 (jul 2014), 11 pages. doi:10.1145/2601097.2601127
-
[42]
Maria Shugrina, Ariel Shamir, and Wojciech Matusik. 2015. Fab Forms: Cus- tomizable Objects for Fabrication with Validity and Geometry Caching.ACM Faruqi et al. Trans. Graph.34, 4, Article 100 (jul 2015), 12 pages. doi:10.1145/2766994
-
[43]
Evgeny Stemasov, Simon Demharter, Max Rädler, Jan Gugenheimer, and Enrico Rukzio. 2024. pARam: Leveraging Parametric Design in Extended Reality to Support the Personalization of Artifacts for Personal Fabrication. InProceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24). Association for Computing Machinery, New York, NY, USA, 1–...
-
[44]
Evgeny Stemasov, Jessica Hohn, Maurice Cordts, Anja Schikorr, Enrico Rukzio, and Jan Gugenheimer. 2023. BrickStARt: Enabling In-situ Design and Tangible Exploration for Personal Fabrication using Mixed Reality.Proceedings of the ACM on Human-Computer Interaction7, ISS (Oct. 2023), 64–92. doi:10.1145/3626465
-
[45]
Blair Subbaraman, Shenna Shim, and Nadya Peek. 2023. Forking a sketch: How the openprocessing community uses remixing to collect, annotate, tune, and extend creative code. InProceedings of the 2023 ACM Designing Interactive Systems Conference. 326–342
work page 2023
- [46]
-
[47]
Ryo Suzuki, Parastoo Abtahi, Chen Zhu-Tian, Mustafa Doga Dogan, Andrea Colaco, Eric J Gonzalez, Karan Ahuja, and Mar Gonzalez-Franco. 2025. Pro- grammable reality.Frontiers in Virtual Reality6 (2025), 1649785
work page 2025
-
[48]
Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. 2024. Lgm: Large multi-view gaussian model for high-resolution 3d content creation. InEuropean Conference on Computer Vision. Springer, 1–18
work page 2024
-
[49]
Tom Veuskens, Florian Heller, and Raf Ramakers. 2021. CODA: A Design Assis- tant to Facilitate Specifying Constraints and Parametric Behavior in CAD Models. InGraphics Interface 2021. https://openreview.net/forum?id=1dLDPJeafRZ
work page 2021
-
[50]
Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, and Tianyi Zhang. 2024. Promptcharm: Text-to-image generation through multi-modal prompting and refinement. InProceedings of the 2024 CHI Conference on Human Factors in Com- puting Systems. 1–21
work page 2024
-
[51]
Christian Weichel, Manfred Lau, David Kim, Nicolas Villar, and Hans Gellersen
-
[52]
MixFab: a mixed-reality environment for personal fabrication.CHI ’14 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (April 2014), 3855–3864. doi:10.1145/2556288.2557090
-
[53]
Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. 2025. Structured 3d latents for scalable and versatile 3d generation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 21469–21480
work page 2025
-
[54]
Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, and Ying Shan. 2024. Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models.arXiv preprint arXiv:2404.07191(2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[55]
J Diego Zamfirescu-Pereira, Richmond Y Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny can’t prompt: how non-AI experts try (and fail) to design LLM prompts. InProceedings of the 2023 CHI conference on human factors in computing systems. 1–21
work page 2023
-
[56]
Lei Zhang, Jin Pan, Jacob Gettig, Steve Oney, and Anhong Guo. 2024. Vrcopilot: Authoring 3d layouts with generative ai models in vr. InProceedings of the 37th Annual ACM Symposium on User Interface Software and Technology. 1–13
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.