pith · machine review for the scientific record

arxiv: 2604.19025 · v1 · submitted 2026-04-21 · 💻 cs.RO

Recognition: unknown

RoomRecon: High-Quality Textured Room Layout Reconstruction on Mobile Devices

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 03:02 UTC · model grok-4.3

classification 💻 cs.RO
keywords room reconstruction · 3D texturing · mobile AR · generative AI · indoor mapping · real-time scanning · customizable 3D models · permanent elements

The pith

RoomRecon uses AR-guided image capture and generative AI in a two-phase pipeline to texture 3D room models on mobile devices while focusing only on permanent elements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

RoomRecon is a system for capturing indoor spaces and applying textures to 3D models directly on mobile devices in real time. It combines augmented reality to guide image collection with generative AI models that refine the textures in two phases. By limiting the process to fixed structures such as walls, floors, and ceilings, the resulting models stay stable for later customization. This matters because existing 3D reconstructions often produce low-quality visuals, require complete rescans after small changes, and resist easy editing for use in virtual reality, interior design, and real estate. Tests in multiple rooms show the approach yields better textures and runs faster on the device than leading alternatives.

Core claim

RoomRecon introduces an interactive, real-time scanning and texturing pipeline for 3D room models. It integrates AR-guided image capture with generative AI models in a two-phase texturing process, restricts texturing to permanent room elements such as walls, floors, and ceilings to keep models customizable, and, through quantitative results and user studies, is shown to surpass state-of-the-art methods in texturing quality and on-device computation time.

What carries the argument

The two-phase texturing pipeline, which combines AR-guided image capture with generative AI models and is restricted to permanent room elements.
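Read as systems design, the claim decomposes into a capture phase and a generative refinement phase operating only on planar permanent elements. A minimal sketch of that flow, assuming hypothetical names throughout (capture_views, fuse_views, and inpaint are illustrative stand-ins, not RoomRecon's actual API):

```python
"""Sketch of the two-phase flow described above; the generative step is mocked."""
from dataclasses import dataclass
import numpy as np

@dataclass
class Plane:
    kind: str                            # "wall" | "floor" | "ceiling"
    texture: np.ndarray | None = None

def capture_views(plane: Plane, n: int = 4) -> list[np.ndarray]:
    # Phase 1 stand-in: AR guidance would steer the user toward
    # calibrated RGB views of this permanent plane.
    return [np.random.rand(64, 64, 3) for _ in range(n)]

def fuse_views(views: list[np.ndarray]) -> np.ndarray:
    # Project and average the captured views onto the plane's texture atlas.
    return np.mean(views, axis=0)

def inpaint(texture: np.ndarray) -> np.ndarray:
    # Phase 2 stand-in: a generative model (e.g. an inpainting network)
    # would fill uncaptured regions and smooth seams here.
    return np.clip(texture, 0.0, 1.0)

def room_recon_like(planes: list[Plane]) -> list[Plane]:
    for plane in planes:                 # permanent elements only
        plane.texture = inpaint(fuse_views(capture_views(plane)))
    return planes

room = room_recon_like([Plane("wall"), Plane("floor"), Plane("ceiling")])
```

The point of this structure is that each plane's texture is independent, so a changed wall could be re-captured and re-inpainted without touching the rest of the model.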

If this is right

  • 3D room models become updatable without full recapture because textures attach only to the stable permanent structure.
  • VR and XR applications gain more realistic indoor replicas from the improved texturing quality.
  • Interior design and real estate workflows benefit from models that separate fixed architecture from movable items for quick edits.
  • Real-time mobile execution lowers the hardware requirements compared with cloud-dependent or high-end workstation methods.
  • Semantic scene understanding advances by isolating permanent elements as the focus of texturing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the pipeline maintains quality across rooms, consumer devices could support on-the-spot indoor mapping without professional equipment or cloud uploads.
  • Customizable permanent-element models would let users test furniture arrangements in VR while the base textured shell remains unchanged.
  • Advancing generative AI models could further reduce artifacts in the texturing step, widening the range of usable lighting conditions.
  • The mobile emphasis suggests the method could combine with everyday phone cameras for broad adoption in home renovation planning.

Load-bearing premise

The two-phase pipeline that integrates AR-guided image capture with generative AI models will reliably improve texturing quality across varied indoor spaces, and focusing only on permanent room elements will keep the models easy to customize.

What would settle it

A controlled comparison on the same set of rooms where RoomRecon receives lower user preference scores or worse texture quality metrics than a baseline method when lighting varies or occlusions increase.
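Concretely, such a head-to-head test reduces to scoring each method's rendered textures against ground-truth photos of the same rooms under matched conditions. A hedged sketch using PSNR as a stand-in metric (the paper's evaluation also involves perceptual metrics and user preferences; the compare helper and mock data are hypothetical):

```python
"""Score competing methods on identical rooms so lighting and occlusion
conditions are held constant across the comparison."""
import numpy as np

def psnr(reference: np.ndarray, rendered: np.ndarray, peak: float = 1.0) -> float:
    # Peak signal-to-noise ratio in dB; higher is better.
    mse = np.mean((reference - rendered) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak**2 / mse)

def compare(methods: dict[str, list[np.ndarray]],
            ground_truth: list[np.ndarray]) -> dict[str, float]:
    # Average PSNR per method over the same set of rooms/views.
    return {
        name: float(np.mean([psnr(gt, out) for gt, out in zip(ground_truth, outputs)]))
        for name, outputs in methods.items()
    }

# Hypothetical usage with mock data:
gt = [np.random.rand(32, 32, 3) for _ in range(5)]
scores = compare({"RoomRecon": gt, "baseline": [g * 0.9 for g in gt]}, gt)
```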

Figures

Figures reproduced from arXiv: 2604.19025 by Dinh Duc Cao, Federica Spinola, Kyu Sung Cho, Se Jin Lee, Seok Joon Kim.

Figure 1. Creating realistic textured 3D room models on mobile devices: capitalizing on the wide availability of mobile devices, RoomRecon creates textured 3D replicas of indoor spaces, readily applicable for use in VR and AR applications.
Figure 2. Method overview: the system constructs detailed textured 3D room models through a two-phase pipeline. Initially, the Capturing step collects a calibrated set of RGB-D images and creates the original mesh (M_original). In the Mesh Processing step, this mesh is filtered using the room layout to produce a filtered mesh (M_filtered), which is then integrated with the layout to create the combined mesh (M_combined) …
Figure 3. Processes and outputs of the Mesh Processing step.
Figure 6. Plane2Image rendering operation for vertical and horizontal …
Figure 7. UI for capturing a ceiling sample (left), and a ceiling …
Figure 8. Qualitative results of textured planes using PlaneOpt …
Figure 9. Textured room layouts after completing the RoomRecon pipeline. Samples with dashed lines show ceiling samples used for creating …
Figure 10. Detailed views of the Mesh Processing step, showing …
Figure 11. SampleMode parameters for the Post Texturing step.
Figure 13. Example of a floor texture generated from the sample …
Figure 14. Example of a ceiling texture generated from the sample …
Figure 15. Extension of RoomRecon to MultiRoom. The paper's primary goal is to introduce and evaluate the RoomRecon pipeline for single-room reconstruction, but the pipeline extends seamlessly to multi-room environments: by reconstructing rooms one by one, comprehensive multi-room environments that span entire floors, such as office spaces, can be constructed …
Figure 17. Examples of plane images before and after inpainting …
Figure 18. Examples of the Post Texturing (ImageMode) results using …
Figure 19. Visualization of images inpainted by various methods, including LaMa […] …
Figure 20. Additional views of five rooms from the paper.
Figure 21. Texturing problems that occur in a room. A: geometry and camera misalignment, B: texture seams, C: uncaptured areas, D: …
Figure 22. Representative samples of the ground truth images captured for each room.
Figure 23. Visualization of textured rooms: the left column shows the room layouts, while the 2nd and 3rd columns show textured room …
Figure 24. Additional visualizations of the textured walls for each room after the inpainting process (1/2).
Figure 25. Additional visualizations of the textured walls for each room after the inpainting process (2/2).
original abstract

Widespread RGB-Depth (RGB-D) sensors and advanced 3D reconstruction technologies facilitate the capture of indoor spaces, improving the fields of augmented reality (AR), virtual reality (VR), and extended reality (XR). Nevertheless, current technologies still face limitations, such as the inability to reflect minor scene changes without a complete recapture, the lack of semantic scene understanding, and various texturing challenges that affect the 3D model's visual quality. These issues affect the realism required for VR experiences and other applications such as in interior design and real estate. To address these challenges, we introduce RoomRecon, an interactive, real-time scanning and texturing pipeline for 3D room models. We propose a two-phase texturing pipeline that integrates AR-guided image capturing for texturing and generative AI models to improve texturing quality and provide better replicas of indoor spaces. Moreover, we suggest focusing only on permanent room elements such as walls, floors, and ceilings, to allow for easily customizable 3D models. We conduct experiments in a variety of indoor spaces to assess the texturing quality and speed of our method. The quantitative results and user study demonstrate that RoomRecon surpasses state-of-the-art methods in terms of texturing quality and on-device computation time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces RoomRecon, an interactive real-time scanning and texturing pipeline for 3D room models on mobile devices. It proposes a two-phase texturing approach that integrates AR-guided image capturing with generative AI models to improve texturing quality, while focusing only on permanent room elements (walls, floors, ceilings) to enable easy customization. Experiments across varied indoor spaces and a user study are presented to claim that RoomRecon surpasses state-of-the-art methods in texturing quality and on-device computation time.

Significance. If the on-device deployment and quantitative claims are substantiated, the work could advance practical mobile AR/VR and interior design applications by addressing limitations in current RGB-D reconstruction such as handling minor changes and achieving high visual quality without full recaptures. The focus on permanent elements for customizability is a pragmatic contribution.

major comments (2)
  1. [Abstract] Abstract: The central claim that 'the quantitative results and user study demonstrate that RoomRecon surpasses state-of-the-art methods in terms of texturing quality and on-device computation time' is unsupported because the abstract (and available text) supplies no specific metrics, error analysis, dataset descriptions, baseline methods, or statistical details from the experiments. Without these, the evidence cannot be evaluated for soundness.
  2. [Abstract] Abstract (two-phase texturing pipeline): The superiority in on-device computation time is load-bearing for the main claim, yet the integration of generative AI models for texturing raises a deployment concern. Generative models are typically resource-intensive; if the AI stage runs off-device or post hoc rather than fully on-device during real-time scanning, the reported on-device times would only cover the AR-guided capture phase, making the SOTA comparison non-equivalent and undermining the time-quality superiority assertion.
minor comments (1)
  1. [Abstract] Abstract: The phrase 'various texturing challenges that affect the 3D model's visual quality' is vague; specifying the exact challenges (e.g., lighting inconsistencies, occlusion handling) would improve context without altering the core contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications and indicating where revisions will be made to strengthen the presentation of our claims.

point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that 'the quantitative results and user study demonstrate that RoomRecon surpasses state-of-the-art methods in terms of texturing quality and on-device computation time' is unsupported because the abstract (and available text) supplies no specific metrics, error analysis, dataset descriptions, baseline methods, or statistical details from the experiments. Without these, the evidence cannot be evaluated for soundness.

    Authors: We acknowledge that the abstract functions as a concise summary and therefore omits the detailed metrics, error analyses, dataset descriptions, baseline comparisons, and statistical details that are provided in the full manuscript (including the experimental sections describing varied indoor spaces, quantitative texturing quality metrics, timing benchmarks, and user study results). To address this concern and make the abstract more self-contained while preserving its brevity, we will revise the abstract in the next version to incorporate representative quantitative highlights from our results without altering the core claims. revision: yes

  2. Referee: [Abstract] Abstract (two-phase texturing pipeline): The superiority in on-device computation time is load-bearing for the main claim, yet the integration of generative AI models for texturing raises a deployment concern. Generative models are typically resource-intensive; if the AI stage runs off-device or post hoc rather than fully on-device during real-time scanning, the reported on-device times would only cover the AR-guided capture phase, making the SOTA comparison non-equivalent and undermining the time-quality superiority assertion.

    Authors: We thank the referee for highlighting this potential ambiguity. The RoomRecon pipeline is designed such that both phases—including the generative AI texturing components—are executed on-device in real time on mobile hardware. The generative models are optimized for mobile deployment to support the reported end-to-end on-device computation times, which encompass the full pipeline rather than only the AR-guided capture phase. To eliminate any ambiguity and strengthen the equivalence of our SOTA comparisons, we will add explicit clarification and implementation details on the on-device optimizations in the revised manuscript. revision: yes
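If the rebuttal's claim holds, it should be checkable with a per-stage timing breakdown that separates AR-guided capture from generative inference, so reported on-device times demonstrably cover the full pipeline. A minimal sketch of such instrumentation; stage names and the sleep calls are stand-ins for the real workloads, not RoomRecon's code:

```python
"""Per-stage wall-clock timing, so end-to-end figures can be audited."""
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    # Accumulate wall-clock time spent inside the `with` block under `stage`.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start

with timed("ar_guided_capture"):
    time.sleep(0.01)   # stand-in for the capture phase
with timed("generative_texturing"):
    time.sleep(0.02)   # stand-in for on-device generative inference

timings["end_to_end"] = sum(timings.values())
print(timings)         # only end_to_end supports an equivalent SOTA comparison
```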

Circularity Check

0 steps flagged

No circularity found in the engineering pipeline or empirical evaluation

full rationale

The paper describes a two-phase texturing pipeline for room reconstruction using AR-guided capture and generative AI, evaluated via experiments across indoor spaces, quantitative metrics, and a user study against external SOTA methods. No equations, mathematical derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided claims. The central results rest on direct comparisons to independent benchmarks rather than reducing to self-defined inputs or ansatzes, so the evaluation stands on external evidence rather than on the paper's own constructions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete. No explicit free parameters or invented physical entities are mentioned. The central claim rests on the unverified assumption that generative AI will improve texturing quality in this setting.

axioms (1)
  • domain assumption: Generative AI models can be integrated with AR-guided capture to produce higher-quality textures for 3D room models than existing methods.
    This assumption underpins the two-phase texturing pipeline and the claim of surpassing state-of-the-art performance.

pith-pipeline@v0.9.0 · 5535 in / 1370 out tokens · 55316 ms · 2026-05-10T03:02:00.468231+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

62 extracted references · 30 canonical work pages

  [1] Adobe. Adobe Photoshop - Photo & Design Software. https://www.adobe.com/products/photoshop. Accessed 07-05-2024.
  [2] A. Andrew. Another efficient algorithm for convex hulls in two dimensions. Information Processing Letters, 9(5):216–219, Dec 1979. doi:10.1016/0020-0190(79)90072-3
  [3] M. V. Anglada. An improved incremental algorithm for constructing restricted Delaunay triangulations. Computers & Graphics, 21(2):215–223, Mar 1997. doi:10.1016/S0097-8493(96)00085-4
  [4] Apple Inc. ARKit 6 - Augmented Reality. https://developer.apple.com/augmented-reality/arkit/. Accessed 07-05-2024.
  [5] Apple Inc. RoomPlan - Augmented Reality. https://developer.apple.com/augmented-reality/roomplan/. Accessed 07-05-2024.
  [6] A. Avetisyan, T. Khanova, C. Choy, D. Dash, A. Dai, and M. Nießner. SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans. In Proc. ECCV. Springer Nature, Berlin, Germany, Aug 2020.
  [7] S. Bi, N. K. Kalantari, and R. Ramamoorthi. Patch-based optimization for image-based texture mapping. ACM Trans. Graph., 36(4), Jul 2017. doi:10.1145/3072959.3073610
  [9] C. Cao, Q. Dong, and Y. Fu. ZITS++: Image inpainting by improving the incremental transformer on structural priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(10):12667–12684, Oct 2023. doi:10.1109/TPAMI.2023.3280222
  [10] X. Chen, H. Zhao, G. Zhou, and Y.-Q. Zhang. PQ-Transformer: Jointly Parsing 3D Objects and Layouts From Point Clouds. IEEE Robotics and Automation Letters, 7(2):2519–2526, Apr 2022. doi:10.1109/LRA.2022.3143224
  [11] P. Cipresso, I. A. C. Giglioli, M. A. Raya, and G. Riva. The Past, Present, and Future of Virtual and Augmented Reality Research: A Network and Cluster Analysis of the Literature. Frontiers in Psychology, 9, Nov 2018. doi:10.3389/fpsyg.2018.02086
  [12] F. Crete, T. Dolmiere, P. Ladret, and M. Nicolas. The blur effect: perception and estimation with a new no-reference perceptual blur metric. In Human Vision and Electronic Imaging XII, vol. 6492, p. 64920I. SPIE, Mar 2007. doi:10.1117/1…
  [13] A. Dai, M. Nießner, M. Zollhöfer, S. Izadi, and C. Theobalt. BundleFusion: Real-Time Globally Consistent 3D Reconstruction Using On-the-Fly Surface Reintegration. ACM Transactions on Graphics, 36(3), May 2017. doi:10.1145/3054739
  [14] Q. Dong, C. Cao, and Y. Fu. Incremental transformer structure enhanced image inpainting with masking positional encoding. In Proc. CVPR, pp. 11348–11358, Jun 2022. doi:10.1109/CVPR52688.2022.01107
  [15] H. Fu, R. Jia, L. Gao, M. Gong, B. Zhao, S. Maybank, and D. Tao. 3D-FUTURE: 3D Furniture Shape with TextURE. International Journal of Computer Vision, 129(12):3313–3337, Dec 2021. doi:10.1007/s11263-021-01534-z
  [16] Y. Fu, Q. Yan, J. Liao, H. Zhou, J. Tang, and C. Xiao. Seamless texture optimization for RGB-D reconstruction. IEEE Transactions on Visualization and Computer Graphics, 29(3):1845–1859, Mar 2023. doi:10.1109/TVCG.2021.3134105
  [17] Y. Fu, Q. Yan, L. Yang, J. Liao, and C. Xiao. Texture Mapping for 3D Reconstruction with RGB-D Sensor. In Proc. CVPR, pp. 4645–4653, Jun 2018. doi:10.1109/CVPR.2018.00488
  [18] H.-a. Gao, B. Tian, P. Li, X. Chen, H. Zhao, G. Zhou, Y. Chen, and H. Zha. From Semi-supervised to Omni-supervised Room Layout Estimation Using Point Clouds. In Proc. ICRA, pp. 2803–2810. IEEE, 2023. doi:10.1109/ICRA48891.2023.10161273
  [19] M. Garland and P. S. Heckbert. Surface simplification using quadric error metrics. In Proc. SIGGRAPH '97, pp. 209–216. ACM Press/Addison-Wesley, New York, 1997. doi:10.1145/258734.258849
  [20] A. Ghildyal and F. Liu. Shift-Tolerant Perceptual Similarity Metric. In Proc. ECCV, pp. 91–107. Springer Nature, Cham, Switzerland, 2022.
  [21] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. Commun. ACM, 63(11):139–144, Oct 2020. doi:10.1145/3422622
  [22] Google Inc. ARCore - Build new augmented reality experiences that seamlessly blend the digital and physical worlds. https://developers.google.com/ar. Accessed 07-05-2024.
  [23] L. Höllein, A. Cao, A. Owens, J. Johnson, and M. Nießner. Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models. In Proc. ICCV, pp. 7909–7920. IEEE, Oct 2023.
  [24] J. Huang, J. Thies, A. Dai, A. Kundu, C. Jiang, L. J. Guibas, M. Nießner, and T. Funkhouser. Adversarial Texture Optimization From RGB-D Scans. In Proc. CVPR, pp. 1556–1565, Jun 2020. doi:10.1109/CVPR42600.2020.00163
  [25] E. Karaca and M. A. Tunga. Interpolation-based image inpainting in color images using high dimensional model representation. In Proc. EUSIPCO, pp. 2425–2429, 2016. doi:10.1109/EUSIPCO.2016.7760684
  [26] W. Li, Z. Lin, K. Zhou, L. Qi, Y. Wang, and J. Jia. MAT: Mask-Aware Transformer for Large Hole Image Inpainting. In Proc. CVPR, pp. 10748–10758, Jun 2022. doi:10.1109/CVPR52688.2022.01049
  [27] K. Maninis, S. Popov, M. Nießner, and V. Ferrari. Vid2CAD: CAD Model Alignment Using Multi-View Constraints From Videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1):1320–1327, Jan 2023. doi:10.1109/TPAMI.2022.3146082
  [28] Apple Inc. Metal Overview. https://developer.apple.com/metal/. Accessed 07-05-2024.
  [29] R. A. Newcombe, A. Fitzgibbon, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, and S. Hodges. KinectFusion: Real-time dense surface mapping and tracking. In Proc. ISMAR, pp. 127–136, Oct 2011. doi:10.1109/ISMAR.2011.6092378
  [31] OpenAI. DALL·E 3. https://openai.com/index/dall-e-3. Accessed 07-05-2024.
  [32] M. Portman, A. Natapov, and D. Fisher-Gewirtzman. To go where no man has gone before: Virtual reality in architecture, landscape architecture and environmental planning. Computers, Environment and Urban Systems, 54:376–384, Jun 2015. doi:10.1016/j.compenvurbsys.2015.05.001
  [33] Z. Qin, Q. Zeng, Y. Zong, and F. Xu. Image inpainting based on deep learning: A review. Displays, 69:102028, 2021. doi:10.1016/j.displa.2021.102028
  [34] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In Proc. CVPR, pp. 10674–10685, Jun 2022. doi:10.1109/CVPR52688.2022.01042
  [35] J. Schult, S. Tsai, L. Höllein, B. Wu, J. Wang, C.-Y. Ma, K. Li, X. Wang, F. Wimbauer, Z. He, P. Zhang, B. Leibe, P. Vajda, and J. Hou. ControlRoom3D: Room Generation using Semantic Proxy Rooms. In Proc. CVPR. IEEE, 2024.
  [36] L. Song, L. Cao, H. Xu, K. Kang, F. Tang, J. Yuan, and Z. Yang. RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture. In Proc. ACM MM, pp. 6898–6906. ACM, New York, 2023. doi:10.1145/3581783.3611800
  [37] R. Suvorov, E. Logacheva, A. Mashikhin, A. Remizova, A. Ashukha, A. Silvestrov, N. Kong, H. Goka, K. Park, and V. Lempitsky. Resolution-robust large mask inpainting with Fourier convolutions. In Proc. WACV, pp. 3172–3182, Jan 2022. doi:10.1109/WACV51458.2022.00323
  [38] O. Vartanian, G. Navarrete, A. Chatterjee, L. B. Fich, J. L. Gonzalez-Mora, H. Leder, C. Modroño, M. Nadal, N. Rostrup, and M. Skov. Architectural design and the brain: Effects of ceiling height and perceived enclosure on beauty judgments and approach-avoidance decisions. Journal of Environmental Psychology, 41:10–18, 2015. doi:10.1016/j.jenvp.201…
  [39] M. Waechter, N. Moehrle, and M. Goesele. Let There Be Color! Large-Scale Texturing of 3D Reconstructions. In Proc. ECCV, pp. 836–850. Springer, Berlin, Germany, 2014.
  [40] C. Wang and X. Guo. Plane-Based Optimization of Geometry and Texture for RGB-D Reconstruction of Indoor Scenes. In Proc. 3DV, pp. 533–541, Sep 2018. doi:10.1109/3DV.2018.00067
  [42] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. doi:10.1109/TIP.2003.819861
  [43] M. Weerasinghe, K. Čopič Pucihar, J. Ducasse, A. Quigley, A. Toniolo, A. Miguel, N. Caluya, and M. Kljun. Exploring the future building: representational effects on projecting oneself into the future office space. Virtual Reality, 27(1):51–70, Mar 2023. doi:10.1007/s10055-022-00673-z
  [44] T. Whelan, R. F. Salas-Moreno, B. Glocker, A. J. Davison, and S. Leutenegger. ElasticFusion: Real-time dense SLAM and light source estimation. The International Journal of Robotics Research, 35(14):1697–1716, Sep 2016. doi:10.1177/0278364916669237
  [45] S. Winkler and P. Mohandas. The evolution of video quality measurement: From PSNR to hybrid metrics. IEEE Transactions on Broadcasting, 54:660–668, Oct 2008. doi:10.1109/TBC.2008.2000733
  [46] M. Woo, J. Neider, T. Davis, and D. Shreiner. OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2. Addison-Wesley Longman, 1999.
  [47] X. Yang, L. Zhou, H. Jiang, Z. Tang, Y. Wang, H. Bao, and G. Zhang. Mobile3DRecon: Real-time Monocular 3D Reconstruction on a Mobile Phone. IEEE Transactions on Visualization and Computer Graphics, 26(12):3446–3456, Dec 2020. doi:10.1109/TVCG.2020.3023634
  [48] X. Zhao, Z. Zhao, and A. G. Schwing. Initialization and Alignment for Adversarial Texture Optimization. In Proc. ECCV. Springer, Berlin, Germany, 2022.
  [49] Q.-Y. Zhou and V. Koltun. Color map optimization for 3D reconstruction with consumer depth cameras. ACM Trans. Graph., 33(4), Jul 2014. doi:10.1145/2601097.2601134
  [52] J. Zhuang, Y. Zeng, W. Liu, C. Yuan, and K. Chen. A task is worth one word: Learning with task prompts for high-quality versatile image inpainting. arXiv:2312.03594, 2023.

Overview of supplementary material

  • Section A: further visualizations of the RoomRecon processes and results.
  • Section B: pseudocode explaining the related algorithms in the main paper.
  • Section C: extensive tables of experiments showing quantitative statistics on the test datasets.

Section A details the Mesh Processing step with panels showing (a) the room layout, (b) M_original, (c) the room layout and M_original overlaid, (d) M_filtered, (e) M_combined, and (f) M_combined in wireframes.

SampleMode parameters for the Post Texturing step (see the sketch after this list):
  • Sample Width and Sample Height: specify the dimensions of each sample.
  • Sample Offset: specifies the spacing between consecutive samples.
  • Sample Angle: allows rotation of the sample relative to MBB_align, the vector derived during the Minimum Bounding Box (MBB) [2] calculation of the Mesh Processing step (Section 3.4). Using the samples shown in Fig. 12a and Fig. 12b, floor and ceiling textures can be created with SampleMode, as shown in Fig. 13 and Fig. 14 respectively.

Pipeline steps described in the supplement:
  • Layout Parsing: acquires layout information, including the normals, centers, and four vertices of the vertical walls, as well as the floor and ceiling heights.
  • Loop Check and Form Floor and Ceiling: the Loop Check verifies whether the vertical walls form a closed loop, which is essential for subsequent mesh processing; floor and ceiling meshes are then generated using the Constrained Delaunay Triangulation (CDT) method [3] or the Minimum Bounding Box (MBB) [2], as described in Section 3.4.
  • Mesh Processing (Section 3.4): mesh filtering, remeshing, and combining to produce the combined mesh (M_combined).
  • Plane2Image (Section 3.7): renders the textured meshes using the Plane2Image module.
  • Displaying Results: the textured meshes (M_textured) are added to the scene for visualization and further Post Texturing (Section 3.8). Table 6 compares the computation time of texturing-related processes on an iOS device, an iPhone 12 Pro with an A14 Bionic chip, before and after the implementation …
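Pulling the supplement's pieces together, the SampleMode parameters and the Loop Check admit a compact reading. A speculative sketch; the field names mirror the supplement's terminology, but the wall representation (2D endpoint pairs) is a guess rather than anything taken from the paper:

```python
"""Illustrative reading of the SampleMode parameters and the Loop Check step."""
from dataclasses import dataclass
import numpy as np

@dataclass
class SampleMode:
    sample_width: float    # dimensions of each texture sample
    sample_height: float
    sample_offset: float   # spacing between consecutive samples
    sample_angle: float    # rotation relative to the MBB alignment vector

def walls_form_closed_loop(walls: list[tuple[np.ndarray, np.ndarray]],
                           tol: float = 1e-3) -> bool:
    # Loop Check: each wall's end point must meet the next wall's start
    # point, and the last wall must close back onto the first.
    n = len(walls)
    return n > 2 and all(
        np.linalg.norm(walls[i][1] - walls[(i + 1) % n][0]) < tol
        for i in range(n)
    )

# Hypothetical usage: four walls tracing a unit square form a closed loop.
square = [
    (np.array([0.0, 0.0]), np.array([1.0, 0.0])),
    (np.array([1.0, 0.0]), np.array([1.0, 1.0])),
    (np.array([1.0, 1.0]), np.array([0.0, 1.0])),
    (np.array([0.0, 1.0]), np.array([0.0, 0.0])),
]
assert walls_form_closed_loop(square)
```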