pith. machine review for the scientific record.

arxiv: 2603.24591 · v2 · submitted 2026-03-25 · 💻 cs.HC

Recognition: 2 theorem links


Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini

Benjamin Hersh, David Kim, David Li, Faraz Faruqi, Jiahao Ren, Nels Numan, Robert Timothy Bettridge, Ruofei Du, Steve Toh, Xiang 'Anthony' Chen, Xingyue Chen, Xun Qian, Yanhe Chen, Zhongyi Zhou

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 00:13 UTC · model grok-4.3

classification 💻 cs.HC
keywords: XR prototyping · LLM-native frameworks · Reality Model · mixed reality · WebXR · AI-assisted development · one-shot generation · spatial computing

The pith

XR Blocks' Reality Model enables LLMs to generate functional XR prototypes directly from natural language prompts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces XR Blocks, an open-source WebXR framework built for large language models to prototype intelligent XR experiences. It replaces complex engine hierarchies with a semantic Reality Model that maps users, physical environments, and agents to concise natural-language terms. This foundation supports Vibe Coding XR, a workflow that converts high-level prompts into complete, physics-aware mixed-reality applications through a desktop "simulated reality" to headset deployment loop. Evaluation on the new VCXR60 dataset of 60 prompts shows high one-shot success rates, letting creators bypass low-level sensor APIs and syntax issues.
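The loop the paper describes is, in outline: prompt in, semantic context attached, code out, test on desktop, deploy to headset. A minimal sketch of that shape in TypeScript, with every identifier (describeRealityModel, generateScript, vibeCode) a hypothetical stand-in rather than the framework's confirmed API:

```typescript
// Hypothetical sketch of the Vibe Coding XR loop described above; names
// are illustrative stand-ins, not the API of github.com/google/xrblocks.

interface GeneratedPrototype {
  source: string;    // LLM-emitted script written against the framework
  reasoning: string; // the model's explanation, surfaced for user review
}

// Stand-in for Reality Model serialization: concise natural-language
// terms for the user, the physical environment, and any agents present.
function describeRealityModel(): string {
  return "user(hands, gaze), scene(floor, walls, nearby objects), agents(none)";
}

// Stand-in for the LLM call (a Gemini request in the paper's setup).
async function generateScript(prompt: string, context: string): Promise<string> {
  return `/* code for: ${prompt}, grounded in: ${context} */`;
}

async function vibeCode(prompt: string): Promise<GeneratedPrototype> {
  const context = describeRealityModel();               // 1. semantic scene summary
  const source = await generateScript(prompt, context); // 2. one-shot generation
  return { source, reasoning: "shown to the user while the app builds" };
}

// 3. The emitted script runs first in desktop "simulated reality", then
//    deploys unchanged to a WebXR headset session.
vibeCode("create a dandelion that reacts to my hand").then(p => console.log(p.source));
```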

Core claim

XR Blocks introduces a semantic Reality Model aligning XR primitives with natural language to support generative AI, and Vibe Coding XR leverages it to translate high-level prompts into functional mixed-reality apps, as shown by high one-shot success rates on VCXR60.

What carries the argument

The Reality Model, a semantic abstraction representing users, physical environments, and agents using concise natural language terms optimized for LLM reasoning.
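A minimal sketch of what that abstraction could look like as a typed vocabulary; the field names below are inferred from the paper's description of users, environments, and agents, not the published schema:

```typescript
// Hypothetical Reality Model shape: each primitive maps 1:1 to a plain
// natural-language concept, so an LLM can reason over a short summary
// instead of an engine scene graph. All field names are illustrative.

interface HandState {
  position: [number, number, number]; // meters, world space
  pinching: boolean;                  // high-level gesture, not raw joints
}

interface RealityModel {
  user: {
    hands: { left?: HandState; right?: HandState };
    gaze?: { target?: string }; // what the user is looking at, by name
  };
  environment: {
    surfaces: string[]; // e.g. ["floor", "table", "wall"]
    objects: string[];  // recognized physical objects, by name
  };
  agents: { name: string; role: string }[]; // embodied AI characters
}

// The load-bearing property: the whole state flattens into a prompt
// fragment of a few lines, well inside any modern context window.
function toPromptFragment(model: RealityModel): string {
  return [
    `hands tracked: ${Object.keys(model.user.hands).join(", ") || "none"}`,
    `surfaces: ${model.environment.surfaces.join(", ")}`,
    `agents: ${model.agents.map(a => a.name).join(", ") || "none"}`,
  ].join("\n");
}
```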

If this is right

  • High-level prompts can produce working physics-aware XR applications without direct coding of sensors or hierarchies.
  • The desktop-to-headset loop supports rapid iteration with minimal friction for on-device testing.
  • The VCXR60 dataset and automated pipeline enable standardized measurement of LLM performance on XR tasks.
  • Developers can bypass steep learning curves for game engine details when building mixed-reality prototypes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar semantic layers could be developed for other AI-resistant domains such as robotics control or 3D scene editing.
  • Future models fine-tuned on Reality Model descriptions might achieve higher reliability in spatial generation tasks.
  • Open release of the framework allows community additions to expand the supported vocabulary of interactions.

Load-bearing premise

The Reality Model is expressive enough to capture the full range of XR interactions without requiring post-generation fixes or losing critical spatial and physics details.

What would settle it

A collection of test prompts that produce incomplete or non-functional XR code needing repeated manual corrections would disprove reliable one-shot generation.

Figures

Figures reproduced from arXiv: 2603.24591 by Benjamin Hersh, David Kim, David Li, Faraz Faruqi, Jiahao Ren, Nels Numan, Robert Timothy Bettridge, Ruofei Du, Steve Toh, Xiang 'Anthony' Chen, Xingyue Chen, Xun Qian, Yanhe Chen, Zhongyi Zhou.

Figure 1. Example user journey of Vibe Coding XR, an end-to-end workflow for creating immersive AI + XR experiences via vibe coding: (A) User types “create a beautiful dandelion” with XR Blocks Gem (http://xrblocks.github.io/gem) on a Galaxy XR headset in a Chrome browser. (B) Gemini translates the input into an interactive XR application within a minute, while the user reviews its reasoning and coding process. (C) …
Figure 2. Vibe Coding XR accelerates AI + XR prototyping by allowing users to (A) test their “vibe coding” results on desktop in a “simulated reality” environment, and (B) deploy the same demo on an Android XR headset with body and hand interactions.
Figure 3. Design of the XR Blocks Framework: (A) The “Reality Model” conceptual abstraction, which aligns spatial computing primitives 1:1 with natural language concepts to prevent LLM hallucination over fragmented syntax trees. (B) The modular architecture of the “core” engine encapsulating low-level perception and interaction logic. Subsystems marked with ∗ have not yet been fully open sourced.
Original abstract

While large language models (LLMs) have accelerated 2D software development through intent-driven "vibe coding", prototyping intelligent Extended Reality (XR) experiences remains a major challenge. The fundamental barrier is not just the steep learning curve for human creators, but that low-level sensor APIs and complex game engine hierarchies are ill-suited for LLM reasoning, routinely exceeding context windows and inducing syntax hallucinations. To bridge this gap, we contribute XR Blocks, an open-source, LLM-native WebXR framework. Unlike traditional engines, XR Blocks introduces a semantic "Reality Model" that aligns spatial computing primitives (users, physical environments, and agents) with natural language, providing a robust, concise vocabulary optimized for generative AI. Building upon this foundation, we present Vibe Coding XR, an end-to-end prototyping workflow that leverages LLMs to translate high-level prompts (e.g., "create a dandelion that reacts to my hand") directly into functional, physics-aware mixed-reality applications. To minimize the friction of on-device testing, the workflow introduces a seamless desktop "simulated reality" to headset deployment loop. Finally, we introduce VCXR60, a pilot dataset of 60 XR prompts paired with an automated evaluation pipeline. Our technical evaluation demonstrates high one-shot execution success, enabling practitioners to bypass low-level hurdles and rapidly move from "idea to reality". Code and live demos are available at https://github.com/google/xrblocks and http://xrblocks.github.io/gem.
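The desktop-to-headset loop in the abstract is plausible partly because WebXR gives both targets the same entry point. A sketch of one way the handoff could work, using the standard navigator.xr capability probe; the mode-switching callback is illustrative, not XR Blocks' confirmed implementation:

```typescript
// Minimal local typing for the WebXR Device API entry point; complete
// typings ship in @types/webxr.
interface XRSystemLike {
  isSessionSupported(mode: "immersive-ar" | "immersive-vr" | "inline"): Promise<boolean>;
}

// Probe for an immersive session; fall back to desktop "simulated
// reality" (emulated hands and camera) when no headset is present.
async function enterExperience(startApp: (mode: "headset" | "simulated") => void): Promise<void> {
  const xr = (navigator as unknown as { xr?: XRSystemLike }).xr;
  const immersive = xr ? await xr.isSessionSupported("immersive-ar") : false;
  startApp(immersive ? "headset" : "simulated");
}

enterExperience(mode => console.log(`starting in ${mode} mode`));
```

The same generated script can then run in either mode, which is what makes the "idea to reality" iteration cheap.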

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces XR Blocks, an open-source WebXR framework featuring a semantic 'Reality Model' that maps spatial computing primitives (users, environments, agents) to natural language for LLM compatibility. It describes the Vibe Coding XR workflow, which uses models such as Gemini to translate high-level prompts (e.g., 'create a dandelion that reacts to my hand') into functional, physics-aware XR applications, supported by a desktop simulated-reality to headset deployment loop. The work also contributes the VCXR60 pilot dataset of 60 XR prompts together with an automated evaluation pipeline, claiming high one-shot execution success that allows practitioners to bypass low-level API and engine hurdles.

Significance. If the one-shot success claims hold under rigorous testing, the work could meaningfully lower barriers to XR prototyping in human-computer interaction by enabling intent-driven generation of mixed-reality experiences. The open-source code release and live demos strengthen potential for adoption and extension by the community.

major comments (2)
  1. [Technical Evaluation / VCXR60] The technical evaluation section reports 'high one-shot execution success' on VCXR60 but supplies no quantitative metrics (e.g., success rate, error breakdown), no description of the automated pipeline's success criteria, and no details on prompt selection or diversity; this leaves the central performance claim without visible supporting derivation or controls.
  2. [Reality Model / Vibe Coding XR workflow] The Reality Model is presented as sufficiently expressive to capture full XR interactions (spatial constraints, physics, hand-tracking) without post-generation fixes, yet no evidence or edge-case analysis is provided to test this assumption against prompts that stress collision fidelity or nuanced spatial relationships.
minor comments (1)
  1. [Abstract] The abstract and introduction could more explicitly state the exact success metric used by the automated pipeline and the size/composition of the VCXR60 prompt set.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the paper to improve the rigor and transparency of the evaluation and claims.

Point-by-point responses
  1. Referee: [Technical Evaluation / VCXR60] The technical evaluation section reports 'high one-shot execution success' on VCXR60 but supplies no quantitative metrics (e.g., success rate, error breakdown), no description of the automated pipeline's success criteria, and no details on prompt selection or diversity; this leaves the central performance claim without visible supporting derivation or controls.

    Authors: We agree that the technical evaluation section would benefit from explicit quantitative support. The original manuscript presented the results at a high level to emphasize the workflow. In the revised version, we have expanded the section to report the one-shot success rate on VCXR60, include an error breakdown, describe the automated pipeline's success criteria (functional execution without runtime errors and semantic alignment with the prompt), and detail the prompt selection process along with measures taken to ensure diversity across interaction categories. These additions supply the requested derivation and controls. revision: yes

  2. Referee: [Reality Model / Vibe Coding XR workflow] The Reality Model is presented as sufficiently expressive to capture full XR interactions (spatial constraints, physics, hand-tracking) without post-generation fixes, yet no evidence or edge-case analysis is provided to test this assumption against prompts that stress collision fidelity or nuanced spatial relationships.

    Authors: We acknowledge that the manuscript would be strengthened by explicit testing of the Reality Model on challenging cases. While the model incorporates semantic primitives for spatial constraints, physics, and hand-tracking, the original text did not include a dedicated edge-case analysis. We have added a new subsection that examines performance on prompts stressing collision fidelity and nuanced spatial relationships, providing examples of both successful one-shot generations and cases where the underlying physics engine required minor post-generation adjustments. This supplies the requested evidence while noting the model's current limitations. revision: yes

Circularity Check

0 steps flagged

No circularity: new framework and external evaluation pipeline are independent of inputs

full rationale

The paper introduces XR Blocks and the Reality Model as a new semantic abstraction, then evaluates one-shot success on the newly created VCXR60 dataset via an automated pipeline. No equations, fitted parameters, self-citations, or ansatzes are used in any derivation chain; the success metric is defined externally by runtime execution on the prompt set rather than by construction from the model primitives themselves. The central claim therefore remains self-contained and does not reduce to its own inputs.
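To pin down what "defined externally by runtime execution" could mean in practice, here is a hedged reconstruction of a one-shot check. Playwright appears in the paper's reference list, but the concrete criteria of the VCXR60 pipeline are not spelled out, so the grace period and pass condition below are assumptions:

```typescript
// Hedged reconstruction of "functional execution without runtime errors":
// load a generated prototype in a headless browser and pass it if no
// uncaught exception fires during a grace period. Not the published pipeline.
import { chromium } from "playwright";

async function oneShotExecutes(url: string, graceMs = 5000): Promise<boolean> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const errors: Error[] = [];
  page.on("pageerror", err => errors.push(err)); // uncaught exceptions in the app

  try {
    await page.goto(url, { waitUntil: "load" });
    await page.waitForTimeout(graceMs); // let the XR scene initialize and animate
  } finally {
    await browser.close();
  }
  return errors.length === 0;
}

// A success rate over the prompt set would then be passes / prompts.length.
// The rebuttal's second criterion, semantic alignment with the prompt,
// needs a separate judge and is not attempted here.
```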

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the assumption that a compact semantic vocabulary can substitute for low-level XR APIs without loss of necessary functionality for the targeted prompt class.

axioms (1)
  • domain assumption: LLMs can reliably translate natural-language spatial descriptions into correct WebXR code when supplied with the Reality Model vocabulary.
    Invoked throughout the workflow description as the enabling premise for one-shot generation.
invented entities (1)
  • Reality Model (no independent evidence)
    purpose: Provides a concise, LLM-native vocabulary for users, physical environments, and agents in XR.
    New abstraction introduced by the paper to bridge LLM reasoning and spatial computing.
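The scale of the substitution that axiom demands is easiest to see side by side. The low-level half below approximates what raw hand-tracking code has to handle on every frame; the one-line semantic alternative is a hypothetical Reality Model term, not a documented XR Blocks method:

```typescript
type Vec3 = { x: number; y: number; z: number };

// Low level: generated code must handle dropped tracking, pick a joint,
// and invent its own distance threshold.
function handNearLowLevel(indexTip: Vec3 | null, target: Vec3): boolean {
  if (!indexTip) return false; // tracking can drop out on any frame
  const d = Math.hypot(indexTip.x - target.x, indexTip.y - target.y, indexTip.z - target.z);
  return d < 0.1; // 10 cm: a magic number the LLM would have to choose
}

// Semantic level: the same intent as a single vocabulary term, e.g.
//   if (user.hand.isNear(dandelion)) dandelion.scatterSeeds();
// The axiom holds only if terms like `isNear` cover the prompt class
// without losing the spatial and physics detail the low-level path exposes.
```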

pith-pipeline@v0.9.0 · 5615 in / 1252 out tokens · 41414 ms · 2026-05-15T00:13:58.728914+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

55 extracted references · 30 canonical work pages · 7 internal anchors

  1. [1]

    [n. d.]. Bezi | AI Assistance for Unity Developers & Studios. https://www.bezi.com/

  2. [2]

    TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems

    2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https://www.tensorflow.org/ Software available from tensorflow.org

  3. [3]

    A-Frame Authors. 2025. A-Frame. https://aframe.io

  4. [4]

    Anthropic. 2026. Claude Code. https://code.claude.com/docs/en/desktop

  5. [5]

    Mixed Reality Toolkit Authors. 2025. MRTK3. https://github.com/MixedRealityToolkit/MixedRealityToolkit-Unity

  6. [6]

    WebXR authors. 2022. WebXR. https://immersiveweb.dev/

  7. [7]

    Allen Bierbaum, Christopher Just, Patrick Hartling, Kevin Meinert, Albert Baker, and Carolina Cruz-Neira. 2001. VR Juggler: A Virtual Platform for Virtual Reality Application Development. In Proceedings IEEE Virtual Reality 2001. 89–96. doi:10.1109/VR.2001.913774

  8. [8]

    James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. 2018. JAX: Composable Transformations of Python+NumPy Programs. http://github.com/jax-ml/jax

  9. [9]

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language Models Are Few-Shot Learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901. doi:10.48550/arXiv.2005.14165

  10. [10]

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

  11. [11]

    Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

    Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, and Ion Stoica. 2024. Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. arXiv:2403.04132 [cs.AI]

  12. [12]

    Cursor. 2025. Cursor - the AI Code Editor. https://cursor.com

  13. [13]

    Fernanda De La Torre, Cathy Mengying Fang, Han Huang, Andrzej Banburski-Fahey, Judith Amores Fernandez, and Jaron Lanier. 2024. LLMR: Real-Time Prompting of Interactive Worlds Using Large Language Models. In Proceedings of the CHI Conference on Human Factors in Computing Systems. ACM. doi:10.1145/3613904.3642579

  14. [14]

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. doi:10.1109/CVPR.2009.5206848

  15. [15]

    Ruofei Du, Na Li, Jing Jin, Michelle Carney, Scott Miles, Maria Kleiner, Xiuxiu Yuan, Yinda Zhang, Anuva Kulkarni, Xingyu Bruce Liu, Ahmed Sabie, Sergio Orts-Escolano, Abhishek Kar, Ping Yu, Ram Iyengar, Adarsh Kowdle, and Alex Olwal. 2023. Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applica...

  16. [16]

    Ruofei Du, Alex Olwal, Mathieu Le Goc, Shengzhi Wu, Danhang Tang, Yinda Zhang, Jun Zhang, David Joseph Tan, Federico Tombari, and David Kim. 2022. Opportunistic Interfaces for Augmented Reality: Transforming Everyday Objects Into Tangible 6DoF Interfaces Using Ad Hoc UI. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (...

  17. [17]

    Ruofei Du, Eric Turner, Maksym Dzitsiuk, Luca Prasso, Ivo Duarte, Jason Dourgarian, Joao Afonso, Jose Pascoal, Josh Gladstone, Nuno Cruces, Shahram Izadi, Adarsh Kowdle, Konstantine Tsotsos, and David Kim. 2020. DepthLab: Real-Time 3D Interaction With Depth Maps for Mobile Augmented Reality. In Proceedings of the 33rd Annual ACM Symposium on User Interfa...

  18. [18]

    Sam Earle, Samyak Parajuli, and Andrzej Banburski-Fahey. 2025. DreamGarden: A Designer Assistant for Growing Games From a Single Prompt. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. ACM. doi:10.1145/3706598.3714233

  19. [19]

    Benj Edwards. 2025. Will the Future of Software Development Run on Vibes. https://arstechnica.com/ai/2025/03/is-vibe-coding-with-ai-gnarly-or-reckless-maybe-some-of-both

  20. [20]

    Cathy Fang, Yang Zhang, Matthew Dworman, and Chris Harrison. 2020. Wireality: Enabling Complex Tangible Geometries in Virtual Reality With Worn Multi-String Haptics. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–10. doi:10.1145/3313831.3376470

  21. [21]

    Daniele Giunchi, Nels Numan, Elia Gatti, and Anthony Steed. 2024. DreamCodeVR: Towards Democratizing Behavior Design in Virtual Reality With Speech-Driven Programming. In 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR). 579–589. doi:10.1109/VR58804.2024.00078

  22. [22]

    Godot. 2022. Godot Engine. https://godotengine.org/

  23. [23]

    Google. 2025. Android XR. https://android.com/xr

  24. [24]

    Google. 2025. Gemini CLI. https://github.com/google-gemini/gemini-cli

  25. [25]

    Google. 2025. Google Gemini. https://gemini.google.com/canvas

  26. [26]

    Google. 2025. TensorFlow Hub. https://www.tensorflow.org/hub

  27. [27]

    Google. 2026. Google Antigravity. https://antigravity.google

  28. [28]

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. 2025. DeepSeek-R1 Incentivizes Reasoning in LLMs Through Reinforcement Learning. Nature 645, 8081 (2025), 633–638. doi:10.48550/arXiv.2506.14245

  29. [29]

    Fengming He, Xiyun Hu, Jingyu Shi, Xun Qian, Tianyi Wang, and Karthik Ramani

  30. [30]

    In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

    Ubi Edge: Authoring Edge-Based Opportunistic Tangible User Interfaces in Augmented Reality. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–14. doi:10.1145/3544548.3580704

  31. [31]

    Erzhen Hu, Yanhe Chen, Mingyi Li, Vrushank Phadnis, Pingmei Xu, Xun Qian, Alex Olwal, David Kim, Seongkook Heo, and Ruofei Du. 2025. DialogLab: Authoring, Simulating, and Testing Dynamic Group Conversations in Hybrid Human-AI Conversations. In Proceedings of the 39th Annual ACM Symposium on User Interface Software and Technology (UIST). ACM. doi:10.114...

  32. [32]

    Erzhen Hu, Mingyi Li, Andrew Hong, Xun Qian, Alex Olwal, David Kim, Seongkook Heo, and Ruofei Du. 2025. Thing2Reality: Enabling Spontaneous Creation of 3D Objects From 2D Content Using Generative AI in XR Meetings. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (Busan, Republic of Korea). Association for Computing ...

  33. [33]

    Hugging Face. 2025. Hugging Face – the AI Community Building the Future. https://huggingface.co

  34. [34]

    Hirokazu Kato and Mark Billinghurst. 1999. Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System. In Proceedings 2nd IEEE and ACM International Workshop on Augmented Reality (IWAR '99). 85–94. doi:10.1109/IWAR.1999.803809

  35. [35]

    Geonsun Lee, Min Xia, Nels Numan, Xun Qian, David Li, Yanhe Chen, Achin Kulshrestha, Ishan Chatterjee, Yinda Zhang, Dinesh Manocha, David Kim, and Ruofei Du. 2025. Sensible Agent: A Framework for Unobtrusive Interaction With Proactive AR Agent. In Proceedings of the 39th Annual ACM Symposium on User Interface Software and Technology (UIST). ACM. doi:10.114...

  36. [36]

    David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongyi Zhou, Evgenii Alekseev, Geonsun Lee, Alex Cooper, Min Xia, Scott Chung, Jeremy Nelson, Xiuxiu Yuan, Jolica Dias, Tim Bettridge, Benjamin Hersh, Michelle Huynh, Konrad Piascik, Ricardo Cabello, David Kim, and Ruofei Du. 2025. XR Blocks: Accelerating Human-Centered AI + XR Innovation. In arXiv. 9 pages. doi...

  37. [37]

    Jingyu Li, Qingwen Yang, Kenuo Xu, Yang Zhang, and Chenren Xu. 2025. EchoSight: Streamlining Bidirectional Virtual-Physical Interaction With In-Situ Optical Tethering. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–18. doi:10.1145/3706598.3713925

  38. [38]

    Nels Numan, Daniele Giunchi, Benjamin Congdon, and Anthony Steed. 2023. Ubiq-Genie: Leveraging External Frameworks for Enhanced Social VR Experiences. In 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW). IEEE, Shanghai, China, 497–501. doi:10.1109/VRW58643.2023.00108

  39. [39]

    Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Pe...

  40. [40]

    Playwright. 2026. Playwright: Fast and Reliable End-To-End Testing for Modern Web Apps. https://playwright.dev

  41. [41]

    DreamFusion: Text-to-3D using 2D Diffusion

    Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. 2022. DreamFusion: Text-To-3D Using 2D Diffusion. In International Conference on Learning Representations. doi:10.48550/arXiv.2209.14988

  42. [42]

    Jingyu Shi, Rahul Jain, Seunggeun Chi, Hyungjun Doh, Hyung-gun Chi, Alexander J Quinn, and Karthik Ramani. 2025. Caring-AI: Towards Authoring Context-Aware Augmented Reality Instruction Through Generative Artificial Intelligence. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–23. doi:10.48550/arXiv.2501.16557

  43. [43]

    Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. 2024. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation. arXiv preprint arXiv:2402.05054 (2024). doi:10.48550/arXiv.2402.05054

  44. [44]

    Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al

  45. [45]

    Gemini: A Family of Highly Capable Multimodal Models

    Gemini: A Family of Highly Capable Multimodal Models. arXiv preprint arXiv:2312.11805 (2023). doi:10.48550/arXiv.2312.11805

  46. [46]

    three.js authors. 2022. Three.js. https://threejs.org

  47. [47]

    Unity. 2022. Unity Game Engine. https://unity.com/products/unity-platform

  48. [48]

    Unity. 2025. XR Interaction Toolkit. https://docs.unity3d.com/Packages/com.unity.xr.interaction.toolkit@3.0/manual/index.html

  49. [49]

    Unreal. 2022. Unreal Engine. https://www.unrealengine.com

  50. [50]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. Advances in Neural Information Processing Systems 30 (2017). doi:10.5555/3295222.3295349

  51. [51]

    WebDev Arena: A Live LLM Leaderboard for Web App Development

    Aryan Vichare, Anastasios N. Angelopoulos, Wei-Lin Chiang, Kelly Tang, and Luca Manolache. 2025. WebDev Arena: A Live LLM Leaderboard for Web App Development. https://arena.ai/blog/webdev-arena

  52. [52]

    VRTK Authors. 2025. VRTK. https://www.vrtk.io

  53. [53]

    Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, and Gordon Wetzstein. 2024. GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation. arXiv preprint arXiv:2403.14621 (2024). doi:10.48550/arXiv.2403.14621

  54. [54]

    Zhongyi Zhou, Jing Jin, Vrushank Phadnis, Xiuxiu Yuan, Jun Jiang, Xun Qian, Jingtao Zhou, Yiyi Huang, Zheng Xu, Yinda Zhang, Kristen Wright, Jason Mayes, Mark Sherwood, Johnny Lee, Alex Olwal, David Kim, Ram Iyengar, Na Li, and Ruofei Du. 2025. InstructPipe: Building Visual Programming Pipelines in Visual Blocks With Human Instructions Using LLMs. In Proce...

  55. [55]

    agentAR: Creating Augmented Reality Applications with Tool-Augmented LLM-based Autonomous Agents

    Chenfei Zhu, Shao-Kang Hsia, Xiyun Hu, Ziyi Liu, Jingyu Shi, and Karthik Ramani. 2025. agentAR: Creating Augmented Reality Applications with Tool-Augmented LLM-based Autonomous Agents. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology. 1–23. doi:10.1145/3746059.3747676