
arxiv: 2607.01588 · v1 · pith:F6PBZH33new · submitted 2026-07-02 · 💻 cs.HC

OrchestrXR: A Multi-Agent System for Idea-to-Prototype XR Study Authoring

Pith reviewed 2026-07-03 07:16 UTC · model grok-4.3

classification 💻 cs.HC
keywords extended reality · multi-agent systems · XR study authoring · human-AI collaboration · prototype generation · intent preservation · Unity engine · HCI workflow

The pith

OrchestrXR uses multi-agent orchestration to convert XR study ideas into Unity prototypes while preserving researcher intent across design, scene, and interaction stages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OrchestrXR as a multi-agent human-AI workflow that divides XR study creation into sequential stages of study design, scene generation, and interaction generation. It employs structured schemas and interactive human-agent interfaces to produce runnable Unity prototypes from an initial researcher idea. A user study with 12 XR researchers indicates the system offers effective support for early-stage authoring and maintains strong intent preservation throughout the process. This matters because XR study development typically fragments across separate tasks for experimental logic, 3D environments, and interactivity, slowing iteration for HCI researchers.

Core claim

OrchestrXR is a multi-agent workflow for early-stage XR study authoring that supports a controllable process across study design, scene generation, and interaction generation through structured schemas, multi-agent orchestration, and interactive human-agent interfaces, yielding Unity-based prototypes from a researcher's idea, as suggested by a user study with 12 XR researchers showing effective support and strong intent preservation across stages.

What carries the argument

Multi-agent orchestration that coordinates study design, scene generation, and interaction generation using structured schemas and interactive human-agent interfaces to output Unity prototypes.
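
To make the staged hand-off concrete, here is a minimal Python sketch of a three-stage pipeline with a confirmation gate between stages. It is not the authors' implementation: the schema classes (`StudySpec`, `SceneSpec`, `InteractionSpec`), the agent functions, and the `confirm` callback are all hypothetical names chosen for illustration, and the agents are stubs standing in for LLM-backed components.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StudySpec:
    """Hypothetical study-design schema (stage 1 output)."""
    idea: str
    tasks: List[str] = field(default_factory=list)


@dataclass
class SceneSpec:
    """Hypothetical scene schema (stage 2 output); keeps a link to the study spec."""
    study: StudySpec
    objects: List[str] = field(default_factory=list)


@dataclass
class InteractionSpec:
    """Hypothetical interaction schema (stage 3 output); keeps a link to the scene spec."""
    scene: SceneSpec
    behaviors: List[str] = field(default_factory=list)


def study_design_agent(idea: str) -> StudySpec:
    # Stub for an LLM-backed agent: derive a single task from the idea.
    return StudySpec(idea=idea, tasks=[f"task for: {idea}"])


def scene_generation_agent(study: StudySpec) -> SceneSpec:
    # Stub: one scene object per study task.
    return SceneSpec(study=study, objects=[f"object for {t}" for t in study.tasks])


def interaction_generation_agent(scene: SceneSpec) -> InteractionSpec:
    # Stub: one behavior per scene object.
    return InteractionSpec(scene=scene, behaviors=[f"grab {o}" for o in scene.objects])


def run_pipeline(idea: str, confirm: Callable[[str, object], bool]) -> InteractionSpec:
    """Run the three stages in order; stop as soon as the user rejects an artifact."""
    study = study_design_agent(idea)
    if not confirm("study_design", study):
        raise RuntimeError("study design rejected; revise before continuing")
    scene = scene_generation_agent(study)
    if not confirm("scene_generation", scene):
        raise RuntimeError("scene rejected; revise before continuing")
    interaction = interaction_generation_agent(scene)
    if not confirm("interaction_generation", interaction):
        raise RuntimeError("interaction spec rejected; revise before continuing")
    return interaction


if __name__ == "__main__":
    # Auto-approve every stage for demonstration purposes.
    result = run_pipeline("hand redirection threshold study", lambda stage, artifact: True)
    print(result.scene.study.idea)
```

Because each later schema holds a reference to the one before it, the original idea stays reachable from the final artifact, which is one simple way a design like this could keep intent inspectable across stages.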

If this is right

  • XR researchers can iterate from idea to runnable prototype without manual handoff between separate design and coding tools.
  • Intent from the initial study concept remains consistent through the three authoring stages.
  • The workflow reduces fragmentation when specifying experimental tasks, 3D scenes, and interactive logic together.
  • Early-stage XR experiments become more accessible for researchers who lack deep Unity programming experience.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The staged orchestration could transfer to authoring tools for AR training scenarios or VR therapy applications beyond academic studies.
  • Adding quantitative intent-matching metrics in future evaluations would strengthen validation of the preservation claim.
  • Similar multi-agent schemas might streamline idea-to-prototype pipelines in other HCI domains such as wearable interface design.
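
As an illustration of what a quantitative intent-matching metric could look like, the sketch below scores how many upstream specification items reappear in a downstream artifact. This is an assumption made for illustration only: the paper's own scoring procedure is not described on this page, the function name `propagation_score` is hypothetical, and verbatim substring matching is a deliberately crude stand-in for human or model-based judgment.

```python
from typing import Iterable


def propagation_score(upstream_items: Iterable[str], downstream_text: str) -> float:
    """Fraction of upstream items found verbatim (case-insensitive) in the downstream text."""
    items = [item.strip().lower() for item in upstream_items if item.strip()]
    if not items:
        raise ValueError("need at least one upstream item")
    haystack = downstream_text.lower()
    kept = sum(1 for item in items if item in haystack)
    return kept / len(items)


if __name__ == "__main__":
    # Hypothetical study-design items and a hypothetical scene description.
    study_items = ["two conditions", "reaching task", "within-subjects"]
    scene_text = "Scene with a reaching task under two conditions"
    print(propagation_score(study_items, scene_text))
```

A score below 1.0 flags items that were dropped or reworded between stages; a real evaluation would need a matching rule that tolerates paraphrase.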

Load-bearing premise

That a user study with 12 XR researchers provides sufficient evidence of the workflow's effectiveness and intent preservation, even without detailed study design or quantitative metrics.

What would settle it

A replication study or larger evaluation where generated prototypes show frequent loss of original research intent or fail to produce functional Unity scenes would disprove the effectiveness claim.

Figures

Figures reproduced from arXiv: 2607.01588 by Chenfei Zhu, Karthik Ramani, Shuqi Liao, Voicu Popescu.

Figure 1: OrchestrXR structures XR study authoring into three connected agent stages: (1) Study Design (SD), (2) Scene Generation (SG), and (3) Interaction Generation (IG). Users interact with the system through a shared (4) chat interface, while (5) Unity serves as the execution environment. Starting from an XR study idea, the system progressively translates research intent into structured study, scene, and interaction speci…
Figure 2: OrchestrXR system workflow. The system is organized as a stage-based authoring process in which the frontend mediates user interaction with three connected backend agents: Study Design (SD), Scene Generation (SG), and Interaction Generation (IG). Intermediate artifacts are progressively passed across stages, with user revision and confirmation interleaved throughout. Gray numbered circles indicate the main…
Figure 3: OrchestrXR frontend interface. A unified web interface supports the full authoring workflow across five areas: Study Design (A), Scene Generation (B), Interaction Generation (C), Unity Scene Stream (D), and a human–agent chat window (E). Users chat with agents by selecting the target context (1) and confirming stage progression (2).
Figure 4: Workflows of the three agents: SD Agent (top), SG …
Figure 5: Results from the four case studies. The figure shows one prototype result from each case: (1) hand redirection threshold …
Figure 6: Item-level specification propagation scores across tasks, shown as mean values with 95% confidence intervals for each …
Figure 7: User-reported experience ratings on a 7-point Likert …
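
The Figure 6 caption mentions mean values with 95% confidence intervals. As a hedged illustration of how such an interval can be computed for a sample of 12 ratings, the sketch below uses a Student's t interval. The choice of a t interval is an assumption, since the paper's method is not stated on this page, and `mean_ci95_n12` is a hypothetical helper name.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% critical value of Student's t with 11 degrees of freedom (n = 12).
T_CRIT_DF11 = 2.201


def mean_ci95_n12(ratings: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) of a 95% t interval for exactly 12 ratings."""
    if len(ratings) != 12:
        raise ValueError("this helper hardcodes the critical value for n = 12")
    mean = statistics.mean(ratings)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    half_width = T_CRIT_DF11 * sem
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Made-up 7-point Likert ratings, for demonstration only.
    demo = [5, 7, 5, 7, 5, 7, 5, 7, 5, 7, 5, 7]
    print(mean_ci95_n12(demo))
```

With only 12 participants the interval is wide relative to a 7-point scale, which is worth keeping in mind when reading per-item bars.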
Original abstract

Extended Reality (XR) has become an important interaction paradigm in Human-Computer Interaction (HCI). XR studies are used to investigate interaction, perception, and user behavior in immersive environments, and typically involve experimental tasks, 3D scenes, and interactive logic. However, turning an initial XR study idea into a runnable prototype remains fragmented across study design, scene construction, and interaction implementation. We present OrchestrXR, a multi-agent human-AI workflow for early-stage idea-to-prototype XR study authoring. Rather than treating XR study creation as one-shot generation, OrchestrXR supports a controllable workflow across study design, scene generation, and interaction generation through structured schemas, multi-agent orchestration, and interactive human-agent interfaces, producing a Unity-based prototype from a researcher's idea. A user study with 12 XR researchers suggests that OrchestrXR provides effective support for early-stage XR study authoring with strong intent preservation across stages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents OrchestrXR, a multi-agent human-AI workflow for early-stage XR study authoring that transforms researcher ideas into Unity prototypes via structured stages of study design, scene generation, and interaction generation. It uses schemas, multi-agent orchestration, and interactive interfaces to maintain controllability rather than one-shot generation. The central claim is that the system provides effective support for this process with strong intent preservation, as suggested by a user study involving 12 XR researchers.

Significance. If the evaluation were robust, the work could contribute to HCI by addressing the fragmentation in XR prototyping tools through a controllable multi-stage multi-agent approach. The idea of using orchestration to preserve intent across design-to-implementation stages has potential applicability beyond XR. However, the current lack of detailed results limits assessment of its actual significance or advantage over existing methods.

major comments (1)
  1. [User Study section] The claim of effective support and strong intent preservation across stages is evidenced solely by a user study with 12 XR researchers. No information is provided on study protocol, participant tasks, comparison conditions or baselines, measurement instruments for intent preservation, quantitative metrics (e.g., success rates, Likert scores), or statistical tests. This is load-bearing for the central claim, as the data cannot be evaluated for support of the stated conclusions or distinguished from subjective preference.
minor comments (1)
  1. [Abstract] The summary of the user study omits any methodology overview or quantitative outcomes, which reduces the ability to quickly assess the strength of the reported evidence.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the user study. We agree that the current manuscript omits critical details needed to evaluate the claims of effective support and intent preservation, and we will revise the User Study section to address this.

Point-by-point responses
  1. Referee: [User Study section] The claim of effective support and strong intent preservation across stages is evidenced solely by a user study with 12 XR researchers. No information is provided on study protocol, participant tasks, comparison conditions or baselines, measurement instruments for intent preservation, quantitative metrics (e.g., success rates, Likert scores), or statistical tests. This is load-bearing for the central claim, as the data cannot be evaluated for support of the stated conclusions or distinguished from subjective preference.

    Authors: We acknowledge that the User Study section as written provides insufficient detail for independent evaluation of the results. The study was exploratory and primarily qualitative, focusing on researcher feedback regarding workflow usability and intent preservation through post-session interviews and observations rather than controlled quantitative comparisons. In the revised manuscript we will expand the section to report: (1) full study protocol including recruitment criteria, session structure, and think-aloud procedures; (2) specific participant tasks (authoring three distinct XR study ideas of varying complexity); (3) the absence of a formal baseline condition, with rationale that the evaluation targeted the multi-stage orchestration workflow itself; (4) measurement instruments consisting of structured interview questions and 7-point Likert items on perceived support and intent preservation; (5) any quantitative observations collected (e.g., prototype completion rates, iteration counts); and (6) thematic analysis approach used in place of statistical tests. These additions will allow readers to assess the strength of evidence supporting our claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper's central claim of effective support for XR study authoring with intent preservation is supported by an external user study involving 12 XR researchers rather than any self-referential metrics, fitted parameters, or derivations that reduce to the system's own outputs by construction. No equations, ansatzes, or load-bearing self-citations are present that would create circularity; the evaluation chain relies on independent participant input and is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As an applied systems paper in human-computer interaction, the central claim depends on the described multi-agent architecture and the outcomes of the user study. No free parameters, standard mathematical axioms, or newly invented entities are indicated in the abstract.

pith-pipeline@v0.9.1-grok · 5697 in / 1139 out tokens · 40795 ms · 2026-07-03T07:16:54.542767+00:00 · methodology

