OrchestrXR: A Multi-Agent System for Idea-to-Prototype XR Study Authoring

Chenfei Zhu; Karthik Ramani; Shuqi Liao; Voicu Popescu

arxiv: 2607.01588 · v1 · pith:F6PBZH33new · submitted 2026-07-02 · 💻 cs.HC

Shuqi Liao, Chenfei Zhu, Karthik Ramani, Voicu Popescu

Pith reviewed 2026-07-03 07:16 UTC · model grok-4.3

classification 💻 cs.HC

keywords extended reality, multi-agent systems, XR study authoring, human-AI collaboration, prototype generation, intent preservation, Unity engine, HCI workflow

The pith

OrchestrXR uses multi-agent orchestration to convert XR study ideas into Unity prototypes while preserving researcher intent across design, scene, and interaction stages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OrchestrXR as a multi-agent human-AI workflow that divides XR study creation into sequential stages of study design, scene generation, and interaction generation. It employs structured schemas and interactive human-agent interfaces to produce runnable Unity prototypes from an initial researcher idea. A user study with 12 XR researchers indicates the system offers effective support for early-stage authoring and maintains strong intent preservation throughout the process. This matters because XR study development typically fragments across separate tasks for experimental logic, 3D environments, and interactivity, slowing iteration for HCI researchers.

Core claim

OrchestrXR is a multi-agent workflow for early-stage XR study authoring. It supports a controllable process across study design, scene generation, and interaction generation through structured schemas, multi-agent orchestration, and interactive human-agent interfaces, and it yields a Unity-based prototype from a researcher's idea. A user study with 12 XR researchers suggests effective support and strong intent preservation across stages.

What carries the argument

Multi-agent orchestration that coordinates study design, scene generation, and interaction generation using structured schemas and interactive human-agent interfaces to output Unity prototypes.
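To make the staged hand-off concrete, here is a minimal sketch of a three-stage pipeline with a human confirmation gate between stages. It is not the paper's implementation: the class names (`StudySpec`, `SceneSpec`, `InteractionSpec`), the agent functions, and the `confirm` callback are all hypothetical, and each agent is a stub standing in for what would presumably be a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical schemas: each stage wraps the previous stage's artifact,
# so the original idea stays reachable from the final output.
@dataclass
class StudySpec:
    idea: str
    tasks: List[str] = field(default_factory=list)


@dataclass
class SceneSpec:
    study: StudySpec
    objects: List[str] = field(default_factory=list)


@dataclass
class InteractionSpec:
    scene: SceneSpec
    bindings: List[str] = field(default_factory=list)


# Stub agents (assumptions): a real system would call a model here.
def study_design_agent(idea: str) -> StudySpec:
    return StudySpec(idea=idea, tasks=[f"task for: {idea}"])


def scene_generation_agent(study: StudySpec) -> SceneSpec:
    return SceneSpec(study=study, objects=[f"object for {t}" for t in study.tasks])


def interaction_generation_agent(scene: SceneSpec) -> InteractionSpec:
    return InteractionSpec(scene=scene, bindings=[f"grab -> {o}" for o in scene.objects])


# A confirmation callback receives the stage name and its artifact.
Confirm = Callable[[str, object], bool]


def _require(confirm: Confirm, stage: str, artifact: object) -> None:
    # Stop the pipeline when the human reviewer rejects a stage.
    if not confirm(stage, artifact):
        raise RuntimeError(f"{stage} rejected by reviewer")


def run_pipeline(idea: str, confirm: Confirm) -> InteractionSpec:
    study = study_design_agent(idea)
    _require(confirm, "study_design", study)
    scene = scene_generation_agent(study)
    _require(confirm, "scene_generation", scene)
    interaction = interaction_generation_agent(scene)
    _require(confirm, "interaction_generation", interaction)
    return interaction


if __name__ == "__main__":
    spec = run_pipeline("hand redirection threshold", lambda stage, artifact: True)
    print(spec.scene.study.idea, spec.bindings)
```

Nesting each earlier specification inside the later one is only one possible design; it is chosen here because it makes the original idea traceable from the last artifact.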

If this is right

XR researchers can iterate from idea to runnable prototype without manual handoff between separate design and coding tools.
Intent from the initial study concept remains consistent through the three authoring stages.
The workflow reduces fragmentation when specifying experimental tasks, 3D scenes, and interactive logic together.
Early-stage XR experiments become more accessible for researchers who lack deep Unity programming experience.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The staged orchestration could transfer to authoring tools for AR training scenarios or VR therapy applications beyond academic studies.
Adding quantitative intent-matching metrics in future evaluations would strengthen validation of the preservation claim.
Similar multi-agent schemas might streamline idea-to-prototype pipelines in other HCI domains such as wearable interface design.
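One simple form a quantitative intent-matching metric could take is an item-level overlap score: the fraction of items required by an earlier stage that reappear in a later stage's output. The sketch below assumes items can be compared as normalized strings. The function name `propagation_score` and the example items are hypothetical, and this is not claimed to be the measure used in the paper.

```python
from typing import Iterable


def propagation_score(required: Iterable[str], produced: Iterable[str]) -> float:
    """Fraction of required items that also appear in the produced artifact."""
    # Normalize case and whitespace so trivial differences do not count as loss.
    required_set = {item.strip().lower() for item in required}
    if not required_set:
        raise ValueError("at least one required item is needed")
    produced_set = {item.strip().lower() for item in produced}
    return len(required_set & produced_set) / len(required_set)


if __name__ == "__main__":
    # Hypothetical items: one of the two required items survives.
    print(propagation_score(["Target sphere", "trial counter"],
                            ["target sphere", "floor"]))
```

Exact string matching is a deliberate simplification. A real evaluation would likely need human raters or semantic matching to decide whether an item was preserved.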

Load-bearing premise

That a user study with 12 XR researchers provides sufficient evidence of the workflow's effectiveness and intent preservation, even without detailed study design or quantitative metrics.

What would settle it

A replication study or larger evaluation where generated prototypes show frequent loss of original research intent or fail to produce functional Unity scenes would disprove the effectiveness claim.

Figures

Figures reproduced from arXiv: 2607.01588 by Chenfei Zhu, Karthik Ramani, Shuqi Liao, Voicu Popescu.

**Figure 1.** OrchestrXR structures XR study authoring into three connected agent stages: (1) Study Design (SD), (2) Scene Generation (SG), and (3) Interaction Generation (IG). Users interact with the system through a shared (4) chat interface, while (5) Unity serves as the execution environment. Starting from an XR study idea, the system progressively translates research intent into structured study, scene, and interaction speci…

**Figure 2.** OrchestrXR system workflow. The system is organized as a stage-based authoring process in which the frontend mediates user interaction with three connected backend agents: Study Design (SD), Scene Generation (SG), and Interaction Generation (IG). Intermediate artifacts are progressively passed across stages, with user revision and confirmation interleaved throughout. Gray numbered circles indicate the main…

**Figure 3.** OrchestrXR frontend interface. A unified web interface supports the full authoring workflow across five areas: Study Design (A), Scene Generation (B), Interaction Generation (C), Unity Scene Stream (D), and a human–agent chat window (E). Users chat with agents by selecting the target context (1) and confirming stage progression (2).

**Figure 4.** Workflows of the three agents: SD Agent (top), SG…

**Figure 5.** Results from the four case studies. The figure shows one prototype result from each case: (1) hand redirection threshold…

**Figure 6.** Item-level specification propagation scores across tasks, shown as mean values with 95% confidence intervals for each…

**Figure 7.** User-reported experience ratings on a 7-point Likert…
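The Figure 6 caption reports means with 95% confidence intervals. As a reminder of what that summary involves, the sketch below computes a mean and a normal-approximation interval using only the Python standard library. The helper name `mean_ci` and the ratings are hypothetical and not taken from the paper. With only 12 participants, a t-based interval would be somewhat wider than this approximation.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def mean_ci(scores: Sequence[float], confidence: float = 0.95) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate spread")
    m = mean(scores)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the mean from the sample standard deviation.
    half_width = z * stdev(scores) / sqrt(n)
    return m, m - half_width, m + half_width


if __name__ == "__main__":
    # Made-up 7-point ratings for 12 hypothetical participants.
    ratings = [5, 6, 7, 6, 5, 6, 7, 4, 6, 5, 6, 7]
    m, lower, upper = mean_ci(ratings)
    print(f"mean={m:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```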

Original abstract

Extended Reality (XR) has become an important interaction paradigm in Human-Computer Interaction (HCI). XR studies are used to investigate interaction, perception, and user behavior in immersive environments, and typically involve experimental tasks, 3D scenes, and interactive logic. However, turning an initial XR study idea into a runnable prototype remains fragmented across study design, scene construction, and interaction implementation. We present OrchestrXR, a multi-agent human-AI workflow for early-stage idea-to-prototype XR study authoring. Rather than treating XR study creation as one-shot generation, OrchestrXR supports a controllable workflow across study design, scene generation, and interaction generation through structured schemas, multi-agent orchestration, and interactive human-agent interfaces, producing a Unity-based prototype from a researcher's idea. A user study with 12 XR researchers suggests that OrchestrXR provides effective support for early-stage XR study authoring with strong intent preservation across stages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OrchestrXR describes a multi-agent workflow with schemas for XR study prototyping that addresses a real HCI pain point, but the user study evidence is too thin on methods and metrics to support the effectiveness claims.

OrchestrXR presents a multi-agent human-AI workflow that splits XR study creation into study design, scene generation, and interaction steps. It uses structured schemas and interactive interfaces to produce a Unity prototype from a researcher's initial idea while aiming to keep the original intent intact.

The contribution is the orchestration approach itself. Treating the process as a controllable sequence rather than one-shot generation makes sense for this domain, and the schemas provide a concrete mechanism for passing information between stages with human oversight. This matches a genuine workflow problem in HCI where building immersive experiments is fragmented.

The evaluation is the weak point. The abstract reports a study with 12 XR researchers that supposedly shows effective support and strong intent preservation, yet gives no protocol details, task descriptions, measurement methods for intent, comparison conditions, or any numbers. Without those elements the results cannot be separated from subjective impressions. The stress-test concern holds based on the available text.

This paper is for HCI researchers who run XR experiments and want faster early-stage prototyping tools. Readers working on agent-based design systems might pick up the schema and orchestration ideas.

It should go to peer review. The architecture is specific enough to be worth referee scrutiny even if the study section requires more work.

Referee Report

1 major / 1 minor

Summary. The paper presents OrchestrXR, a multi-agent human-AI workflow for early-stage XR study authoring that transforms researcher ideas into Unity prototypes via structured stages of study design, scene generation, and interaction generation. It uses schemas, multi-agent orchestration, and interactive interfaces to maintain controllability rather than one-shot generation. The central claim is that the system provides effective support for this process with strong intent preservation, as suggested by a user study involving 12 XR researchers.

Significance. If the evaluation were robust, the work could contribute to HCI by addressing the fragmentation in XR prototyping tools through a controllable multi-stage multi-agent approach. The idea of using orchestration to preserve intent across design-to-implementation stages has potential applicability beyond XR. However, the current lack of detailed results limits assessment of its actual significance or advantage over existing methods.

major comments (1)

[User Study section] The claim of effective support and strong intent preservation across stages is evidenced solely by a user study with 12 XR researchers. No information is provided on study protocol, participant tasks, comparison conditions or baselines, measurement instruments for intent preservation, quantitative metrics (e.g., success rates, Likert scores), or statistical tests. This is load-bearing for the central claim, as the data cannot be evaluated for support of the stated conclusions or distinguished from subjective preference.

minor comments (1)

[Abstract] The summary of the user study omits any methodology overview or quantitative outcomes, which reduces the ability to quickly assess the strength of the reported evidence.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the user study. We agree that the current manuscript omits critical details needed to evaluate the claims of effective support and intent preservation, and we will revise the User Study section to address this.

Referee: [User Study section] The claim of effective support and strong intent preservation across stages is evidenced solely by a user study with 12 XR researchers. No information is provided on study protocol, participant tasks, comparison conditions or baselines, measurement instruments for intent preservation, quantitative metrics (e.g., success rates, Likert scores), or statistical tests. This is load-bearing for the central claim, as the data cannot be evaluated for support of the stated conclusions or distinguished from subjective preference.

Authors: We acknowledge that the User Study section as written provides insufficient detail for independent evaluation of the results. The study was exploratory and primarily qualitative, focusing on researcher feedback regarding workflow usability and intent preservation through post-session interviews and observations rather than controlled quantitative comparisons. In the revised manuscript we will expand the section to report: (1) full study protocol including recruitment criteria, session structure, and think-aloud procedures; (2) specific participant tasks (authoring three distinct XR study ideas of varying complexity); (3) the absence of a formal baseline condition, with rationale that the evaluation targeted the multi-stage orchestration workflow itself; (4) measurement instruments consisting of structured interview questions and 7-point Likert items on perceived support and intent preservation; (5) any quantitative observations collected (e.g., prototype completion rates, iteration counts); and (6) thematic analysis approach used in place of statistical tests. These additions will allow readers to assess the strength of evidence supporting our claims.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper's central claim of effective support for XR study authoring with intent preservation is supported by an external user study involving 12 XR researchers rather than any self-referential metrics, fitted parameters, or derivations that reduce to the system's own outputs by construction. No equations, ansatzes, or load-bearing self-citations are present that would create circularity; the evaluation chain relies on independent participant input and is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As an applied systems paper in human-computer interaction, the central claim depends on the described multi-agent architecture and the outcomes of the user study. No free parameters, standard mathematical axioms, or newly invented entities are indicated in the abstract.

pith-pipeline@v0.9.1-grok · 5697 in / 1139 out tokens · 40795 ms · 2026-07-03T07:16:54.542767+00:00 · methodology

Reference graph

Works this paper leans on

96 extracted references · 19 canonical work pages · 8 internal anchors

[1]

Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

Mohamed Aghzal, Gregory J. Stein, and Ziyu Yao. 2026. Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective.arXiv preprint arXiv:2603.14248 (2026). https://arxiv.org/abs/2603.14248

work page internal anchor Pith review Pith/arXiv arXiv 2026
[2]

Narges Ashtari, Andrea Bunt, Joanna McGrenere, Michael Nebeling, and Parmit K Chilana. 2020. Creating augmented and virtual reality applications: Current practices, challenges, and opportunities. InProceedings of the 2020 CHI conference on human factors in computing systems. 1–13

2020
[3]

Adam O Bebko and Nikolaus F Troje. 2020. bmlTUX: Design and control of exper- iments in virtual reality and beyond.i-Perception11, 4 (2020), 2041669520938400

2020
[4]

John Brooke et al. 1996. SUS-A quick and dirty usability scale.Usability evaluation in industry189, 194 (1996), 4–7

1996
[5]

Jack Brookes, Matthew Warburton, Mshari Alghadier, Mark Mon-Williams, and Faisal Mushtaq. 2020. Studying human behavior with virtual reality: The Unity Experiment Framework.Behavior research methods52, 2 (2020), 455–463

2020
[6]

Alessandro Carcangiu, Marco Manca, Jacopo Mereu, Carmen Santoro, Ludovica Simeoli, and Lucio Davide Spano. 2025. Tell-XR: Conversational End-User De- velopment of XR Automations. InHuman-Computer Interaction – INTERACT

2025
[7]

doi:10.1007/978-3-032-04999-5_35

Springer. doi:10.1007/978-3-032-04999-5_35

work page doi:10.1007/978-3-032-04999-5_35
[8]

Patrick Carlson, Anicia Peters, Stephen B Gilbert, Judy M Vance, and Andy Luse
[9]

Virtual training: Learning transfer of assembly tasks.IEEE transactions on visualization and computer graphics21, 6 (2015), 770–782

2015
[10]

Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, et al. 2024. Coder: Issue resolving with multi-agent and task graphs.arXiv preprint arXiv:2406.01304 (2024)

work page arXiv 2024
[11]

Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, et al. 2023. Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors. InThe Twelfth International Conference on Learning Representations

2023
[12]

Tor-Salve Dalsgaard, Jarrod Knibbe, and Joanna Bergström. 2021. Modeling pointing for 3D target selection in VR. InProceedings of the 27th ACM symposium on virtual reality software and technology. 1–10

2021
[13]

Fernanda De La Torre, Cathy Mengying Fang, Han Huang, Andrzej Banburski- Fahey, Judith Amores Fernandez, and Jaron Lanier. 2024. Llmr: Real-time prompt- ing of interactive worlds using large language models. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–22

2024
[14]

Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. 2023. Mind2web: Towards a generalist agent for the web. Advances in Neural Information Processing Systems36 (2023), 28091–28114

2023
[15]

Ruofei Du, Benjamin Hersh, David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongyi Zhou, Xingyue Chen, Jiahao Ren, Robert Timothy Bettridge, et al. 2026. Vibe Coding XR: Accelerating AI+ XR Prototyping with XR Blocks and Gemini. arXiv preprint arXiv:2603.24591(2026)

work page internal anchor Pith review Pith/arXiv arXiv 2026
[16]

Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch
[17]

InForty-first international conference on machine learning

Improving factuality and reasoning in language models through multiagent debate. InForty-first international conference on machine learning
[18]

Jonathan Ehret, Andrea Bönsch, Janina Fels, Sabine J Schlittmeier, and Torsten W Kuhlen. 2024. StudyFramework: Comfortably setting up and conducting factorial- design studies using the unreal engine. In2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW). IEEE, 442–449

2024
[19]

Epic Games. 2026. Unreal Engine: The Most Powerful Real-Time 3D Creation Tool. https://www.unrealengine.com/. Accessed: 2026-03-31

2026
[20]

Andrew Estornell and Yang Liu. 2024. Multi-llm debate: Framework, principals, and interventions.Advances in Neural Information Processing Systems37 (2024), 28938–28964

2024
[21]

A Fourney, G Bansal, H Mozannar, C Tan, E Salinas, Zhu Erkang, F Niedt- ner, G Proebsting, G Bassman, J Gerrits, et al . [n. d.]. Magentic-one: A gen- eralist multi-agent system for solving complex tasks, 2024.URL https://arxiv. org/abs/2411.04468([n. d.])

work page internal anchor Pith review Pith/arXiv arXiv 2024
[22]

Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, Bingchen Qian, Fei Wei, Wenhao Zhang, Yuexiang Xie, Daoyuan Chen, et al. 2024. Agentscope: A flexible yet robust multi-agent platform.arXiv preprint arXiv:2402.14034(2024)

work page arXiv 2024
[23]

Gege Gao, Weiyang Liu, Anpei Chen, Andreas Geiger, and Bernhard Schölkopf
[24]

InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Graphdreamer: Compositional 3d scene synthesis from scene graphs. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 21295–21304
[25]

Alireza Ghafarollahi and Markus J Buehler. 2024. ProtAgents: protein discovery via large language model multi-agent collaborations combining physics and machine learning.Digital Discovery3, 7 (2024), 1389–1409

2024
[26]

Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, et al. 2025. Towards an AI co-scientist.arXiv preprint arXiv:2502.18864(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[27]

Judith Hartfill, Jenny Gabel, Lucie Kruse, Susanne Schmidt, Kevin Riebandt, Simone Kühn, and Frank Steinicke. 2021. Analysis of detection thresholds for hand redirection during mid-air interactions in virtual reality. InProceedings of the 27th ACM Symposium on Virtual Reality Software and Technology. 1–10

2021
[28]

Yichen He, Guanhua Huang, Peiyuan Feng, Yuan Lin, Yuchen Zhang, Hang Li, et al. 2025. Pasa: An llm agent for comprehensive academic paper search. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 11663–11679

2025
[29]

Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al
[30]

InThe twelfth international conference on learning representations

MetaGPT: Meta programming for a multi-agent collaborative framework. InThe twelfth international conference on learning representations
[31]

Junyi Hou, Andre Lin Huikai, Nuo Chen, Yiwei Gong, and Bingsheng He. 2025. PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing.arXiv preprint arXiv:2512.02589(2025)

work page arXiv 2025
[32]

Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A Ross, Cordelia Schmid, and Alireza Fathi. 2024. Scenecraft: An llm agent for synthesiz- ing 3d scenes as blender code. InForty-first International Conference on Machine Learning

2024
[33]

Sebastian Hubenschmid, Jonathan Wieland, Daniel Immanuel Fink, Andrea Batch, Johannes Zagermann, Niklas Elmqvist, and Harald Reiterer. 2022. Relive: Bridging in-situ and ex-situ visual analytics for analyzing mixed reality user studies. InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–20

2022
[34]

Tal Ifargan, Lukas Hafner, Maor Kern, Ori Alcalay, and Roy Kishony. 2025. Au- tonomous LLM-driven research—from data to human-verifiable research papers. NEJM AI2, 1 (2025), AIoa2400555

2025
[35]

Charles Javerliat, Sophie Villenave, Pierre Raimbaud, and Guillaume Lavoué. 2024. Plume: Record, replay, analyze and share user behavior in 6dof xr experiences. IEEE Transactions on Visualization and Computer Graphics30, 5 (2024), 2087– 2097

2024
[36]

Hyeonsu B Kang, Tongshuang Wu, Joseph Chee Chang, and Aniket Kittur. 2023. Synergi: A mixed-initiative system for scholarly synthesis and sensemaking. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–19

2023
[37]

Veronika Krauß, Alexander Boden, Leif Oppermann, and René Reiners. 2021. Current practices, challenges, and design implications for collaborative ar/vr application development. InProceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–15

2021
[38]

Radha Kumaran, You-Jin Kim, Anne E Milner, Tom Bullock, Barry Giesbrecht, and Tobias Höllerer. 2023. The impact of navigation aids on search performance and object recall in wide-area augmented reality. InProceedings of the 2023 CHI conference on human factors in computing systems. 1–17

2023
[39]

Jakub Lála, Odhran O’Donoghue, Aleksandar Shtedritski, Sam Cox, Samuel G Rodriques, and Andrew D White. 2023. Paperqa: Retrieval-augmented generative agent for scientific research.arXiv preprint arXiv:2312.07559(2023)

work page arXiv 2023
[40]

Cheryl Lee, Chunqiu Steven Xia, Longji Yang, Jen-tse Huang, Zhouruixing Zhu, Lingming Zhang, and Michael R Lyu. 2025. Unidebugger: Hierarchical multi- agent framework for unified software debugging. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 18248–18277

2025
[41]

Jaewook Lee, Filippo Aleotti, Diego Mazala, Guillermo Garcia-Hernando, Sara Vicente, Oliver James Johnston, Isabel Kraus-Liang, Jakub Powierza, Donghoon Shin, Jon E Froehlich, et al. 2025. Imaginatear: Ai-assisted in-situ authoring in augmented reality. InProceedings of the 38th Annual ACM Symposium on User Interface Software and Technology. 1–21

2025
[42]

Jaewook Lee, Raahul Natarrajan, Sebastian S Rodriguez, Payod Panda, and Eyal Ofek. 2022. Remotelab: A vr remote study toolkit. InProceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 1–9

2022
[43]

David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongyi Zhou, Evgenii Alekseev, Geonsun Lee, Alex Cooper, Min Xia, Scott Chung, et al. 2025. XR Blocks: Accel- erating Human-Centered AI+ XR Innovation.arXiv preprint arXiv:2509.25504 (2025)

work page arXiv 2025
[44]

Ruochen Li, Teerth Patel, Qingyun Wang, and Xinya Du. 2024. Mlr-copilot: Autonomous machine learning research based on large language models agents. arXiv preprint arXiv:2408.14033(2024)

work page arXiv 2024
[45]

Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, and Eugene Ie. 2024. Improving multi-agent debate with sparse communication topology. InFindings of the Association for Computational Linguistics: EMNLP Shuqi Liao, Chenfei Zhu, Karthik Ramani, and Voicu Popescu

2024
[46]

Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. 2024. Encouraging divergent thinking in large language models through multi-agent debate. InProceedings of the 2024 conference on empirical methods in natural language processing. 17889–17904

2024
[47]

Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the middle: How language models use long contexts.Transactions of the association for computational linguistics12 (2024), 157–173

2024
[48]

Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha
[49]

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

The ai scientist: Towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[50]

Donghyeok Ma, Hanbee Jang, Joon Hyub Lee, and Seok-Hyung Bae. 2025. Gar- den of Papers: Finding, Reading, and Organizing Research Papers in a Visual, Integrated, and Flexible Workspace. InProceedings of the 38th Annual ACM Symposium on User Interface Software and Technology. 1–15

2025
[51]

Jacopo Mereu, Valentino Artizzu, Alessandro Carcangiu, Lucio Davide Spano, Ludovica Simeoli, Andrea Mattioli, Marco Manca, Carmen Santoro, and Fabio Paternò. 2024. Empowering end-user in creating extended reality content with a conversational chatbot. InInternational Symposium on Engineering Interactive Computer Systems. Springer, 126–137

2024
[52]

Paul Milgram and Fumio Kishino. 1994. A taxonomy of mixed reality visual displays.IEICE Transactions on Information and Systems77, 12 (1994), 1321–1329

1994
[53]

Model Context Protocol. 2025. Model Context Protocol Specification. https: //modelcontextprotocol.io/specification/2025-11-25. Accessed: 2026-03-29

2025
[54]

Michael Nebeling, Maximilian Speicher, Xizi Wang, Shwetha Rajaram, Brian D Hall, Zijian Xie, Alexander RE Raistrick, Michelle Aebersold, Edward G Happ, Jiayin Wang, et al. 2020. MRAT: The mixed reality analytics toolkit. InProceedings of the 2020 CHI Conference on human factors in computing systems. 1–12

2020
[55]

Cassandra Overney, Belén Saldías, Dimitra Dimitrakopoulou, and Deb Roy. 2024. Sensemate: An accessible and beginner-friendly human-ai platform for qualita- tive data analysis. InProceedings of the 29th International Conference on Intelligent User Interfaces. 922–939

2024
[56]

Xueni Pan and Antonia F de C Hamilton. 2018. Why and how to use virtual reality to study human social interaction: The challenges of exploring a new research landscape.British Journal of Psychology109, 3 (2018), 395–417

2018
[57]

Stéven Picard, Ningyuan Sun, and Jean Botev. 2024. XR MUSE: An Open-Source Unity Framework for Extended Reality-Based Networked Multi-User Studies. Virtual Worlds3, 4 (2024), 404–417. doi:10.3390/virtualworlds3040022

work page doi:10.3390/virtualworlds3040022 2024
[58]

Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, et al. 2024. Chatdev: Communicative agents for software development. InProceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: Long papers). 15174–15186

2024
[59]

Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, et al. 2025. Ui-tars: Pioneering automated gui interaction with native agents.arXiv preprint arXiv:2501.12326 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[60]

Jack Ratcliffe, Francesco Soave, Nick Bryan-Kinns, Laurissa Tokarchuk, and Ildar Farkhatdinov. 2021. Extended reality (XR) remote research: A survey of drawbacks and opportunities. InProceedings of the 2021 CHI conference on human factors in computing systems. 1–13

2021
[61]

Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, and Emad Barsoum. 2025. Agent laboratory: Using llm agents as research assistants.Findings of the Association for Computational Linguistics: EMNLP 2025(2025), 5977–6043

2025
[62]

Michael D Skarlinski, Sam Cox, Jon M Laurent, James D Braza, Michaela Hinks, Michael J Hammerling, Manvitha Ponnapati, Samuel G Rodriques, and Andrew D White. 2024. Language agents achieve superhuman synthesis of scientific knowl- edge.arXiv preprint arXiv:2409.13740(2024)

work page arXiv 2024
[63]

Maximilian Speicher, Brian D Hall, and Michael Nebeling. 2019. What is mixed reality?. InProceedings of the 2019 CHI conference on human factors in computing systems. 1–15

2019
[64]

Anthony Steed, Lisa Izzouzi, Klara Brandstätter, Sebastian Friston, Ben Congdon, Otto Olkkonen, Daniele Giunchi, Nels Numan, and David Swapp. 2022. Ubiq-exp: A toolkit to build and run remote and distributed mixed reality experiments. Frontiers in Virtual Reality3 (2022), 912078

2022
[65]

Helen Stefanidi, Asterios Leonidis, Maria Korozi, and George Papagiannakis
[66]

In2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)


