Beyond Chat and Clicks: GUI Agents for In-Situ Assistance via Live Interface Transformation
Pith reviewed 2026-05-10 11:16 UTC · model grok-4.3
The pith
GUI agents deliver help by directly editing live web interfaces through reversible DOM changes instead of separate chats.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper proposes in-situ assistance: a mode of support delivered directly within any live web interface through lightweight, browser-level interventions on the Document Object Model, without rebuilding the application or modifying its underlying logic. A design space and a computational pipeline characterize how GUI agents can insert, mutate, or recompose web elements to make interfaces easier to understand and navigate. The approach is instantiated in a Chrome extension that grounds user requests to UI elements and executes reversible manipulations, including contextual tooltips, control highlighting, and layout reorganization.
What carries the argument
The computational pipeline for DOM-mediated in-situ assistance that interprets user help requests and live interface context, grounds them to relevant UI elements, and executes reversible manipulations.
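The grounding step above can be sketched in miniature. This is a hypothetical illustration, not the paper's actual pipeline: UI elements are reduced to label strings, and a simple token-overlap score stands in for whatever matching the real system performs.

```javascript
// Tokenize text into a set of lowercase word tokens.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Jaccard-style overlap between the request and an element label.
function score(request, label) {
  const a = tokenize(request);
  const b = tokenize(label);
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  return shared / Math.max(1, new Set([...a, ...b]).size);
}

// Ground a help request to the highest-scoring UI element.
function groundRequest(request, elements) {
  let best = null;
  let bestScore = -1;
  for (const el of elements) {
    const s = score(request, el.label);
    if (s > bestScore) { bestScore = s; best = el; }
  }
  return best;
}

// Example: a request about exporting a chart grounds to the Export button.
const elements = [
  { id: "btn-export", label: "Export chart as PNG" },
  { id: "btn-zoom", label: "Zoom to selection" },
  { id: "menu-file", label: "File menu" },
];
const hit = groundRequest("how do I export this chart?", elements);
// → hit.id === "btn-export"
```

In a real extension the candidate list would come from scanning the live page, and the scoring would likely be model-based rather than lexical; the structure of the step (request in, target element out) is the point.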
If this is right
- In-situ assistance becomes deployable on arbitrary web interfaces without application-specific engineering.
- Users receive contextual help that integrates directly into the live view through element changes.
- GUI agents shift from sideline conversational support to active live interface reconfiguration.
- Quantitative results confirm reliable and efficient assistance delivery on complex visual interfaces.
Where Pith is reading between the lines
- The method could extend to other platforms if similar access to interface elements is feasible.
- Agents might combine request grounding with usage pattern detection to offer adjustments proactively.
- Widespread use would require safeguards for dynamic pages where structure changes rapidly.
Load-bearing premise
Lightweight reversible manipulations of page structure can be performed reliably across arbitrary web interfaces without breaking functionality.
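Reversibility here means every change records enough state to undo itself. A minimal sketch of a reversible "mutate" primitive, assuming elements are modeled as plain objects with an attribute map (in a browser this would be a real Element with getAttribute/setAttribute):

```javascript
// Apply attribute changes to an element, recording prior values so the
// whole manipulation can be rolled back. Returns an undo closure.
function applyMutation(element, attrs) {
  const undo = [];
  for (const [name, value] of Object.entries(attrs)) {
    undo.push({ name, prev: element.attributes[name] }); // prev may be undefined
    element.attributes[name] = value;
  }
  return function revert() {
    // Restore in reverse order; delete attributes that did not exist before.
    for (const { name, prev } of undo.reverse()) {
      if (prev === undefined) delete element.attributes[name];
      else element.attributes[name] = prev;
    }
  };
}

const el = { tag: "button", attributes: { class: "btn" } };
const revert = applyMutation(el, { class: "btn highlighted", "aria-label": "Export" });
// while assistance is shown: el.attributes.class === "btn highlighted"
revert();
// after dismissal: el.attributes is back to { class: "btn" }
```

The fragility the premise glosses over is visible even here: if the page's own framework rewrites `attributes` between apply and revert, the recorded `prev` values go stale, which is exactly the dynamic-page concern raised above.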
What would settle it
Applying the pipeline to a broad sample of popular web applications and observing frequent assistance failures or unintended breaks in original interface behavior.
read the original abstract
Complex visual interfaces are powerful yet have a steep learning curve, as users must navigate feature-rich visual interfaces while reasoning about domain-specific operations. Existing approaches either deliver assistance through a separate chat-based interaction, or require substantial application-specific engineering to build support natively into each interface. To address these gaps, we propose in-situ assistance: a mode of support delivered directly within any live web interface through lightweight, browser-level interventions on the Document Object Model (DOM), without rebuilding the application or modifying its underlying logic. We contribute a design space and a computational pipeline for DOM-mediated in-situ assistance, characterizing how GUI agents can insert, mutate, or recompose web elements to make the interface easier for users to understand, use, and navigate. We instantiate in-situ assistance in DOMSteer, a Chrome extension that interprets a user's help request and live interface context, grounds it to relevant UI elements, and executes reversible DOM manipulations directly on the live page to deliver assistance, including contextual tooltips, control highlighting, and layout reorganization. Quantitative evaluations on two complex visual interfaces show that DOMSteer delivers reliable and efficient in-situ assistance. Use cases and a comparative user study with a ChatGPT Atlas baseline demonstrate the usability and effectiveness of DOMSteer. Altogether, these findings point to a broader role for GUI agents: not just assisting from the sidelines, but actively reconfiguring live interfaces to support users in the moment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes in-situ assistance as a new mode of support for complex web interfaces, achieved through lightweight, reversible interventions on the live DOM by a GUI agent in the DOMSteer Chrome extension. It contributes a design space and pipeline for inserting, mutating, or recomposing UI elements to provide contextual help, highlighting, and layout changes without modifying the underlying application. Claims are supported by quantitative evaluations on two interfaces showing reliable assistance and a comparative user study demonstrating usability over chat-based baselines.
Significance. If the approach generalizes reliably, this work could meaningfully advance GUI agents and HCI by enabling agents to actively reconfigure live interfaces in the moment rather than relying on separate chat or per-app engineering. The design space and pipeline for DOM-mediated transformations represent a concrete step toward more embedded agent assistance.
major comments (1)
- [Abstract and quantitative evaluations] The central claim that lightweight, reversible DOM manipulations (insert, mutate, recompose) can be performed reliably on arbitrary live web interfaces without breaking functionality or requiring app-specific engineering (Abstract) is load-bearing for the contribution. However, quantitative evaluations are reported on only two complex interfaces; no evidence is given that the grounding pipeline or manipulation primitives handle common cases such as virtual DOMs (React/Vue), shadow DOMs, heavy event delegation, or client-side state that can invalidate direct edits even when intended to be reversible.
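The shadow DOM concern is concrete: a plain `querySelectorAll` does not descend into shadow roots, so a grounding pass must recurse into them explicitly. A sketch, with nodes as plain objects standing in for Elements (in a browser, `node.shadowRoot` exposes an open shadow root; closed roots remain inaccessible to a content script):

```javascript
// Collect every labeled node, descending into children and into any
// (open) shadow root attached to a node.
function collectLabeled(node, out = []) {
  if (node.label) out.push(node);
  for (const child of node.children || []) collectLabeled(child, out);
  if (node.shadowRoot) collectLabeled(node.shadowRoot, out);
  return out;
}

const page = {
  children: [
    { label: "Save" },
    {
      // A web component hiding its controls inside a shadow root:
      // invisible to a naive top-level query.
      shadowRoot: { children: [{ label: "Export" }, { label: "Share" }] },
    },
  ],
};
const labels = collectLabeled(page).map((n) => n.label);
// → ["Save", "Export", "Share"]
```

Whether the paper's grounding pipeline performs this traversal is exactly what the evaluation on two interfaces leaves open.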
minor comments (1)
- [Abstract] The abstract refers to 'two complex visual interfaces' without naming them; adding the specific interfaces and a brief characterization in the evaluation section would aid reader understanding.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the concern about the scope of our evaluations and the generalizability of the DOM manipulation pipeline by clarifying its reliance on standard web APIs and committing to expanded discussion of limitations.
read point-by-point responses
-
Referee: [Abstract and quantitative evaluations] The central claim that lightweight, reversible DOM manipulations (insert, mutate, recompose) can be performed reliably on arbitrary live web interfaces without breaking functionality or requiring app-specific engineering (Abstract) is load-bearing for the contribution. However, quantitative evaluations are reported on only two complex interfaces; no evidence is given that the grounding pipeline or manipulation primitives handle common cases such as virtual DOMs (React/Vue), shadow DOMs, heavy event delegation, or client-side state that can invalidate direct edits even when intended to be reversible.
Authors: We appreciate the referee highlighting this important point. The claim of broad applicability is indeed central to the contribution. Our quantitative evaluations were performed on two complex interfaces (a feature-rich dashboard and a collaborative productivity tool), as reported in Sections 5 and 6, demonstrating reliable performance in those cases. The pipeline and primitives are intentionally built on standard browser DOM APIs (query selectors, element creation/mutation, and event preservation), which operate on the rendered live DOM after any framework-specific rendering occurs. Virtual DOM approaches (React/Vue) ultimately expose standard HTML elements, so post-render manipulations apply without app-specific engineering. Shadow DOM encapsulation can be traversed using standard extension APIs when the extension runs in the appropriate context. Heavy event delegation is addressed by preserving original listeners during insert/mutate/recompose operations and relying on reversible snapshots. Client-side state changes are mitigated via mutation observers and full DOM restoration on dismissal to ensure reversibility. We acknowledge, however, that these mechanisms were not exhaustively validated across every possible edge case in the current evaluations. In the revised version, we will add a dedicated 'Limitations and Future Extensions' subsection in the Discussion that explicitly discusses these scenarios, potential failure modes, and how the design space could be extended (e.g., framework-aware grounding). We will also moderate the abstract language from 'arbitrary live web interfaces' to 'a wide range of live web interfaces' to better reflect the evaluated scope. This revision strengthens the paper without altering the core technical contribution.
revision: partial
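The rebuttal's claim about preserving listeners during recompose can be illustrated simply: moving the same node object, rather than cloning it, keeps attached listeners intact, which matches how `appendChild` re-parents an element in a browser. A sketch with plain objects (`listeners` stands in for addEventListener state, which this review has no visibility into for the real system):

```javascript
// Move a node between parents and return an undo closure that restores
// the original position. The node object itself is never copied, so any
// state attached to it (listeners, framework refs) survives the move.
function moveNode(node, fromParent, toParent) {
  const i = fromParent.children.indexOf(node);
  if (i === -1) throw new Error("node not under fromParent");
  fromParent.children.splice(i, 1);
  toParent.children.push(node);
  return () => {
    toParent.children.splice(toParent.children.indexOf(node), 1);
    fromParent.children.splice(i, 0, node);
  };
}

const button = { label: "Export", listeners: ["click:doExport"] };
const toolbar = { children: [button] };
const sidebar = { children: [] };
const undoMove = moveNode(button, toolbar, sidebar);
// sidebar.children[0] is the same object; its listeners are untouched
undoMove();
// toolbar.children[0] === button again
```

What this sketch cannot capture is the rebuttal's harder case: a framework that re-renders the subtree and replaces the moved node with a fresh one, invalidating the undo closure's reference.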
Circularity Check
No significant circularity in the derivation chain
full rationale
The paper is a systems/HCI contribution that proposes in-situ assistance via lightweight DOM interventions, describes a design space and computational pipeline, instantiates it in DOMSteer, and supports the claims through quantitative evaluation on two interfaces plus a comparative user study. No mathematical derivations, equations, fitted parameters, or self-referential definitions appear in the abstract or description. Claims rest on system implementation details and independent empirical results rather than reducing by construction to inputs or self-citations. The work is self-contained against external benchmarks with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Browser extensions can access and modify the DOM of live web pages
- domain assumption GUI agents can interpret user help requests and ground them to relevant UI elements
invented entities (2)
- in-situ assistance: no independent evidence
- DOMSteer: no independent evidence