AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction

Chao Huang; Guohong Liu; Hao Wen; Jiacheng Liu; Jialei Ye; Jichao Yan; Ju Ren; Shanhui Zhao; Shizuo Tian; Yao Guo

arxiv: 2606.23449 · v1 · pith:2QTRGZY4new · submitted 2026-06-22 · 💻 cs.AI · cs.OS

AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction

Shanhui Zhao , Jiacheng Liu , Guohong Liu , Jichao Yan , Jialei Ye , Yuhao Yang , Hao Wen , Shizuo Tian

show 8 more authors

Yizhen Yuan Yuxuan Chen Yunxin Liu Ju Ren Ya-Qin Zhang Chao Huang Yao Guo Yuanchun Li

This is my paper

Pith reviewed 2026-06-26 08:39 UTC · model grok-4.3

classification 💻 cs.AI cs.OS

keywords AI agentsOS-level harnessAndroid Open Source Projectpersonalized service compositionefficient agent interfacessecure information flowtask completion ratesecurity policy compliance

0 comments

The pith

AOHP builds an Android harness that treats AI agents as first-class OS actors to raise task completion, cut token use, and tighten security.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AOHP, an open-source harness layered on the Android Open Source Project, to close the gap between agent-driven workflows and conventional application-centric operating systems. It adds three agent-oriented mechanisms—personalized service composition, efficient agent interfaces, and secure information flow—while keeping the existing Android ecosystem intact. Preliminary tests on demanding tasks report a 21 percent higher completion rate, 51 percent lower token cost, and improved security-policy adherence. Readers would care because the work supplies a concrete, open testbed for exploring native agent support without discarding mature hardware and software stacks.

Core claim

AOHP is an OS-level agent harness on AOSP that treats agents as first-class actors, enabling adaptive user interfaces and agent-friendly runtime environments through personalized service composition, efficient agent interfaces, and secure information flow; on challenging tasks these mechanisms deliver measurable gains in task completion, execution cost, and security compliance over standard Android.

What carries the argument

The three agent-oriented system mechanisms (personalized service composition, efficient agent interfaces, and secure information flow) that let agents interact directly with the OS as first-class actors.

If this is right

Agents complete more tasks on the same device without extra hardware.
Token consumption drops, lowering both latency and monetary cost of agent runs.
Security policies are enforced more reliably during agent actions.
Developers can reuse existing Android apps and drivers while gaining agent support.
The open codebase supplies a shared platform for testing further agent-native primitives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Future mobile OS versions could incorporate similar first-class agent support as a standard feature.
Comparable harnesses on other platforms might reveal whether the three mechanisms generalize beyond Android.
Longer-running agent sessions could be benchmarked to check whether the reported efficiency gains persist over time.

Load-bearing premise

The preliminary experiments on a set of challenging tasks are representative enough to establish the advantages of the three proposed mechanisms over conventional Android.

What would settle it

A broader suite of tasks on which AOHP fails to improve completion rate, token cost, or security compliance relative to unmodified Android.

read the original abstract

AI agents are driving a new software paradigm, with the ability to autonomously call tools, extract information, manage memory, and complete tasks that span applications and data sources. Most existing end-user operating systems, however, are designed for application-centric workflows and offer little native support for AI agents. This mismatch limits the wider adoption of agents and leads to execution overhead and safety risks when running agents on conventional systems. While the concept of agent-native operating systems is emerging, the research community lacks an open testbed to explore the architectural primitives desired for agent-mediated interaction. We present AOHP (Android Open Harness Project), an OS-level agent harness built on the Android Open Source Project (AOSP). The core design principle of AOHP is to treat agents as first-class OS actors, enabling adaptive user interfaces and agent-friendly runtime environments. AOHP preserves the mature Android software and hardware ecosystem while introducing three agent-oriented system mechanisms: personalized service composition, efficient agent interfaces, and secure information flow. Based on preliminary experiments on challenging tasks covering key capabilities of OS agents, AOHP shows clear advantages in task completion (+21.12% completion rate), execution cost (-51.55% token cost), and security-policy compliance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

AOHP delivers a working open harness on AOSP plus three mechanisms that produce measurable gains on agent tasks.

read the letter

AOHP is an open-source harness built on AOSP that treats agents as first-class OS actors. It adds three mechanisms—personalized service composition, efficient agent interfaces, and secure information flow—and reports concrete improvements over stock Android.

The new part is the delivered artifact itself. Prior work on agent-native OS ideas has stayed mostly at the proposal stage; this one ships code that preserves the existing Android stack while exposing the new primitives. The evaluation section links the three mechanisms directly to the outcomes: +21% task completion, -51% token cost, and better security-policy compliance on a set of challenging tasks with standard Android as baseline.

The experiments are described with task details and quantitative results, which is better than many agent papers that stop at qualitative claims. The open-source release is real evidence that others can build on it.

The main soft spot is scope. The tasks cover key capabilities but remain a selected set; it is not obvious how far the gains extend to other agent frameworks or to longer-running, multi-app workflows. The security metric focuses on policy compliance, which is useful but leaves some attack surfaces untested.

This paper is for people working on OS support for AI agents who need a concrete platform rather than another architecture sketch. It deserves a serious referee because the artifact and the tied measurements give reviewers something specific to check.

Referee Report

0 major / 2 minor

Summary. The paper presents AOHP, an open-source OS-level agent harness built on the Android Open Source Project (AOSP). It treats agents as first-class OS actors with three agent-oriented mechanisms: personalized service composition, efficient agent interfaces, and secure information flow. Preliminary experiments on challenging tasks show advantages in task completion rate (+21.12%), execution cost (-51.55% token cost), and security-policy compliance compared to conventional Android.

Significance. If the results hold, AOHP provides a valuable open testbed for the research community to explore architectural primitives for agent-mediated interaction in operating systems. It preserves the Android ecosystem while addressing the mismatch with AI agent workflows, potentially reducing overhead and safety risks. The open-source nature, explicit task descriptions, and direct tying of quantitative deltas to the three proposed mechanisms are strengths.

minor comments (2)

[Abstract] Abstract: the performance deltas are stated without any mention of experimental design, number of tasks/trials, or baselines; a one-sentence summary of the evaluation setup would improve standalone readability.
[§4] §4 (Evaluation): confirm that the reported +21.12% and -51.55% figures are accompanied by per-task breakdowns, variance measures, and explicit comparison to the standard Android baseline so readers can assess representativeness.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the supportive summary, recognition of AOHP's potential value as an open testbed, and recommendation of minor revision. The referee's description of the work is accurate.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents AOHP as an OS-level harness introducing three agent-oriented mechanisms (personalized service composition, efficient agent interfaces, secure information flow) and reports empirical advantages from preliminary experiments on tasks. No mathematical derivations, first-principles predictions, fitted parameters renamed as outputs, or self-citation chains appear. The central claims rest on direct experimental comparisons to standard Android baselines, which are externally falsifiable and not reduced to the paper's own inputs by construction. This is a standard systems/empirical contribution with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5804 in / 991 out tokens · 20902 ms · 2026-06-26T08:39:41.714481+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

29 extracted references · 9 canonical work pages · 6 internal anchors

[1]

Agent s: An open agentic framework that uses computers like a human

Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, and Xin Wang. Agent s: An open agentic framework that uses computers like a human. InInternational Conference on Learning Representations, volume 2025, pages 22924–22946, 2025

2025
[2]

Claude Code

Anthropic. Claude Code. https://docs.anthropic.com/en/docs/claude-code/ overview, 2026. Accessed: 2026-06-11

2026
[3]

Screenai: A vision-language model for ui and infographics understanding.arXiv preprint arXiv:2402.04615, 2024

Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Cărbune, Jason Lin, Jindong Chen, and Abhanshu Sharma. Screenai: A vision-language model for ui and infographics understanding.arXiv preprint arXiv:2402.04615, 2024

work page arXiv 2024
[4]

Seeclick: Harnessing gui grounding for advanced visual gui agents

Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Li YanTao, Jianbing Zhang, and Zhiyong Wu. Seeclick: Harnessing gui grounding for advanced visual gui agents. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9313–9332, 2024

2024
[5]

Securing AI Agents with Information-Flow Control

Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Securing ai agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[6]

Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems, 37:82895–82920, 2024

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems, 37:82895–82920, 2024. 12

2024
[7]

Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones.ACM Transactions on Computer Systems (TOCS), 32(2):1–29, 2014

William Enck, Peter Gilbert, Seungyeop Han, Vasant Tendulkar, Byung-Gon Chun, Landon P Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N Sheth. Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones.ACM Transactions on Computer Systems (TOCS), 32(2):1–29, 2014

2014
[8]

Android open source project.https://source.android.com/, 2026

Google. Android open source project.https://source.android.com/, 2026

2026
[9]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023

2023
[10]

arXiv preprint arXiv:2512.19432 , year=

Quyu Kong, Xu Zhang, Zhenyu Yang, Nolan Gao, Chen Liu, Panrong Tong, Chenglin Cai, Hanzhang Zhou, Jianan Zhang, Liangyu Chen, et al. Mobileworld: Benchmarking autonomous mobile agents in agent-user interactive and mcp-augmented environments.arXiv preprint arXiv:2512.19432, 2025

work page arXiv 2025
[11]

Mapping natural language instructionstomobileuiactionsequences

Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, and Jason Baldridge. Mapping natural language instructionstomobileuiactionsequences. InProceedingsofthe58thannualmeetingoftheassociation for computational linguistics, pages 8198–8210, 2020

2020
[12]

Droidbot: a lightweight ui-guided test input generator for android

Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. Droidbot: a lightweight ui-guided test input generator for android. In2017 IEEE/ACM 39th international conference on software engineering companion (ICSE-C), pages 23–26. IEEE, 2017

2017
[13]

OpenClaw.https://docs.openclaw.ai/, 2026

OpenClaw Contributors. OpenClaw.https://docs.openclaw.ai/, 2026. Accessed: 2026-06- 11

2026
[14]

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, et al. Ui-tars: Pioneering automated gui interaction with native agents. arXiv preprint arXiv:2501.12326, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[15]

Androidworld: A dynamic benchmarking environment for autonomous agents

ChrisRawles, SarahClinckemaillie, YifanChang, JonathanWaltz, GabrielleLau, MarybethFair, Alice Li, William Bishop, Wei Li, Folawiyo Campbell-Ajala, et al. Androidworld: A dynamic benchmarking environment for autonomous agents. InInternational Conference on Learning Representations, volume 2025, pages 406–441, 2025

2025
[16]

Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration.Advances in Neural Information Processing Systems, 37:2686–2710, 2024

Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, and Jitao Sang. Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration.Advances in Neural Information Processing Systems, 37:2686–2710, 2024

2024
[17]

Autodroid: Llm-powered task automation in android

Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, and Yunxin Liu. Autodroid: Llm-powered task automation in android. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pages 543–557, 2024

2024
[18]

Autodroid-v2: Boosting slm-based gui agents via code generation

HaoWen, ShizuoTian, BorislavPavlov, WenjieDu, YixuanLi, GeChang, ShanhuiZhao, JiachengLiu, Yunxin Liu, Ya-Qin Zhang, et al. Autodroid-v2: Boosting slm-based gui agents via code generation. InProceedings of the 23rd Annual International Conference on Mobile Systems, Applications and Services, pages 223–235, 2025. 13

2025
[19]

System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective

Fangzhou Wu, Ethan Cecchetti, and Chaowei Xiao. System-level defense against indirect prompt injection attacks: An information flow control perspective.arXiv preprint arXiv:2409.19091, 2024

work page arXiv 2024
[20]

Os-copilot: Towards generalist computer agents with self-improvement, 2024

Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, and Lingpeng Kong. Os-copilot: Towards generalist computer agents with self-improvement, 2024

2024
[21]

Os-atlas: Foundation action model for generalist gui agents

Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, et al. Os-atlas: Foundation action model for generalist gui agents. InInternational Conference on Learning Representations, volume 2025, pages 5090–5108, 2025

2025
[22]

Osworld: Benchmarking multimodal agents for open-ended tasks in real computer environments.Advances in Neural Information Processing Systems, 37:52040–52094, 2024

Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh J Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, et al. Osworld: Benchmarking multimodal agents for open-ended tasks in real computer environments.Advances in Neural Information Processing Systems, 37:52040–52094, 2024

2024
[23]

Androidlab: Training and systematic benchmarking of android autonomous agents

Yifan Xu, Xiao Liu, Xueqiao Sun, Siyi Cheng, Hao Yu, Hanyu Lai, Shudan Zhang, Dan Zhang, Jie Tang, and Yuxiao Dong. Androidlab: Training and systematic benchmarking of android autonomous agents. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2144–2166, 2025

2025
[24]

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, and Caiming Xiong. Aguvis: Unified pure vision agents for autonomous gui interaction.arXiv preprint arXiv:2412.04454, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[25]

ReAct: Synergizing Reasoning and Acting in Language Models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models.arXiv preprint arXiv:2210.03629, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[26]

Mobile-Agent-v3: Fundamental Agents for GUI Automation

Jiabo Ye, Xi Zhang, Haiyang Xu, Haowei Liu, Junyang Wang, Zhaoqing Zhu, Ziwei Zheng, Feiyu Gao, Junjie Cao, Zhengxi Lu, et al. Mobile-agent-v3: Fundamental agents for gui automation.arXiv preprint arXiv:2508.15144, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[27]

Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. InFindings of the Association for Computational Linguistics: ACL 2024, pages 10471–10506, 2024

2024
[28]

Appagent: Multimodal agents as smartphone users

Chi Zhang, Zhao Yang, Jiaxuan Liu, Yanda Li, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, and Gang Yu. Appagent: Multimodal agents as smartphone users. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pages 1–20, 2025

2025
[29]

GPT-4V(ision) is a Generalist Web Agent, if Grounded

Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, and Yu Su. Gpt-4v (ision) is a generalist web agent, if grounded.arXiv preprint arXiv:2401.01614, 2024. 14 A. Benchmark Tasks Our benchmark comprises 30 real-world mobile tasks grouped into five core capability categories plus a hybrid category that composes them, with five tasks each. Table 3 lists all tasks...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[1] [1]

Agent s: An open agentic framework that uses computers like a human

Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, and Xin Wang. Agent s: An open agentic framework that uses computers like a human. InInternational Conference on Learning Representations, volume 2025, pages 22924–22946, 2025

2025

[2] [2]

Claude Code

Anthropic. Claude Code. https://docs.anthropic.com/en/docs/claude-code/ overview, 2026. Accessed: 2026-06-11

2026

[3] [3]

Screenai: A vision-language model for ui and infographics understanding.arXiv preprint arXiv:2402.04615, 2024

Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Cărbune, Jason Lin, Jindong Chen, and Abhanshu Sharma. Screenai: A vision-language model for ui and infographics understanding.arXiv preprint arXiv:2402.04615, 2024

work page arXiv 2024

[4] [4]

Seeclick: Harnessing gui grounding for advanced visual gui agents

Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Li YanTao, Jianbing Zhang, and Zhiyong Wu. Seeclick: Harnessing gui grounding for advanced visual gui agents. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9313–9332, 2024

2024

[5] [5]

Securing AI Agents with Information-Flow Control

Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Securing ai agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[6] [6]

Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems, 37:82895–82920, 2024

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems, 37:82895–82920, 2024. 12

2024

[7] [7]

Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones.ACM Transactions on Computer Systems (TOCS), 32(2):1–29, 2014

William Enck, Peter Gilbert, Seungyeop Han, Vasant Tendulkar, Byung-Gon Chun, Landon P Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N Sheth. Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones.ACM Transactions on Computer Systems (TOCS), 32(2):1–29, 2014

2014

[8] [8]

Android open source project.https://source.android.com/, 2026

Google. Android open source project.https://source.android.com/, 2026

2026

[9] [9]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023

2023

[10] [10]

arXiv preprint arXiv:2512.19432 , year=

Quyu Kong, Xu Zhang, Zhenyu Yang, Nolan Gao, Chen Liu, Panrong Tong, Chenglin Cai, Hanzhang Zhou, Jianan Zhang, Liangyu Chen, et al. Mobileworld: Benchmarking autonomous mobile agents in agent-user interactive and mcp-augmented environments.arXiv preprint arXiv:2512.19432, 2025

work page arXiv 2025

[11] [11]

Mapping natural language instructionstomobileuiactionsequences

Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, and Jason Baldridge. Mapping natural language instructionstomobileuiactionsequences. InProceedingsofthe58thannualmeetingoftheassociation for computational linguistics, pages 8198–8210, 2020

2020

[12] [12]

Droidbot: a lightweight ui-guided test input generator for android

Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. Droidbot: a lightweight ui-guided test input generator for android. In2017 IEEE/ACM 39th international conference on software engineering companion (ICSE-C), pages 23–26. IEEE, 2017

2017

[13] [13]

OpenClaw.https://docs.openclaw.ai/, 2026

OpenClaw Contributors. OpenClaw.https://docs.openclaw.ai/, 2026. Accessed: 2026-06- 11

2026

[14] [14]

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, et al. Ui-tars: Pioneering automated gui interaction with native agents. arXiv preprint arXiv:2501.12326, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[15] [15]

Androidworld: A dynamic benchmarking environment for autonomous agents

ChrisRawles, SarahClinckemaillie, YifanChang, JonathanWaltz, GabrielleLau, MarybethFair, Alice Li, William Bishop, Wei Li, Folawiyo Campbell-Ajala, et al. Androidworld: A dynamic benchmarking environment for autonomous agents. InInternational Conference on Learning Representations, volume 2025, pages 406–441, 2025

2025

[16] [16]

Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration.Advances in Neural Information Processing Systems, 37:2686–2710, 2024

Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, and Jitao Sang. Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration.Advances in Neural Information Processing Systems, 37:2686–2710, 2024

2024

[17] [17]

Autodroid: Llm-powered task automation in android

Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, and Yunxin Liu. Autodroid: Llm-powered task automation in android. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pages 543–557, 2024

2024

[18] [18]

Autodroid-v2: Boosting slm-based gui agents via code generation

HaoWen, ShizuoTian, BorislavPavlov, WenjieDu, YixuanLi, GeChang, ShanhuiZhao, JiachengLiu, Yunxin Liu, Ya-Qin Zhang, et al. Autodroid-v2: Boosting slm-based gui agents via code generation. InProceedings of the 23rd Annual International Conference on Mobile Systems, Applications and Services, pages 223–235, 2025. 13

2025

[19] [19]

System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective

Fangzhou Wu, Ethan Cecchetti, and Chaowei Xiao. System-level defense against indirect prompt injection attacks: An information flow control perspective.arXiv preprint arXiv:2409.19091, 2024

work page arXiv 2024

[20] [20]

Os-copilot: Towards generalist computer agents with self-improvement, 2024

Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, and Lingpeng Kong. Os-copilot: Towards generalist computer agents with self-improvement, 2024

2024

[21] [21]

Os-atlas: Foundation action model for generalist gui agents

Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, et al. Os-atlas: Foundation action model for generalist gui agents. InInternational Conference on Learning Representations, volume 2025, pages 5090–5108, 2025

2025

[22] [22]

Osworld: Benchmarking multimodal agents for open-ended tasks in real computer environments.Advances in Neural Information Processing Systems, 37:52040–52094, 2024

Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh J Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, et al. Osworld: Benchmarking multimodal agents for open-ended tasks in real computer environments.Advances in Neural Information Processing Systems, 37:52040–52094, 2024

2024

[23] [23]

Androidlab: Training and systematic benchmarking of android autonomous agents

Yifan Xu, Xiao Liu, Xueqiao Sun, Siyi Cheng, Hao Yu, Hanyu Lai, Shudan Zhang, Dan Zhang, Jie Tang, and Yuxiao Dong. Androidlab: Training and systematic benchmarking of android autonomous agents. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2144–2166, 2025

2025

[24] [24]

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, and Caiming Xiong. Aguvis: Unified pure vision agents for autonomous gui interaction.arXiv preprint arXiv:2412.04454, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[25] [25]

ReAct: Synergizing Reasoning and Acting in Language Models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models.arXiv preprint arXiv:2210.03629, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[26] [26]

Mobile-Agent-v3: Fundamental Agents for GUI Automation

Jiabo Ye, Xi Zhang, Haiyang Xu, Haowei Liu, Junyang Wang, Zhaoqing Zhu, Ziwei Zheng, Feiyu Gao, Junjie Cao, Zhengxi Lu, et al. Mobile-agent-v3: Fundamental agents for gui automation.arXiv preprint arXiv:2508.15144, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[27] [27]

Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. InFindings of the Association for Computational Linguistics: ACL 2024, pages 10471–10506, 2024

2024

[28] [28]

Appagent: Multimodal agents as smartphone users

Chi Zhang, Zhao Yang, Jiaxuan Liu, Yanda Li, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, and Gang Yu. Appagent: Multimodal agents as smartphone users. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pages 1–20, 2025

2025

[29] [29]

GPT-4V(ision) is a Generalist Web Agent, if Grounded

Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, and Yu Su. Gpt-4v (ision) is a generalist web agent, if grounded.arXiv preprint arXiv:2401.01614, 2024. 14 A. Benchmark Tasks Our benchmark comprises 30 real-world mobile tasks grouped into five core capability categories plus a hybrid category that composes them, with five tasks each. Table 3 lists all tasks...

work page internal anchor Pith review Pith/arXiv arXiv 2024