
arxiv: 2606.23449 · v1 · pith:2QTRGZY4new · submitted 2026-06-22 · 💻 cs.AI · cs.OS

AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction

Pith reviewed 2026-06-26 08:39 UTC · model grok-4.3

classification 💻 cs.AI cs.OS
keywords AI agents · OS-level harness · Android Open Source Project · personalized service composition · efficient agent interfaces · secure information flow · task completion rate · security policy compliance

The pith

AOHP builds an Android harness that treats AI agents as first-class OS actors to raise task completion, cut token use, and tighten security.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AOHP, an open-source harness layered on the Android Open Source Project, to close the gap between agent-driven workflows and conventional application-centric operating systems. It adds three agent-oriented mechanisms (personalized service composition, efficient agent interfaces, and secure information flow) while keeping the existing Android ecosystem intact. Preliminary tests on challenging tasks report a 21.12% gain in completion rate, a 51.55% reduction in token cost, and better security-policy compliance. The work matters because it supplies a concrete, open testbed for exploring native agent support without discarding mature hardware and software stacks.

Core claim

AOHP is an OS-level agent harness on AOSP that treats agents as first-class actors, enabling adaptive user interfaces and agent-friendly runtime environments through personalized service composition, efficient agent interfaces, and secure information flow; on challenging tasks these mechanisms deliver measurable gains in task completion, execution cost, and security compliance over standard Android.

What carries the argument

The three agent-oriented system mechanisms (personalized service composition, efficient agent interfaces, and secure information flow) that let agents interact directly with the OS as first-class actors.

If this is right

  • Agents complete more tasks on the same device without extra hardware.
  • Token consumption drops, lowering both latency and monetary cost of agent runs.
  • Security policies are enforced more reliably during agent actions.
  • Developers can reuse existing Android apps and drivers while gaining agent support.
  • The open codebase supplies a shared platform for testing further agent-native primitives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future mobile OS versions could incorporate similar first-class agent support as a standard feature.
  • Comparable harnesses on other platforms might reveal whether the three mechanisms generalize beyond Android.
  • Longer-running agent sessions could be benchmarked to check whether the reported efficiency gains persist over time.

Load-bearing premise

The preliminary experiments on a set of challenging tasks are representative enough to establish the advantages of the three proposed mechanisms over conventional Android.

What would settle it

A broader suite of tasks on which AOHP fails to improve completion rate, token cost, or security compliance relative to unmodified Android.
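
One way to run such a check, sketched under assumptions: pair each task's outcome on unmodified Android with its outcome on AOHP, then apply an exact sign test to the tasks where the two disagree. The function names and the ten outcomes below are hypothetical illustrations, not the paper's evaluation code or data.

```python
from math import comb

def sign_test_p(wins: int, losses: int) -> float:
    """Two-sided exact sign test p-value for paired binary outcomes (ties dropped)."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def compare(baseline: list[bool], harness: list[bool]) -> dict:
    """Summarize paired per-task outcomes for two systems."""
    wins = sum(h and not b for b, h in zip(baseline, harness))
    losses = sum(b and not h for b, h in zip(baseline, harness))
    return {
        "baseline_rate": sum(baseline) / len(baseline),
        "harness_rate": sum(harness) / len(harness),
        "wins": wins,
        "losses": losses,
        "p_value": sign_test_p(wins, losses),
    }

# Hypothetical outcomes for 10 tasks; not data from the paper.
baseline = [True, False, False, True, False, True, False, False, True, False]
harness = [True, True, False, True, True, True, True, False, True, True]
result = compare(baseline, harness)
print(result)
```

With only four disagreeing tasks, even a 4-0 split gives a p-value of 0.125, which illustrates why a small task suite may not settle the question.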

Original abstract

AI agents are driving a new software paradigm, with the ability to autonomously call tools, extract information, manage memory, and complete tasks that span applications and data sources. Most existing end-user operating systems, however, are designed for application-centric workflows and offer little native support for AI agents. This mismatch limits the wider adoption of agents and leads to execution overhead and safety risks when running agents on conventional systems. While the concept of agent-native operating systems is emerging, the research community lacks an open testbed to explore the architectural primitives desired for agent-mediated interaction. We present AOHP (Android Open Harness Project), an OS-level agent harness built on the Android Open Source Project (AOSP). The core design principle of AOHP is to treat agents as first-class OS actors, enabling adaptive user interfaces and agent-friendly runtime environments. AOHP preserves the mature Android software and hardware ecosystem while introducing three agent-oriented system mechanisms: personalized service composition, efficient agent interfaces, and secure information flow. Based on preliminary experiments on challenging tasks covering key capabilities of OS agents, AOHP shows clear advantages in task completion (+21.12% completion rate), execution cost (-51.55% token cost), and security-policy compliance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper presents AOHP, an open-source OS-level agent harness built on the Android Open Source Project (AOSP). It treats agents as first-class OS actors with three agent-oriented mechanisms: personalized service composition, efficient agent interfaces, and secure information flow. Preliminary experiments on challenging tasks show advantages in task completion rate (+21.12%), execution cost (-51.55% token cost), and security-policy compliance compared to conventional Android.

Significance. If the results hold, AOHP provides a valuable open testbed for the research community to explore architectural primitives for agent-mediated interaction in operating systems. It preserves the Android ecosystem while addressing the mismatch with AI agent workflows, potentially reducing overhead and safety risks. The open-source nature, explicit task descriptions, and direct tying of quantitative deltas to the three proposed mechanisms are strengths.

minor comments (2)
  1. [Abstract] The performance deltas are stated without any mention of experimental design, number of tasks or trials, or baselines; a one-sentence summary of the evaluation setup would improve standalone readability.
  2. [§4] (Evaluation) Confirm that the reported +21.12% and -51.55% figures are accompanied by per-task breakdowns, variance measures, and an explicit comparison to the standard Android baseline so readers can assess representativeness.
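
A minimal sketch of the breakdown requested in comment 2, assuming per-task records of the form (category, baseline success, harness success). The record layout, category names, and values are hypothetical. The sketch also separates a percentage-point change from a relative change, since a figure such as +21.12% can be read either way when the baseline rate is not stated.

```python
from collections import defaultdict

def completion_by_category(records):
    """Map each category to (baseline_rate, harness_rate)."""
    counts = defaultdict(lambda: [0, 0, 0])  # tasks, baseline successes, harness successes
    for category, base_ok, harn_ok in records:
        entry = counts[category]
        entry[0] += 1
        entry[1] += int(base_ok)
        entry[2] += int(harn_ok)
    return {cat: (b / n, h / n) for cat, (n, b, h) in counts.items()}

def deltas(base_rate, new_rate):
    """Return (percentage-point change, relative change in percent or None)."""
    points = (new_rate - base_rate) * 100
    # A relative change is undefined when the baseline rate is zero.
    relative = (new_rate - base_rate) / base_rate * 100 if base_rate else None
    return points, relative

# Hypothetical records; not data from the paper.
records = [
    ("retrieval", True, True), ("retrieval", False, True),
    ("retrieval", False, False), ("retrieval", True, True),
    ("hybrid", False, True), ("hybrid", False, False),
]
rates = completion_by_category(records)
for category, (base, new) in rates.items():
    print(category, base, new, deltas(base, new))
```

In this toy data the "retrieval" category moves from 0.5 to 0.75, which is 25 percentage points but a 50% relative gain. Reporting both, per category, would let readers interpret the headline figures.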

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the supportive summary, recognition of AOHP's potential value as an open testbed, and recommendation of minor revision. The referee's description of the work is accurate.

Circularity Check

0 steps flagged

No significant circularity


The paper presents AOHP as an OS-level harness introducing three agent-oriented mechanisms (personalized service composition, efficient agent interfaces, secure information flow) and reports empirical advantages from preliminary experiments on tasks. No mathematical derivations, first-principles predictions, fitted parameters renamed as outputs, or self-citation chains appear. The central claims rest on direct experimental comparisons to standard Android baselines, which are externally falsifiable and not reduced to the paper's own inputs by construction. This is a standard systems/empirical contribution with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5804 in / 991 out tokens · 20902 ms · 2026-06-26T08:39:41.714481+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 9 canonical work pages · 6 internal anchors

  1. [1]

    Agent s: An open agentic framework that uses computers like a human

    Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, and Xin Wang. Agent s: An open agentic framework that uses computers like a human. In International Conference on Learning Representations, volume 2025, pages 22924–22946, 2025

  2. [2]

    Claude Code

    Anthropic. Claude Code. https://docs.anthropic.com/en/docs/claude-code/overview, 2026. Accessed: 2026-06-11

  3. [3]

    Screenai: A vision-language model for ui and infographics understanding

    Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Cărbune, Jason Lin, Jindong Chen, and Abhanshu Sharma. Screenai: A vision-language model for ui and infographics understanding. arXiv preprint arXiv:2402.04615, 2024

  4. [4]

    Seeclick: Harnessing gui grounding for advanced visual gui agents

    Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Li YanTao, Jianbing Zhang, and Zhiyong Wu. Seeclick: Harnessing gui grounding for advanced visual gui agents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9313–9332, 2024

  5. [5]

    Securing AI Agents with Information-Flow Control

    Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Securing ai agents with information-flow control. arXiv preprint arXiv:2505.23643, 2025

  6. [6]

    Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents

    Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents. Advances in Neural Information Processing Systems, 37:82895–82920, 2024

  7. [7]

    Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones

    William Enck, Peter Gilbert, Seungyeop Han, Vasant Tendulkar, Byung-Gon Chun, Landon P Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N Sheth. Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Transactions on Computer Systems (TOCS), 32(2):1–29, 2014

  8. [8]

    Android open source project

    Google. Android open source project. https://source.android.com/, 2026

  9. [9]

    Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023

  10. [10]

    Mobileworld: Benchmarking autonomous mobile agents in agent-user interactive and mcp-augmented environments

    Quyu Kong, Xu Zhang, Zhenyu Yang, Nolan Gao, Chen Liu, Panrong Tong, Chenglin Cai, Hanzhang Zhou, Jianan Zhang, Liangyu Chen, et al. Mobileworld: Benchmarking autonomous mobile agents in agent-user interactive and mcp-augmented environments. arXiv preprint arXiv:2512.19432, 2025

  11. [11]

    Mapping natural language instructions to mobile ui action sequences

    Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, and Jason Baldridge. Mapping natural language instructions to mobile ui action sequences. In Proceedings of the 58th annual meeting of the association for computational linguistics, pages 8198–8210, 2020

  12. [12]

    Droidbot: a lightweight ui-guided test input generator for android

    Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. Droidbot: a lightweight ui-guided test input generator for android. In 2017 IEEE/ACM 39th international conference on software engineering companion (ICSE-C), pages 23–26. IEEE, 2017

  13. [13]

    OpenClaw

    OpenClaw Contributors. OpenClaw. https://docs.openclaw.ai/, 2026. Accessed: 2026-06-11

  14. [14]

    UI-TARS: Pioneering Automated GUI Interaction with Native Agents

    Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, et al. Ui-tars: Pioneering automated gui interaction with native agents. arXiv preprint arXiv:2501.12326, 2025

  15. [15]

    Androidworld: A dynamic benchmarking environment for autonomous agents

    Chris Rawles, Sarah Clinckemaillie, Yifan Chang, Jonathan Waltz, Gabrielle Lau, Marybeth Fair, Alice Li, William Bishop, Wei Li, Folawiyo Campbell-Ajala, et al. Androidworld: A dynamic benchmarking environment for autonomous agents. In International Conference on Learning Representations, volume 2025, pages 406–441, 2025

  16. [16]

    Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration

    Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, and Jitao Sang. Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration. Advances in Neural Information Processing Systems, 37:2686–2710, 2024

  17. [17]

    Autodroid: Llm-powered task automation in android

    Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, and Yunxin Liu. Autodroid: Llm-powered task automation in android. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pages 543–557, 2024

  18. [18]

    Autodroid-v2: Boosting slm-based gui agents via code generation

    Hao Wen, Shizuo Tian, Borislav Pavlov, Wenjie Du, Yixuan Li, Ge Chang, Shanhui Zhao, Jiacheng Liu, Yunxin Liu, Ya-Qin Zhang, et al. Autodroid-v2: Boosting slm-based gui agents via code generation. In Proceedings of the 23rd Annual International Conference on Mobile Systems, Applications and Services, pages 223–235, 2025

  19. [19]

    System-level defense against indirect prompt injection attacks: An information flow control perspective

    Fangzhou Wu, Ethan Cecchetti, and Chaowei Xiao. System-level defense against indirect prompt injection attacks: An information flow control perspective. arXiv preprint arXiv:2409.19091, 2024

  20. [20]

    Os-copilot: Towards generalist computer agents with self-improvement

    Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, and Lingpeng Kong. Os-copilot: Towards generalist computer agents with self-improvement, 2024

  21. [21]

    Os-atlas: Foundation action model for generalist gui agents

    Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, et al. Os-atlas: Foundation action model for generalist gui agents. In International Conference on Learning Representations, volume 2025, pages 5090–5108, 2025

  22. [22]

    Osworld: Benchmarking multimodal agents for open-ended tasks in real computer environments

    Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh J Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, et al. Osworld: Benchmarking multimodal agents for open-ended tasks in real computer environments. Advances in Neural Information Processing Systems, 37:52040–52094, 2024

  23. [23]

    Androidlab: Training and systematic benchmarking of android autonomous agents

    Yifan Xu, Xiao Liu, Xueqiao Sun, Siyi Cheng, Hao Yu, Hanyu Lai, Shudan Zhang, Dan Zhang, Jie Tang, and Yuxiao Dong. Androidlab: Training and systematic benchmarking of android autonomous agents. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2144–2166, 2025

  24. [24]

    Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

    Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, and Caiming Xiong. Aguvis: Unified pure vision agents for autonomous gui interaction. arXiv preprint arXiv:2412.04454, 2024

  25. [25]

    ReAct: Synergizing Reasoning and Acting in Language Models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022

  26. [26]

    Mobile-Agent-v3: Fundamental Agents for GUI Automation

    Jiabo Ye, Xi Zhang, Haiyang Xu, Haowei Liu, Junyang Wang, Zhaoqing Zhu, Ziwei Zheng, Feiyu Gao, Junjie Cao, Zhengxi Lu, et al. Mobile-agent-v3: Fundamental agents for gui automation. arXiv preprint arXiv:2508.15144, 2025

  27. [27]

    Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents

    Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10471–10506, 2024

  28. [28]

    Appagent: Multimodal agents as smartphone users

    Chi Zhang, Zhao Yang, Jiaxuan Liu, Yanda Li, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, and Gang Yu. Appagent: Multimodal agents as smartphone users. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pages 1–20, 2025

  29. [29]

    GPT-4V(ision) is a Generalist Web Agent, if Grounded

    Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, and Yu Su. Gpt-4v (ision) is a generalist web agent, if grounded. arXiv preprint arXiv:2401.01614, 2024

A. Benchmark Tasks

Our benchmark comprises 30 real-world mobile tasks grouped into five core capability categories plus a hybrid category that composes them, with five tasks each. Table 3 lists all tasks...