arxiv: 2602.22942 · v2 · submitted 2026-02-26 · 💻 cs.MA

ClawMobile: Rethinking Smartphone-Native Agentic Systems

Hongchao Du, Shangyu Wu, Qiao Li, Riwei Pan, Jinheng Li, Youcheng Sun, Chun Jason Xue

Pith reviewed 2026-05-15 19:18 UTC · model grok-4.3

classification 💻 cs.MA

keywords: smartphone agents · LLM agents · hierarchical architecture · deterministic control · mobile autonomy · agentic systems · runtime design

The pith

ClawMobile uses hierarchical separation of reasoning and control to stabilize smartphone agent systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ClawMobile as an approach to agentic systems that operate directly on smartphones. It proposes a hierarchical architecture in which large language models manage high-level reasoning while structured deterministic pathways handle device-specific control tasks. This design aims to address the unique difficulties of mobile environments, such as limited resources and dynamic application states, by reducing reliance on unpredictable probabilistic outputs for low-level actions. The work distills design principles for mobile LLM runtimes and points out ongoing challenges in efficiency, adaptability, and stability. The authors argue that principled coordination between probabilistic planning and deterministic interfaces is essential for robust smartphone autonomy.

Core claim

ClawMobile adopts a hierarchical architecture that separates high-level language reasoning from structured, deterministic control pathways, improving execution stability and reproducibility on real devices.

What carries the argument

Hierarchical architecture separating high-level language reasoning from structured, deterministic control pathways
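
A minimal sketch of what such a separation could look like, under stated assumptions: the action vocabulary (`open_app`, `tap`, `type_text`), the `Step` type, and the stubbed planner are all hypothetical and are not taken from the paper or the ClawMobile repository. The planner stands in for the LLM and may emit anything; the control path only runs steps that pass a fixed schema check, so the same plan always yields the same log.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary (an assumption, not ClawMobile's interface).
# Each action maps to the exact set of argument names it requires.
ALLOWED_ACTIONS = {
    "open_app": {"package"},
    "tap": {"x", "y"},
    "type_text": {"text"},
}


@dataclass
class Step:
    action: str
    args: dict


def validate(step: Step) -> bool:
    """Accept a step only if the action is known and its argument names match."""
    expected = ALLOWED_ACTIONS.get(step.action)
    return expected is not None and set(step.args) == expected


def stub_planner(goal: str) -> list:
    """Stand-in for the LLM: returns a plan containing one malformed step."""
    return [
        Step("open_app", {"package": "com.example.notes"}),
        Step("swipe_somewhere", {}),  # unknown action, should be rejected
        Step("type_text", {"text": goal}),
    ]


def execute(plan: list) -> list:
    """Deterministic control path: validates each step and records the outcome."""
    log = []
    for step in plan:
        if validate(step):
            log.append(f"executed:{step.action}")
        else:
            log.append(f"rejected:{step.action}")
    return log


log = execute(stub_planner("buy milk"))
```

In this sketch the malformed step is rejected at the boundary rather than reaching the device, which is one way to read the stability argument; whether ClawMobile enforces the boundary this way is not stated in the material above.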

If this is right

Agent execution on smartphones becomes more stable when high-level plans are executed through fixed control mechanisms rather than direct LLM commands.
Mobile LLM runtimes benefit from explicit coordination between probabilistic and deterministic components.
Key remaining challenges include improving efficiency and adaptability while maintaining stability.
Open-sourcing the implementation allows others to test and extend these design principles on real hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar hierarchical designs could apply to other resource-constrained platforms like embedded systems.
The separation might simplify testing by allowing independent verification of control pathways.
Developers could use this model to integrate LLMs with existing mobile automation tools more reliably.
Future work might explore dynamic switching between reasoning modes based on task complexity.

Load-bearing premise

Separating high-level probabilistic reasoning from deterministic control pathways produces measurable gains in stability and reproducibility on real smartphones.

What would settle it

Comparative experiments measuring task completion rates, error rates, and run-to-run consistency on physical smartphones with and without the hierarchical separation.
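
As a rough illustration of how such a comparison might be scored, the sketch below computes a success rate, the run-to-run spread of that rate, and a per-task consistency score from repeated runs. The function name `summarize`, the metric definitions, and the outcome data are assumptions made for this example; the numbers are placeholders, not results from the paper.

```python
from statistics import mean, pstdev


def summarize(runs: list) -> dict:
    """Summarize repeated runs; runs[i][j] is True if task j succeeded in repetition i."""
    per_run = [mean([1.0 if ok else 0.0 for ok in run]) for run in runs]
    n_tasks = len(runs[0])
    # A task counts as consistent if every repetition produced the same outcome.
    consistent = sum(
        1 for j in range(n_tasks) if len({run[j] for run in runs}) == 1
    )
    return {
        "success_rate": mean(per_run),
        "error_rate": 1.0 - mean(per_run),
        "run_to_run_std": pstdev(per_run),
        "consistency": consistent / n_tasks,
    }


# Placeholder outcomes for illustration only (NOT measurements).
hierarchical = [[True, True, False, True], [True, True, False, True]]
flat = [[True, False, False, True], [False, True, False, True]]

h = summarize(hierarchical)
f = summarize(flat)
```

Note that in the placeholder data the flat configuration has zero spread in its per-run success rate yet only half of its tasks behave consistently, which suggests that per-task consistency should be reported alongside aggregate variance.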

Figures

Figures reproduced from arXiv: 2602.22942 by Chun Jason Xue, Hongchao Du, Jinheng Li, Qiao Li, Riwei Pan, Shangyu Wu, Youcheng Sun.

**Figure 1.** ClawMobile architecture. The Agent Orchestrator serves as the central coordination layer. Control Backends provide structured execution interfaces to the smartphone. Memory maintains mobile-specific knowledge and execution preferences that guide runtime behavior.
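
The three components named in the caption could be wired together roughly as follows. This is a hedged sketch only: the class names mirror the caption, but the backend names (`a11y`, `shell`), the method names, and the idea of storing a preferred backend per app are assumptions for illustration and do not describe ClawMobile's actual code.

```python
from typing import Callable, Dict

# A control backend is modeled as a function from a command to a result string.
Backend = Callable[[str], str]


def accessibility_backend(command: str) -> str:
    """Hypothetical backend that would drive the UI through accessibility events."""
    return f"a11y:{command}"


def shell_backend(command: str) -> str:
    """Hypothetical backend that would issue shell-level commands."""
    return f"shell:{command}"


class Memory:
    """Holds per-app execution preferences (assumed structure)."""

    def __init__(self) -> None:
        self._preferred: Dict[str, str] = {}

    def prefer(self, app: str, backend_name: str) -> None:
        self._preferred[app] = backend_name

    def preferred_backend(self, app: str, default: str) -> str:
        return self._preferred.get(app, default)


class Orchestrator:
    """Routes each command to the backend that memory prefers for the app."""

    def __init__(self, backends: Dict[str, Backend], memory: Memory, default: str) -> None:
        self.backends = backends
        self.memory = memory
        self.default = default

    def dispatch(self, app: str, command: str) -> str:
        name = self.memory.preferred_backend(app, self.default)
        return self.backends[name](command)


memory = Memory()
memory.prefer("com.example.files", "shell")
orchestrator = Orchestrator(
    {"a11y": accessibility_backend, "shell": shell_backend}, memory, default="a11y"
)
result_default = orchestrator.dispatch("com.example.notes", "tap_save")
result_preferred = orchestrator.dispatch("com.example.files", "list_dir")
```

Keeping the routing decision in a lookup rather than in the model's output is the design choice this sketch is meant to highlight: the orchestrator's behavior for a given app and memory state is fixed and can be inspected.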

Original abstract

Smartphones represent a uniquely challenging environment for agentic systems. Unlike cloud or desktop settings, mobile devices combine constrained execution contexts, fragmented control interfaces, and rapidly changing application states. As large language models (LLMs) evolve from conversational assistants to action-oriented agents, achieving reliable smartphone-native autonomy requires rethinking how reasoning and control are composed. We introduce ClawMobile as a concrete exploration of this design space. ClawMobile adopts a hierarchical architecture that separates high-level language reasoning from structured, deterministic control pathways, improving execution stability and reproducibility on real devices. Using ClawMobile as a case study, we distill the design principles for mobile LLM runtimes and identify key challenges in efficiency, adaptability, and stability. We argue that building robust smartphone-native agentic systems demands principled coordination between probabilistic planning and deterministic system interfaces. The implementation is open-sourced (https://github.com/ClawMobile/ClawMobile) to facilitate future exploration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ClawMobile describes a hierarchical split for phone agents but asserts stability gains without any measurements or tests to support them.

ClawMobile is a concrete system that puts high-level LLM reasoning on top of structured deterministic controls for running agents directly on smartphones. The main point is that this split is meant to handle mobile constraints like limited compute and shifting app states, and the authors open-source the code on GitHub to let others try it out. They also list some design principles for mobile LLM runtimes and flag ongoing issues around efficiency and adaptability. That practical framing is the useful part here, especially for anyone already working on on-device agents rather than cloud setups. The architecture itself is not entirely new in the broader agent literature, but tailoring it to phone-specific interfaces and releasing working code gives it some immediate value as a case study.

The clear weakness is the complete absence of evidence. The paper claims the hierarchy improves execution stability and reproducibility, yet it supplies no benchmarks, no success rates, no comparisons against simpler baselines, and no real-device results. Without those numbers the central assertion stays untested.

This is the kind of paper that would interest engineers building mobile AI tools or researchers looking for starting code in the on-device agent space. A reader who wants ideas on how to separate planning from control on phones could pull something from it, but anyone expecting validated performance claims will come away empty. It deserves peer review because the topic is timely and the open implementation is a real contribution, though any referee would need to push hard for basic experiments before it could stand as a finished result.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces ClawMobile, a hierarchical architecture for smartphone-native agentic systems that separates high-level language reasoning from structured, deterministic control pathways. It claims this design improves execution stability and reproducibility on real devices, uses the system as a case study to distill design principles for mobile LLM runtimes, identifies challenges in efficiency, adaptability, and stability, and open-sources the implementation.

Significance. If the stability and reproducibility improvements are empirically demonstrated, the work could meaningfully advance reliable LLM agents in constrained mobile environments by clarifying coordination between probabilistic planning and deterministic interfaces. The open-sourced implementation is a clear strength that supports reproducibility and community follow-up.

major comments (2)

[Abstract] The central claim, that the hierarchical separation improves "execution stability and reproducibility on real devices", is asserted without any supporting experiments, metrics (e.g., success rates, variance, latency), ablation studies, or comparisons to non-hierarchical baselines under constrained execution and fragmented interfaces.
[Evaluation (or equivalent)] The manuscript provides no evaluation section, results tables, or quantitative benchmarks that would substantiate the load-bearing assertion of measurable gains from the architecture; the contribution therefore rests on an unevidenced design choice rather than demonstrated outcomes.

minor comments (1)

[Abstract] The GitHub footnote could include a direct link or DOI for easier access.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for empirical support. The manuscript is a design exploration of hierarchical agentic systems for smartphones, but we agree the abstract and contribution would benefit from more cautious language and added evaluation to substantiate the stability claims.

Referee: [Abstract] The central claim, that the hierarchical separation improves "execution stability and reproducibility on real devices", is asserted without any supporting experiments, metrics (e.g., success rates, variance, latency), ablation studies, or comparisons to non-hierarchical baselines under constrained execution and fragmented interfaces.

Authors: We agree the abstract asserts an improvement without quantitative backing. In revision we will rephrase the relevant sentence to present the separation as a design intended to improve stability and reproducibility, grounded in the challenges and rationale detailed in the body. We will also add a new Evaluation section with preliminary results including task success rates, outcome variance, latency measurements, and comparisons against non-hierarchical baselines on real devices.

Revision: yes

Referee: [Evaluation (or equivalent)] The manuscript provides no evaluation section, results tables, or quantitative benchmarks that would substantiate the load-bearing assertion of measurable gains from the architecture; the contribution therefore rests on an unevidenced design choice rather than demonstrated outcomes.

Authors: The current version emphasizes architectural principles and open-source implementation as a case study rather than a full empirical benchmark paper. We accept that this leaves the stability claim unsubstantiated. The revised manuscript will include an Evaluation section reporting quantitative results on a set of smartphone agent tasks, with metrics for success rate, reproducibility (variance across runs), latency, and direct comparisons to non-hierarchical LLM-agent baselines under realistic mobile constraints.

Revision: yes

Circularity Check

0 steps flagged

No circularity: architectural proposal without derivations or fitted predictions

The paper presents ClawMobile as a hierarchical system design separating high-level LLM reasoning from deterministic control pathways, asserting gains in stability and reproducibility on mobile devices. No equations, parameter fittings, predictions derived from subsets of data, or self-citations appear in the abstract or described full text. The central claims are design assertions and distilled principles rather than results obtained by reducing to prior inputs by construction. The work is therefore self-contained as an engineering case study with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no free parameters, mathematical axioms, or new invented entities; it is a descriptive system proposal without derivations or postulates beyond standard LLM and mobile-computing assumptions.

pith-pipeline@v0.9.0 · 5470 in / 1086 out tokens · 31875 ms · 2026-05-15T19:18:29.745832+00:00 · methodology
