Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
Pith reviewed 2026-05-15 20:32 UTC · model grok-4.3
The pith
GUI agents can reach high human imitability in mobile touch interactions without losing task performance by minimizing behavioral divergence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By modeling the interaction as a MinMax problem and optimizing agents to minimize behavioral divergence from human touch kinematics on a collected high-fidelity mobile dataset, agents can achieve high imitability scores both theoretically and empirically without any measurable drop in utility or robustness.
What carries the argument
The MinMax optimization between detector and agent that quantifies behavioral divergence, supported by the Agent Humanization Benchmark and associated detection metrics.
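The detector-agent game can be written as a formula. The GAN-style value function below is one plausible instantiation, not the paper's stated objective. The following notation is assumed here for illustration:

- π is the agent's touch policy.
- D is a detector that gives the probability that a trajectory τ is human.
- U is task utility.
- λ ≥ 0 is the trade-off weight.

```latex
\min_{\pi}\;\max_{D}\;
\Big( \mathbb{E}_{\tau \sim p_{\mathrm{human}}}\big[\log D(\tau)\big]
 + \mathbb{E}_{\tau \sim p_{\pi}}\big[\log\big(1 - D(\tau)\big)\big] \Big)
 \;-\; \lambda\, U(\pi)
```

For a fixed policy, the optimal detector makes the bracketed term equal to 2·JSD(p_human ‖ p_π) - log 4 (Goodfellow et al., 2014). Under this assumed form, the outer minimization pulls the agent's trajectory distribution toward the human one, and λ sets how much weight task utility receives.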
If this is right
- Vanilla LMM-based agents produce detectable unnatural kinematics in touch trajectories.
- Heuristic noise injection and data-driven behavioral matching both raise imitability without harming task performance.
- The new benchmark and metrics make the imitability-utility trade-off measurable and comparable across methods.
- Successful humanization allows agents to operate inside human-centric platforms without triggering adversarial countermeasures.
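A toy swipe makes the contrast in the first two points concrete. The sketch below is a hypothetical illustration of heuristic noise injection, not the paper's method. The quadratic Bezier bow, the minimum-jerk timing curve, the Gaussian jitter and every parameter value are assumptions. The sketch also computes two simple kinematic features, speed variability and path straightness, on which a vanilla straight, constant-speed swipe stands out.

```python
import math
import random

def straight_swipe(x1, y1, x2, y2, n=20):
    """Vanilla-agent style swipe: straight line, evenly spaced samples (constant speed)."""
    return [(x1 + (x2 - x1) * i / (n - 1), y1 + (y2 - y1) * i / (n - 1)) for i in range(n)]

def humanized_swipe(x1, y1, x2, y2, n=20, curve=0.15, jitter=1.5, rng=None):
    """Hypothetical heuristic humanization: curved path, minimum-jerk timing, Gaussian jitter."""
    rng = rng or random.Random()
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end points must differ")
    # Unit normal to the chord; the Bezier control point is pushed sideways along it.
    nx, ny = -dy / length, dx / length
    offset = rng.uniform(-curve, curve) * length
    cx = (x1 + x2) / 2 + nx * offset
    cy = (y1 + y2) / 2 + ny * offset
    points = []
    for i in range(n):
        t = i / (n - 1)
        # Minimum-jerk progress curve: slow start, fast middle, slow end.
        s = 10 * t**3 - 15 * t**4 + 6 * t**5
        # Quadratic Bezier from (x1, y1) through control (cx, cy) to (x2, y2).
        bx = (1 - s) ** 2 * x1 + 2 * (1 - s) * s * cx + s**2 * x2
        by = (1 - s) ** 2 * y1 + 2 * (1 - s) * s * cy + s**2 * y2
        # Jitter interior samples only, so the gesture still starts and ends on target.
        if 0 < i < n - 1:
            bx += rng.gauss(0, jitter)
            by += rng.gauss(0, jitter)
        points.append((bx, by))
    return points

def speed_cv(points):
    """Coefficient of variation of step lengths; with uniform time sampling this tracks speed variability."""
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    mean = sum(steps) / len(steps)
    var = sum((s - mean) ** 2 for s in steps) / len(steps)
    return math.sqrt(var) / mean

def straightness(points):
    """Chord length over path length; 1.0 for a straight path, smaller for curved or jittery ones."""
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return math.dist(points[0], points[-1]) / path

if __name__ == "__main__":
    vanilla = straight_swipe(540, 1600, 540, 600)
    human_like = humanized_swipe(540, 1600, 540, 600, rng=random.Random(0))
    print("vanilla  ", round(speed_cv(vanilla), 3), round(straightness(vanilla), 3))
    print("humanized", round(speed_cv(human_like), 3), round(straightness(human_like), 3))
```

The vanilla swipe has a speed CV of about 0 and a straightness of about 1, values that a kinematics-based detector can flag easily. The humanized swipe moves both features. Its endpoints stay fixed, so the touch targets are unchanged, which is one intuition for why such noise need not cost task utility.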
Where Pith is reading between the lines
- Similar humanization techniques could be required for agents on non-mobile interfaces such as web or desktop.
- Detector designers may need to incorporate higher-order statistics or multi-session patterns once basic kinematic matching becomes common.
- The MinMax framing suggests a possible arms race where continued improvement in humanization forces detectors to adopt more sophisticated models.
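The sketch below illustrates why higher-order statistics matter, using toy numbers that are not drawn from the paper's dataset. It builds two speed profiles with identical mean and standard deviation but mirrored shape. A detector that matches only the first two moments cannot tell them apart, while skewness separates them.

```python
import math

def shape_stats(values):
    """Mean, population std, skewness and excess kurtosis of a feature sequence."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        # Higher moments are undefined for a constant sequence (e.g. a perfectly uniform speed).
        return mean, 0.0, float("nan"), float("nan")
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Toy speed profiles (px per sample): same mean and spread, mirrored shape.
late_spike = [2.0, 2.0, 2.0, 5.0]
late_drop = [3.5, 3.5, 3.5, 0.5]

if __name__ == "__main__":
    print(shape_stats(late_spike))
    print(shape_stats(late_drop))
```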
Load-bearing premise
The collected dataset of mobile touch dynamics represents the full range of behaviors that real detectors would rely on, and reducing measured divergence in the model produces actual undetectability in deployed systems.
What would settle it
A controlled test in which a humanized agent is run against production mobile-platform detectors on live apps and still receives non-human flags at rates comparable to vanilla agents.
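One way to score such a test is a two-proportion z-test on the non-human flag rates of humanized and vanilla agents. The counts in the usage example are hypothetical, and the function uses only the Python standard library.

```python
import math

def two_proportion_ztest(flags_a, n_a, flags_b, n_b):
    """Two-sided z-test of H0: group A and group B are flagged at the same rate."""
    p_a, p_b = flags_a / n_a, flags_b / n_b
    pooled = (flags_a + flags_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical counts: flagged sessions out of 200 runs per agent on live apps.
    z, p = two_proportion_ztest(45, 200, 50, 200)  # humanized vs vanilla
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A non-significant difference (a large p-value) does not by itself show that the two rates are equal. An equivalence test such as two one-sided tests (TOST) would be the stricter check. The pooled standard error is zero if no run, or every run, is flagged, so those degenerate cases need separate handling.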
Original abstract
The rise of autonomous GUI agents has triggered adversarial countermeasures from digital platforms, yet existing research prioritizes utility and robustness over the critical dimension of anti-detection. We argue that for agents to survive in human-centric ecosystems, they must evolve Humanization capabilities. We introduce the "Turing Test on Screen," formally modeling the interaction as a MinMax optimization problem between a detector and an agent aiming to minimize behavioral divergence. We then collect a new high-fidelity dataset of mobile touch dynamics, and our analysis shows that vanilla LMM-based agents are easily detectable due to unnatural kinematics. Consequently, we establish the Agent Humanization Benchmark (AHB) and detection metrics to quantify the trade-off between imitability and utility. Finally, we propose methods ranging from heuristic noise to data-driven behavioral matching, demonstrating that agents can achieve high imitability theoretically and empirically without sacrificing performance. This work shifts the paradigm from whether an agent can perform a task to how it performs it within a human-centric ecosystem, laying the groundwork for seamless coexistence in adversarial digital environments.
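Data-driven behavioral matching could take many forms. The sketch below is minimal and assumes a library of recorded human swipes stored as (t_ms, x, y) samples. It works in two steps:

1. Retrieve the recording whose displacement is closest to the agent's intended swipe.
2. Map that recording onto the agent's endpoints with a similarity transform (rotation, uniform scale, translation), which preserves its shape and timing.

The record format, function names and retrieval rule are hypothetical, not the paper's procedure.

```python
def _endpoints(traj):
    """Start and end positions of a (t_ms, x, y) trajectory as complex numbers."""
    return complex(traj[0][1], traj[0][2]), complex(traj[-1][1], traj[-1][2])

def nearest_template(library, start, end):
    """Pick the recorded human swipe whose displacement vector is closest to the agent's."""
    target = complex(*end) - complex(*start)

    def distance(traj):
        z0, zn = _endpoints(traj)
        return abs((zn - z0) - target)

    return min(library, key=distance)

def retarget(template, start, end):
    """Similarity-transform a human recording so it runs from start to end, keeping timestamps."""
    z0, zn = _endpoints(template)
    if zn == z0:
        raise ValueError("template has zero displacement")
    a, b = complex(*start), complex(*end)
    k = (b - a) / (zn - z0)  # one complex factor encodes rotation plus uniform scale
    out = []
    for t, x, y in template:
        z = a + (complex(x, y) - z0) * k
        out.append((t, z.real, z.imag))
    return out

if __name__ == "__main__":
    # Hypothetical recordings: an upward swipe and a rightward swipe.
    library = [
        [(0, 500, 1500), (40, 505, 1200), (90, 498, 800), (120, 500, 500)],
        [(0, 100, 1000), (60, 400, 1010), (110, 900, 1000)],
    ]
    template = nearest_template(library, (540, 1600), (540, 700))
    print(retarget(template, (540, 1600), (540, 700)))
```

Keeping the original timestamps is a design choice. When the target swipe is much shorter or longer than the retrieved one, rescaling time as well may be more realistic.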
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the 'Turing Test on Screen' benchmark, modeling mobile GUI agent humanization as a MinMax optimization between an agent minimizing behavioral divergence and a detector. It collects a high-fidelity dataset of mobile touch dynamics, shows that vanilla LMM-based agents are easily detectable due to unnatural kinematics, establishes the Agent Humanization Benchmark (AHB) with associated metrics, and proposes methods (heuristic noise injection and data-driven behavioral matching) that achieve high imitability without sacrificing task performance.
Significance. If the empirical claims hold under rigorous validation, the work could meaningfully advance GUI agent research by formalizing the trade-off between utility and undetectability in adversarial environments. The MinMax framing, specialized touch-dynamics dataset, and AHB provide a concrete foundation for future studies on behavioral naturalness, potentially influencing platform policies and agent deployment strategies.
major comments (2)
- [Analysis and Proposed Methods] The central claim that vanilla LMM agents are 'easily detectable due to unnatural kinematics' and that proposed methods achieve high imitability without utility loss lacks reported validation metrics, error bars, statistical controls, or ablation studies on the benchmark metrics (as highlighted by the low soundness score). This is load-bearing for the empirical success assertions.
- [Dataset Collection and MinMax Formulation] The premise that the collected high-fidelity dataset spans the full distribution of human behavior (and that MinMax divergence minimization on the AHB directly implies evasion against real or adaptive detectors using different features/temporal patterns) is untested. No cross-validation against alternative detectors or online adaptation scenarios is provided.
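The cross-detector check the referee asks for can be reported as a worst case over several detectors. The sketch below computes each detector's ROC AUC: the probability that an agent trajectory receives a higher "non-human" score than a human one. It then maps the AUC to an imitability value of 1 - 2·|AUC - 0.5|, which is 1 at chance and 0 for perfect separation. That mapping, the detector names and the scores are assumptions for illustration, not the AHB's metric definition.

```python
def roc_auc(human_scores, agent_scores):
    """P(agent score > human score) over all pairs, counting ties as 1/2."""
    wins = 0.0
    for a in agent_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(agent_scores) * len(human_scores))

def imitability(auc):
    """1.0 when the detector is at chance, 0.0 when it separates the classes perfectly
    (in either direction, since an inverted detector is just as informative)."""
    return 1.0 - 2.0 * abs(auc - 0.5)

def worst_case_imitability(detectors):
    """detectors maps a name to (human_scores, agent_scores); returns the weakest result."""
    per_detector = {name: imitability(roc_auc(h, a)) for name, (h, a) in detectors.items()}
    worst = min(per_detector, key=per_detector.get)
    return worst, per_detector[worst], per_detector

if __name__ == "__main__":
    # Hypothetical detector outputs on the same human and agent trajectories.
    toy = {
        "speed_features": ([0.1, 0.2, 0.3, 0.4], [0.15, 0.25, 0.35, 0.45]),
        "sequence_model": ([0.1, 0.2, 0.3, 0.4], [0.6, 0.7, 0.8, 0.9]),
    }
    print(worst_case_imitability(toy))
```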
minor comments (2)
- [Abstract] The abstract and introduction would benefit from explicit quantitative results (e.g., specific imitability scores or detection rates) rather than qualitative statements.
- [Formal Modeling] Clarify the precise mathematical definition of the behavioral divergence metric and the trade-off weight in the MinMax objective to improve reproducibility.
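As an example of the precision requested, the behavioral divergence could be defined as the Jensen-Shannon divergence between human and agent histograms of a kinematic feature, here swipe duration in milliseconds. This is an assumed definition. It is chosen because it is bounded and because it is the divergence a GAN-style detector-agent game reduces to at the optimal detector. The bin edges and data below are hypothetical.

```python
import bisect
import math

def histogram(values, edges):
    """Normalized histogram over bins [e0, e1), [e1, e2), ... with the last bin closed.
    Values outside [edges[0], edges[-1]] are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so 0 <= JSD <= 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing; y_i > 0 whenever x_i > 0 because y is the mixture.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

if __name__ == "__main__":
    edges = [0, 100, 200, 300]  # hypothetical swipe-duration bins in ms
    human = histogram([120, 150, 180, 210, 240], edges)
    agent = histogram([100, 100, 100, 100], edges)
    print(human, agent, round(js_divergence(human, agent), 3))
```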
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comments below and will revise the paper to incorporate additional validation and clarifications where appropriate.
- Referee: [Analysis and Proposed Methods] The central claim that vanilla LMM agents are 'easily detectable due to unnatural kinematics' and that proposed methods achieve high imitability without utility loss lacks reported validation metrics, error bars, statistical controls, or ablation studies on the benchmark metrics (as highlighted by the low soundness score). This is load-bearing for the empirical success assertions.
  Authors: We acknowledge the need for stronger statistical support. In the revised manuscript, we will add error bars to all reported metrics, include statistical significance tests (such as paired t-tests), provide ablation studies isolating the contributions of heuristic noise injection and data-driven matching, and report full benchmark scores with controls for task difficulty. These additions will directly substantiate the claims on detectability and the imitability-utility trade-off. Revision: yes.
- Referee: [Dataset Collection and MinMax Formulation] The premise that the collected high-fidelity dataset spans the full distribution of human behavior (and that MinMax divergence minimization on the AHB directly implies evasion against real or adaptive detectors using different features/temporal patterns) is untested. No cross-validation against alternative detectors or online adaptation scenarios is provided.
  Authors: The dataset was gathered from multiple users performing varied tasks to capture diverse touch dynamics, but we agree that explicit cross-validation and adaptation tests would improve rigor. In revision, we will include experiments evaluating our methods against alternative detector feature sets and discuss limitations for fully adaptive online settings. We will also clarify that the MinMax formulation is a foundational model rather than a complete proof of evasion in all scenarios. Revision: partial.
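The promised statistics are straightforward to compute per task. The sketch below is minimal, uses only the standard library, and takes hypothetical per-task success rates for a vanilla agent and its humanized variant. It computes a paired t statistic on the per-task differences. It also computes a percentile bootstrap confidence interval for the mean difference, which can serve as an error bar. If SciPy is available, `scipy.stats.ttest_rel(after, before)` returns the same t statistic together with its two-sided p-value.

```python
import math
import random
import statistics

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for per-task differences (after - before)."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1); zero if all diffs are equal
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1

def bootstrap_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean difference."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(statistics.mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

if __name__ == "__main__":
    # Hypothetical per-task success rates, vanilla vs humanized agent.
    before = [0.80, 0.62, 0.75, 0.90, 0.55, 0.70]
    after = [0.78, 0.64, 0.74, 0.91, 0.53, 0.71]
    t, df = paired_t(before, after)
    print(f"t({df}) = {t:.3f}", bootstrap_ci([b - a for a, b in zip(before, after)]))
```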
Circularity Check
No significant circularity in derivation chain
The paper defines the MinMax optimization as an explicit modeling choice for the detector-agent interaction, collects an independent high-fidelity dataset of mobile touch dynamics, performs kinematic analysis on vanilla LMM agents, establishes the AHB benchmark from that analysis, and evaluates proposed methods (heuristic noise and data-driven matching) empirically on the new data. No equation or claim reduces by construction to a fitted parameter from the same dataset, no self-citation bears the central load, and the imitability-utility trade-off is demonstrated rather than assumed via renaming or ansatz smuggling. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- trade-off weight in MinMax optimization
Forward citations
Cited by 1 Pith paper
- Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability
  The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment inter...