What Memory Do GUI Agents Really Need? From Passive Records to Active Task-Driving States

Chen Liu; Hanzhang Zhou; Ling Chen; Panrong Tong; Quyu Kong; Steven Hoi; Wenhao Wang; Xin Yu; Xu Zhang; Yue Wang

arxiv: 2606.31612 · v1 · pith:TLI6UBLSnew · submitted 2026-06-30 · 💻 cs.CV


Pith reviewed 2026-07-01 05:24 UTC · model grok-4.3

classification 💻 cs.CV

keywords GUI agents · memory management · active task memory · reinforcement learning · long-horizon tasks · mobile benchmarks · workflow state


The pith

GUI agents perform long-horizon tasks more reliably when memory actively maintains each value's role and status instead of passively storing records.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current memory methods for GUI agents accumulate observations as passive storage, forcing the agent to reconstruct whether a value should be used now, has already been used, or is needed later. This reconstruction often fails in long trajectories with similar fields, repeated values, distractors, and outdated states, leading to repeated or missed operations. The paper introduces Active Task Driving Memory (ATMem), which maintains task-relevant information as a continually updated execution state linking each value to its role and current status. This state directly supports action selection based on the current workflow state. The authors also present STR-GRPO, an online RL method that learns selective memory use by contrasting memory-on and memory-off rollouts and applying cost-aware rewards.

Core claim

ATMem shifts GUI-agent memory from passive storage to an actively maintained execution state that links each value to its role and current status, enabling action selection based on the current workflow state rather than implicit reconstruction from accumulated records.

What carries the argument

Active Task Driving Memory (ATMem), which maintains an execution state linking values to roles and statuses for direct workflow-based decisions.
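To make the idea of an execution state concrete, here is a minimal sketch of what "linking each value to its role and current status" could look like as a data structure. It is not the paper's implementation: the names `Status`, `Item`, `ExecutionState`, `depends_on`, and `next_actionable` are hypothetical, and the three status values are an assumption drawn from the pending/completed/out-of-scope distinctions described above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    PENDING = "pending"    # should be acted on, not yet used
    DONE = "done"          # already used; acting again would repeat work
    REJECTED = "rejected"  # out of scope, e.g. a same-schema distractor


@dataclass
class Item:
    key: str                          # unique id, so repeated values stay distinct
    value: str
    role: str                         # what the value is for in the workflow
    status: Status = Status.PENDING
    depends_on: Optional[str] = None  # key of an item that must be DONE first


@dataclass
class ExecutionState:
    items: Dict[str, Item] = field(default_factory=dict)

    def add(self, item: Item) -> None:
        self.items[item.key] = item

    def mark(self, key: str, status: Status) -> None:
        self.items[key].status = status

    def next_actionable(self) -> Optional[Item]:
        """Return the first pending item whose dependency, if any, is done."""
        for item in self.items.values():
            if item.status is not Status.PENDING:
                continue
            if item.depends_on is not None:
                dep = self.items.get(item.depends_on)
                if dep is None or dep.status is not Status.DONE:
                    continue  # still waiting on an earlier step
            return item
        return None


# Hypothetical workflow: add one contact, skip a near-identical distractor,
# then send a message that depends on the contact existing.
state = ExecutionState()
state.add(Item("c1", "Alice, 555-0101", role="contact_to_add"))
state.add(Item("c2", "Alice, 555-0199", role="contact_to_add", status=Status.REJECTED))
state.add(Item("m1", "Meeting moved to 3pm", role="message_to_send", depends_on="c1"))

first = state.next_actionable()   # c1: pending, no dependency
state.mark("c1", Status.DONE)
second = state.next_actionable()  # m1: its dependency c1 is now done
state.mark("m1", Status.DONE)
third = state.next_actionable()   # None: nothing pending remains
```

In this sketch the agent reads the next step off the state rather than re-deriving it from a log. Who sets the statuses, and how reliably, is left open here, which is the same question the referee report raises.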

If this is right

Agents can select actions without inferring value relevance from raw records, reducing errors in complex tasks.
STR-GRPO enables learning when to use memory to improve task completion while minimizing unnecessary costs.
The new benchmark allows evaluation of complete in-scope work and avoidance of out-of-scope actions over long horizons.
Memory use becomes tied to actual contribution to execution success.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

ATMem could be adapted to non-mobile GUI environments or other agent domains with similar long-horizon challenges.
Explicit state maintenance might interact with LLM context limits in ways that reduce overall token usage.
Future work might test if the role-status linking reduces the frequency of hallucinated or outdated actions.
The contrastive RL approach in STR-GRPO may apply to other memory or tool-use decisions in agents.

Load-bearing premise

Explicitly linking each value to its role and current status will allow reliable action selection without introducing new inference errors or excessive overhead.

What would settle it

An experiment showing that agents with ATMem still repeat operations or miss required actions in trajectories containing similar fields, repeated values, and distractors would falsify the benefit of the active state.

Figures

Figures reproduced from arXiv: 2606.31612 by Chen Liu, Hanzhang Zhou, Ling Chen, Panrong Tong, Quyu Kong, Steven Hoi, Wenhao Wang, Xin Yu, Xu Zhang, Yue Wang.

**Figure 1.** Motivation. Passive records preserve past snippets but do not provide stable execution-state awareness, leading to missed, repeated, or over-scoped operations. (c) AndroidWorld statistics show that 83% of tasks involve data operations. (d) With the same GPT-5 planner and UIIns grounding framework, ATMem improves over flat-memory and full-history baselines, and failure analysis attributes flat-memory errors…

**Figure 2.** Overview of our methodology. (a) ATMem organizes task-relevant data into a structured execution state with constraints, schema fields, item content, and item-level status. (b) Verified SFT data are synthesized through task-template instantiation, teacher-agent rollouts, and environment validation. (c) STR-GRPO uses balanced memory-ON/OFF interventions to estimate ATMem utility and learn selective memory in…

**Figure 3.** DataScope statistics and controlled difficulty scaling. (a) Bars show the average numbers of actionable target entries and same-schema distractors per task from DC-V1 to DC-V3, while the line shows the distractor-to-target ratio. The controlled increase in both quantities raises the ratio from 1.39× to 3.22×, testing whether agents can maintain target coverage while filtering increasingly confusable data w…

**Figure 4.** Execution case of ATMem-UI on AndroidWorld. The figure shows how our agent collects task-relevant data, maintains their structured execution states, and uses these states to guide subsequent actions. By tracking which data items are pending or completed, the agent reliably progresses from data collection to task execution and completes the long-horizon workflow.

**Figure 5.** Execution trajectory comparison between recording-centric memory and ATMem. The flat-memory agent records task information as unstructured notes, but fails to explicitly track which data items have been completed or still require action, leading to repeated search and stuck execution. In contrast, the ATMem-based agent maintains structured task data and item-level execution status, enabling more stable pro…

**Figure 6.** Representative failure case on our data-scope benchmark. On a data-scope workflow from our benchmark, MAI-UI-8B identifies the target contact information and begins adding the contact at step 25, but then repeats the same contact-creation operation until step 60. This stuck-loop behavior suggests that the agent can recover relevant values, but fails to track whether the data operation has already been comp…

**Figure 7.** Further analysis of STR-GRPO training dynamics and DataScope failure cases.

Original abstract

Mobile GUI agents increasingly face long-horizon tasks that require reading, updating, and reusing task-relevant data across pages and applications. Existing memory methods treat memory largely as passive storage, where past observations are accumulated and retrieved when needed. Yet retrieving a value does not reveal its current role in the workflow. The agent must still infer from accumulated records whether the value should be used now, has already been used, or must wait for a later dependency. This implicit reconstruction becomes unreliable in long trajectories with similar fields, repeated values, distractors, and outdated states, causing repeated or missed operations. We propose Active Task Driving Memory (ATMem), which shifts GUI-agent memory from passive storage to an actively maintained execution state. ATMem maintains task-relevant information as a continually updated execution state that links each value to its role and current status, enabling action selection based on the current workflow state. We therefore introduce STR-GRPO, an online reinforcement learning method that learns to use ATMem selectively according to its contribution to task completion. STR-GRPO contrasts memory-on and memory-off rollouts to estimate when memory use improves execution, while memory-cost-aware reward discourages costly memory usage that does not improve execution. To evaluate whether agents can complete all in-scope work while avoiding out-of-scope actions over long-horizon execution, we build a challenging mobile benchmark. From a list of near-identical entries, agents must act on every entry that satisfies the instruction and reject entries that violate its constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ATMem and STR-GRPO aim to replace passive memory with explicit role/status tracking in GUI agents, but the update mechanism risks inheriting the same inference issues without results to show otherwise.


The paper's main move is to treat memory not as accumulated observations but as an actively maintained execution state. ATMem links each stored value to its workflow role and current status so the agent can read off what to do next instead of re-inferring relevance from raw records. STR-GRPO then uses online RL that contrasts memory-on and memory-off rollouts, with a cost term to discourage memory use that does not improve task completion.
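The memory-on/memory-off contrast and the cost term can be illustrated with a small numeric sketch. The abstract does not give the reward formula, the cost coefficient, or the advantage computation, so everything here is an assumption: `cost_aware_reward` subtracts a flat hypothetical `memory_cost`, `memory_utility` is a plain difference of mean success rates, and `group_relative_advantages` uses the within-group mean and standard deviation normalization commonly associated with GRPO.

```python
from statistics import mean, pstdev
from typing import Dict, List


def cost_aware_reward(success: float, memory_on: bool, memory_cost: float = 0.1) -> float:
    """Task success minus a flat charge when memory is used (assumed form)."""
    return success - (memory_cost if memory_on else 0.0)


def memory_utility(rollouts: List[Dict]) -> float:
    """Mean success with memory on minus mean success with memory off."""
    on = [r["success"] for r in rollouts if r["memory_on"]]
    off = [r["success"] for r in rollouts if not r["memory_on"]]
    return mean(on) - mean(off)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one rollout group (GRPO-style, assumed)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A balanced toy group for one task: two rollouts with memory, two without.
group = [
    {"memory_on": True, "success": 1.0},
    {"memory_on": True, "success": 1.0},
    {"memory_on": False, "success": 1.0},
    {"memory_on": False, "success": 0.0},
]

utility = memory_utility(group)
rewards = [cost_aware_reward(r["success"], r["memory_on"]) for r in group]
advantages = group_relative_advantages(rewards)
```

In this toy group memory raises the mean success rate, yet the rollout that succeeded without memory receives the largest advantage because it paid no cost. That is one way a policy could be pushed toward using memory only where it changes the outcome.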

The benchmark is the clearest concrete contribution. It requires an agent to act on every qualifying entry from lists of near-identical items while correctly rejecting the rest, which directly tests the completeness and precision problems that arise in long mobile trajectories.
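A scoring rule of the kind this task implies can be sketched as follows. The benchmark's actual metric is not stated in the material reviewed here, so `scope_scores` and its field names are hypothetical; the sketch only shows how completeness (every qualifying entry handled) and precision (no non-qualifying entry touched) can be checked separately.

```python
from typing import Dict, Set


def scope_scores(acted_on: Set[str], targets: Set[str]) -> Dict:
    """Compare the entries an agent acted on with the in-scope targets."""
    hits = acted_on & targets
    missed = targets - acted_on        # in-scope work left undone
    out_of_scope = acted_on - targets  # distractors the agent acted on
    coverage = len(hits) / len(targets) if targets else 1.0
    precision = len(hits) / len(acted_on) if acted_on else 1.0
    return {
        "coverage": coverage,
        "precision": precision,
        "missed": sorted(missed),
        "out_of_scope": sorted(out_of_scope),
        # Strict success: all targets handled and nothing else touched.
        "success": not missed and not out_of_scope,
    }


# Hypothetical episode: three qualifying entries, one distractor acted on.
result = scope_scores(acted_on={"a", "b", "d"}, targets={"a", "b", "c"})
perfect = scope_scores(acted_on={"a", "b", "c"}, targets={"a", "b", "c"})
```

Reporting coverage and precision separately distinguishes an agent that stops early from one that over-acts, which a single pass/fail flag would conflate.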

The motivation is sound. Passive retrieval does leave the model to reconstruct whether a value is pending, already used, or irrelevant, and that reconstruction is fragile when fields repeat or states go stale. Framing memory as execution state is a reasonable response to that pattern.

The soft spot is exactly the one the stress-test note flags. Nothing in the description shows how the role and status links are populated or corrected in the first place. If that step is performed by the same policy that already struggles with distractors and outdated values, the explicit format may simply move the inference burden rather than remove it. The abstract supplies no results, ablations, or overhead numbers, so it is impossible to judge whether the net effect is positive. The full paper will need to demonstrate that state maintenance is reliable and that STR-GRPO produces measurable gains.

This is relevant to anyone building long-horizon GUI agents. The problem is practical and the proposal is specific enough to deserve referee time even if the experiments require strengthening.

Referee Report

2 major / 2 minor

Summary. The paper claims that passive memory in GUI agents forces unreliable implicit inference of value roles and statuses in long-horizon tasks with distractors and repeated values, and proposes Active Task Driving Memory (ATMem) as an actively maintained execution state that explicitly links each value to its workflow role and current status. It introduces STR-GRPO, an online RL algorithm that contrasts memory-on and memory-off rollouts with a memory-cost-aware reward to learn selective use of ATMem, and presents a new mobile GUI benchmark requiring agents to act on every in-scope entry while rejecting out-of-scope ones.

Significance. If the empirical results hold, the work could usefully shift GUI-agent memory design toward explicit, actively updated state representations rather than passive retrieval. The benchmark's emphasis on complete coverage without extraneous actions addresses a relevant evaluation gap. STR-GRPO's on/off contrast provides a falsifiable way to measure memory utility, which is a methodological strength.

major comments (2)

[§3] ATMem definition: the central claim that ATMem 'links each value to its role and current status' enabling reliable action selection is not supported by any description of the mechanism that populates or corrects those links. If role/status assignment is performed by the same LLM policy already shown to struggle with distractors, repeated values, and outdated states, the explicit representation relocates rather than removes the inference problem; this is load-bearing for the claim that ATMem improves execution over passive records.
[§4] STR-GRPO: the memory-on vs memory-off contrast assumes that the difference isolates the benefit of explicit state maintenance, but without evidence that role/status links are assigned independently of the policy's inference errors, the contrast may simply compare two error-prone processes; this undermines the interpretation of the reward signal.

minor comments (2)

[§3] The abstract and introduction repeatedly use 'execution state' without a formal definition or pseudocode; add a concise definition or diagram in §3.
Table or figure captions for the benchmark should explicitly state the number of trajectories, average length, and distractor density to allow reproducibility assessment.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed review. The comments highlight important aspects of our claims regarding ATMem and STR-GRPO. We address each major comment below.


Referee: [§3] ATMem definition: the central claim that ATMem 'links each value to its role and current status' enabling reliable action selection is not supported by any description of the mechanism that populates or corrects those links. If role/status assignment is performed by the same LLM policy already shown to struggle with distractors, repeated values, and outdated states, the explicit representation relocates rather than removes the inference problem; this is load-bearing for the claim that ATMem improves execution over passive records.

Authors: We agree that §3 currently presents ATMem at a conceptual level without sufficient detail on the population and correction mechanisms. The manuscript describes ATMem as a structured execution state that is actively updated during task execution, with the policy deciding updates based on new observations and workflow progress. The explicit linking is intended to reduce repeated implicit inference from raw records. However, we acknowledge the referee's point that without explicit mechanisms (e.g., update rules or examples), the claim risks relocating rather than resolving the inference burden. We will revise §3 to include a more precise description of the state update process, including how the policy interacts with the structured fields.

Revision: yes

Referee: [§4] STR-GRPO: the memory-on vs memory-off contrast assumes that the difference isolates the benefit of explicit state maintenance, but without evidence that role/status links are assigned independently of the policy's inference errors, the contrast may simply compare two error-prone processes; this undermines the interpretation of the reward signal.

Authors: The memory-on versus memory-off design in STR-GRPO is meant to isolate the value of explicit state access by giving the memory-off condition the same raw observations but without the structured ATMem representation. The reward signal derives from task completion metrics on the benchmark, which penalize both missed in-scope actions and extraneous out-of-scope actions. We recognize that the links are not assigned by an independent oracle and that policy errors can affect both conditions; the contrast therefore measures net utility rather than pure isolation of maintenance quality. The empirical gains on long-horizon tasks with distractors support the claim that the structured state provides a measurable advantage. We will add a limitations paragraph discussing this interpretive caveat while retaining the current experimental framing.

Revision: partial

Circularity Check

0 steps flagged

No circularity: proposal of new memory structure and RL method with no derivations or self-referential reductions.


The paper proposes ATMem as an actively maintained execution state linking values to roles/status and STR-GRPO as an RL method contrasting memory-on/off rollouts with cost-aware rewards. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim is a design shift from passive to active memory, justified by described limitations of prior approaches rather than reducing to its own inputs by construction. The benchmark and evaluation are presented as external tests. This is a standard non-circular proposal of an architectural change.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Review based solely on abstract; no explicit free parameters, axioms, or invented entities beyond the named methods are described.

invented entities (2)

ATMem (no independent evidence)
purpose: Active task-driving memory maintaining execution state with roles and status
Introduced in abstract as the core proposed mechanism.

STR-GRPO (no independent evidence)
purpose: Online RL method contrasting memory-on and memory-off rollouts with cost-aware reward
Introduced in abstract as the training procedure for selective memory use.



Reference graph

Works this paper leans on

96 extracted references · 44 canonical work pages · 20 internal anchors

[3]

Spa-bench: A comprehensive benchmark for smartphone agent evaluation

Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao Yang, Gongwei Chen, Zhihao Wu, Li Yixing, Xurui Zhou, Weiwen Liu, Shuai Wang, et al. Spa-bench: A comprehensive benchmark for smartphone agent evaluation. In NeurIPS 2024 Workshop on Open-World Agents, 2024

2024
[7]

Developing a computer use model

DeepMind . Developing a computer use model . Google Blog, Oct 2025. URL https://blog.google/technology/google-deepmind/gemini-computer-use-model/. Accessed: October 22, 2025

2025
[8]

Mobile-bench: An evaluation benchmark for llm-based mobile agents

Shihan Deng, Weikai Xu, Hongda Sun, Wei Liu, Tao Tan, Liujianfeng Liujianfeng, Ang Li, Jian Luan, Bin Wang, Rui Yan, et al. Mobile-bench: An evaluation benchmark for llm-based mobile agents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8813--8831, 2024

2024
[11]

Mobilegpt: Augmenting llm with human-like app memory for mobile task automation

Sunjae Lee, Junyoung Choi, Jungjae Lee, Munim Hasan Wasi, Hojun Choi, Steve Ko, Sangeun Oh, and Insik Shin. Mobilegpt: Augmenting llm with human-like app memory for mobile task automation. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pages 1119--1133, 2024

2024
[15]

Introducing openai o3 and o4-mini

Team OpenAI. Introducing openai o3 and o4-mini. https://openai. com/index/introducing-o3-and-o4-mini/, 2025

2025
[16]

Generative agents: Interactive simulacra of human behavior

Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th annual acm symposium on user interface software and technology, pages 1--22, 2023

2023
[18]

Androidinthewild: A large-scale dataset for android device control

Christopher Rawles, Alice Li, Daniel Rodriguez, Oriana Riva, and Timothy Lillicrap. Androidinthewild: A large-scale dataset for android device control. Advances in Neural Information Processing Systems, 36: 0 59708--59728, 2023

2023
[20]

Constructive memory: past and future

Daniel L Schacter. Constructive memory: past and future. Dialogues in clinical neuroscience, 14 0 (1): 0 7--18, 2012

2012
[21]

The cognitive neuroscience of constructive memory: Remembering the past and imagining the future

Daniel L Schacter and Donna Rose Addis. The cognitive neuroscience of constructive memory: Remembering the past and imagining the future. Philosophical Transactions of the Royal Society B: Biological Sciences, 362 0 (1481): 0 773, 2007

2007
[22]

Seed1.8 model card: Towards generalized real-world agency

Bytedance Seed. Seed1.8 model card: Towards generalized real-world agency. arXiv preprint, December 2025 a . Technical Report

2025
[23]

Ui-tars-1.5

ByteDance Seed. Ui-tars-1.5. https://seed-tars.com/1.5, 2025 b

2025
[24]

HybridFlow: A Flexible and Efficient RLHF Framework

Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. Hybridflow: A flexible and efficient rlhf framework. arXiv preprint arXiv: 2409.19256, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[27]

Reflexion: Language agents with verbal reinforcement learning

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. Advances in neural information processing systems, 36: 0 8634--8652, 2023

2023
[28]

Cognitive architectures for language agents

Theodore Sumers, Shunyu Yao, Karthik R Narasimhan, and Thomas L Griffiths. Cognitive architectures for language agents. Transactions on Machine Learning Research, 2023

2023
[29]

Fairy: Interactive mobile assistant to real-world tasks via lmm-based multi-agent

Jiazheng Sun, Te Yang, Jiayang Niu, Mingxuan Li, Yongyong Lu, Ruimeng Yang, and Xin Peng. Fairy: Interactive mobile assistant to real-world tasks via lmm-based multi-agent. arXiv e-prints, pages arXiv--2509, 2025

2025
[31]

Gelab-zero: An advanced mobile agent inference system, 2025

GELab Team. Gelab-zero: An advanced mobile agent inference system, 2025. URL https://github.com/stepfun-ai/gelab-zero

2025
[35]

Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration

Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, and Jitao Sang. Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration. Advances in Neural Information Processing Systems, 37: 0 2686--2710, 2024 a

2024
[38]

Autodroid: Llm-powered task automation in android

Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, and Yunxin Liu. Autodroid: Llm-powered task automation in android. In Proceedings of the 30th annual international conference on Mobile computing and networking, pages 543--557, 2024

2024
[42]

Androidlab: Training and systematic benchmarking of android autonomous agents

Yifan Xu, Xiao Liu, Xueqiao Sun, Siyi Cheng, Hao Yu, Hanyu Lai, Shudan Zhang, Dan Zhang, Jie Tang, and Yuxiao Dong. Androidlab: Training and systematic benchmarking of android autonomous agents. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2144--2166, 2025 b

2025
[43]

Step-gui technical report, 2025

Haolong Yan, Jia Wang, Xin Huang, Yeqing Shen, Ziyang Meng, Zhimin Fan, Kaijun Tan, Jin Gao, Lieyu Shi, Mi Yang, Shiliang Yang, Zhirui Wang, Brian Li, Kang An, Chenyang Li, Lei Lei, Mengmeng Duan, Danxun Liang, Guodong Liu, Hang Cheng, Hao Wu, Jie Dong, Junhao Huang, Mei Chen, Renjie Yu, Shunshan Li, Xu Zhou, Yiting Dai, Yineng Deng, Yingdan Liang, Zelin ...

work page arXiv 2025
[47]

Appagent: Multimodal agents as smartphone users

Chi Zhang, Zhao Yang, Jiaxuan Liu, Yanda Li, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, and Gang Yu. Appagent: Multimodal agents as smartphone users. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pages 1--20, 2025

2025
[50]

Moba: multifaceted memory-enhanced adaptive planning for efficient mobile task automation

Zichen Zhu, Hao Tang, Yansi Li, Dingye Liu, Hongshen Xu, Kunyao Lan, Danyang Zhang, Yixuan Jiang, Hao Zhou, Chenrun Wang, et al. Moba: multifaceted memory-enhanced adaptive planning for efficient mobile task automation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Lang...

2025
[51]

Transactions on Machine Learning Research , year=

Cognitive architectures for language agents , author=. Transactions on Machine Learning Research , year=
[52]

Advances in neural information processing systems , volume=

Reflexion: Language agents with verbal reinforcement learning , author=. Advances in neural information processing systems , volume=
[53]

Proceedings of the 36th annual acm symposium on user interface software and technology , pages=

Generative agents: Interactive simulacra of human behavior , author=. Proceedings of the 36th annual acm symposium on user interface software and technology , pages=
[54]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Expel: Llm agents are experiential learners , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[55]

Proceedings of the AAAI conference on artificial intelligence , volume=

Memorybank: Enhancing large language models with long-term memory , author=. Proceedings of the AAAI conference on artificial intelligence , volume=
[56]

, author=

MemGPT: towards LLMs as operating systems. , author=. 2023 , publisher=

2023
[57]

A-MEM: Agentic Memory for LLM Agents

A-mem: Agentic memory for llm agents , author=. arXiv preprint arXiv:2502.12110 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[58]

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Mem0: Building production-ready ai agents with scalable long-term memory , author=. arXiv preprint arXiv:2504.19413 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[59]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

Memory os of ai agent , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025
[60]

arXiv preprint arXiv:2505.16067 , year=

How memory management impacts llm agents: An empirical study of experience-following behavior , author=. arXiv preprint arXiv:2505.16067 , year=

work page arXiv
[61]

arXiv preprint arXiv:2505.19549 , year=

From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents , author=. arXiv preprint arXiv:2505.19549 , year=

work page arXiv
[62]

arXiv preprint arXiv:2507.22925 , year=

Hierarchical memory for high-efficiency long-term reasoning in llm agents , author=. arXiv preprint arXiv:2507.22925 , year=

work page arXiv
[63]

arXiv preprint arXiv:2511.18423 , year=

General agentic memory via deep research , author=. arXiv preprint arXiv:2511.18423 , year=

work page arXiv
[64]

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

Evo-memory: Benchmarking llm agent test-time learning with self-evolving memory , author=. arXiv preprint arXiv:2511.20857 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[65]

SimpleMem: Efficient Lifelong Memory for LLM Agents

SimpleMem: Efficient Lifelong Memory for LLM Agents , author=. arXiv preprint arXiv:2601.02553 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[66]

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents , author=. arXiv preprint arXiv:2602.02474 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[67]

GAM: Hierarchical Graph-based Agentic Memory for LLM Agents

GAM: Hierarchical Graph-based Agentic Memory for LLM Agents , author=. arXiv preprint arXiv:2604.12285 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[68]

MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents

MemMachine: A ground-truth-preserving memory system for personalized AI agents , author=. arXiv preprint arXiv:2604.04853 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[69]

HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents

HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents , author=. arXiv preprint arXiv:2604.18349 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[70]

arXiv preprint arXiv:2602.14038 , year=

Choosing how to remember: Adaptive memory structures for llm agents , author=. arXiv preprint arXiv:2602.14038 , year=

work page arXiv
[71]

Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems , pages=

Appagent: Multimodal agents as smartphone users , author=. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems , pages=

2025
[72]

Proceedings of the 30th annual international conference on Mobile computing and networking , pages=

Autodroid: Llm-powered task automation in android , author=. Proceedings of the 30th annual international conference on Mobile computing and networking , pages=
[73]

Proceedings of the 30th Annual International Conference on Mobile Computing and Networking , pages=

Mobilegpt: Augmenting llm with human-like app memory for mobile task automation , author=. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking , pages=
[74]

MobA: multifaceted memory-enhanced adaptive planning for efficient mobile task automation , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations) , pages=

2025
[75]

arXiv preprint arXiv:2501.11733 , year=

Mobile-agent-e: Self-evolving mobile assistant for complex tasks , author=. arXiv preprint arXiv:2501.11733 , year=

work page arXiv
[76]

arXiv e-prints , pages=

Fairy: Interactive Mobile Assistant to Real-world Tasks via LMM-based Multi-agent , author=. arXiv e-prints , pages=
[77]

arXiv preprint arXiv:2601.19199 , year=

MAGNET: Towards Adaptive GUI Agents with Memory-Driven Knowledge Evolution , author=. arXiv preprint arXiv:2601.19199 , year=

work page arXiv
[78]

arXiv preprint arXiv:2602.05832 , year=

UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents , author=. arXiv preprint arXiv:2602.05832 , year=

work page arXiv
[79]

arXiv preprint arXiv:2602.06075 , year=

MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments , author=. arXiv preprint arXiv:2602.06075 , year=

work page arXiv
[80]

arXiv preprint arXiv:2603.10291 , year=

Hybrid Self-evolving Structured Memory for GUI Agents , author=. arXiv preprint arXiv:2603.10291 , year=

work page arXiv
[81]

arXiv preprint arXiv:2601.17418 , year=

GraphPilot: GUI Task Automation with One-Step LLM Reasoning Powered by Knowledge Graph , author=. arXiv preprint arXiv:2601.17418 , year=

work page arXiv
[82]

MGA: Memory-Driven GUI Agent for Observation-Centric Interaction

Mga: Memory-driven gui agent for observation-centric interaction , author=. arXiv preprint arXiv:2510.24168 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[83]

arXiv preprint arXiv:2603.18429 , year=

AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents , author=. arXiv preprint arXiv:2603.18429 , year=

work page arXiv
[84]

EchoTrail-GUI: Building Actionable Memory for GUI Agents via Critic-Guided Self-Exploration

EchoTrail-GUI: Building Actionable Memory for GUI Agents via Critic-Guided Self-Exploration , author=. arXiv preprint arXiv:2512.19396 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[85]

SkillDroid: Compile Once, Reuse Forever

SkillDroid: Compile Once, Reuse Forever , author=. arXiv preprint arXiv:2604.14872 , year=

[86]

MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments. arXiv preprint arXiv:2512.19432, 2025.

[87]

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents. arXiv preprint arXiv:2512.22047.

[88]

Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents. arXiv preprint arXiv:2602.16855.

[89]

UI-TARS: Pioneering Automated GUI Interaction with Native Agents. arXiv preprint arXiv:2501.12326.

[90]

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning. arXiv preprint arXiv:2509.02544.

[91]

UI-Venus Technical Report: Building High-Performance UI Agents with RFT. arXiv preprint arXiv:2508.10833.

[92]

GELab-Zero: An Advanced Mobile Agent Inference System. 2025.

[93]

Introducing OpenAI o3 and o4-mini. https://openai.com/index/introducing-o3-and-o4-mini/

[94]

UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning. arXiv preprint arXiv:2510.20286.

[95]

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents. arXiv preprint arXiv:2405.14573.

[96]

The cognitive neuroscience of working memory. Annual Review of Psychology, 2015.

[97]

An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 2001.

[98]

Task set and prefrontal cortex. Annual Review of Neuroscience, 2008.

[99]

Motivation of extended behaviors by anterior cingulate cortex. Trends in Cognitive Sciences, 2012.

[100]

Tracking progress toward a goal in corticostriatal ensembles. Journal of Neuroscience, 2014.

[101]

Neural activity predicts individual differences in visual working memory capacity. Nature, 2004.

[102]

The episodic buffer: a new component of working memory? Trends in Cognitive Sciences, 2000.

[103]

HybridFlow: A Flexible and Efficient RLHF Framework. 2024.

[104]

Androidinthewild: A large-scale dataset for Android device control. Advances in Neural Information Processing Systems.

[105]

Mobile-bench: An evaluation benchmark for LLM-based mobile agents. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

[106]

ProBench: Benchmarking GUI Agents with Accurate Process Information. arXiv preprint arXiv:2511.09157.

[107]

A3: Android agent arena for mobile GUI agents. arXiv preprint arXiv:2501.01149.

[108]

Spa-bench: A comprehensive benchmark for smartphone agent evaluation. NeurIPS 2024 Workshop on Open-World Agents, 2024.


[1] [3]

Spa-bench: A comprehensive benchmark for smartphone agent evaluation

Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao Yang, Gongwei Chen, Zhihao Wu, Li Yixing, Xurui Zhou, Weiwen Liu, Shuai Wang, et al. Spa-bench: A comprehensive benchmark for smartphone agent evaluation. In NeurIPS 2024 Workshop on Open-World Agents, 2024

2024

[2] [7]

Developing a computer use model

DeepMind. Developing a computer use model. Google Blog, Oct 2025. URL https://blog.google/technology/google-deepmind/gemini-computer-use-model/. Accessed: October 22, 2025

2025

[3] [8]

Mobile-bench: An evaluation benchmark for llm-based mobile agents

Shihan Deng, Weikai Xu, Hongda Sun, Wei Liu, Tao Tan, Liujianfeng Liujianfeng, Ang Li, Jian Luan, Bin Wang, Rui Yan, et al. Mobile-bench: An evaluation benchmark for llm-based mobile agents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8813--8831, 2024

2024

[4] [11]

Mobilegpt: Augmenting llm with human-like app memory for mobile task automation

Sunjae Lee, Junyoung Choi, Jungjae Lee, Munim Hasan Wasi, Hojun Choi, Steve Ko, Sangeun Oh, and Insik Shin. Mobilegpt: Augmenting llm with human-like app memory for mobile task automation. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pages 1119--1133, 2024

2024

[5] [15]

Introducing openai o3 and o4-mini

Team OpenAI. Introducing OpenAI o3 and o4-mini. https://openai.com/index/introducing-o3-and-o4-mini/, 2025

2025

[6] [16]

Generative agents: Interactive simulacra of human behavior

Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th annual acm symposium on user interface software and technology, pages 1--22, 2023

2023

[7] [18]

Androidinthewild: A large-scale dataset for android device control

Christopher Rawles, Alice Li, Daniel Rodriguez, Oriana Riva, and Timothy Lillicrap. Androidinthewild: A large-scale dataset for android device control. Advances in Neural Information Processing Systems, 36:59708--59728, 2023

2023

[8] [20]

Constructive memory: past and future

Daniel L Schacter. Constructive memory: past and future. Dialogues in clinical neuroscience, 14(1):7--18, 2012

2012

[9] [21]

The cognitive neuroscience of constructive memory: Remembering the past and imagining the future

Daniel L Schacter and Donna Rose Addis. The cognitive neuroscience of constructive memory: Remembering the past and imagining the future. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1481):773, 2007

2007

[10] [22]

Seed1.8 model card: Towards generalized real-world agency

ByteDance Seed. Seed1.8 model card: Towards generalized real-world agency. arXiv preprint, December 2025a. Technical Report

2025

[11] [23]

Ui-tars-1.5

ByteDance Seed. UI-TARS-1.5. https://seed-tars.com/1.5, 2025b

2025

[12] [24]

HybridFlow: A Flexible and Efficient RLHF Framework

Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. Hybridflow: A flexible and efficient rlhf framework. arXiv preprint arXiv:2409.19256, 2024

2024

[13] [27]

Reflexion: Language agents with verbal reinforcement learning

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. Advances in neural information processing systems, 36:8634--8652, 2023

2023

[14] [28]

Cognitive architectures for language agents

Theodore Sumers, Shunyu Yao, Karthik R Narasimhan, and Thomas L Griffiths. Cognitive architectures for language agents. Transactions on Machine Learning Research, 2023

2023

[15] [29]

Fairy: Interactive mobile assistant to real-world tasks via lmm-based multi-agent

Jiazheng Sun, Te Yang, Jiayang Niu, Mingxuan Li, Yongyong Lu, Ruimeng Yang, and Xin Peng. Fairy: Interactive mobile assistant to real-world tasks via lmm-based multi-agent. arXiv e-prints, pages arXiv--2509, 2025

2025

[16] [31]

Gelab-zero: An advanced mobile agent inference system, 2025

GELab Team. Gelab-zero: An advanced mobile agent inference system, 2025. URL https://github.com/stepfun-ai/gelab-zero

2025

[17] [35]

Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration

Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, and Jitao Sang. Mobile-agent-v2: Mobile device operation assistant with effective navigation via multi-agent collaboration. Advances in Neural Information Processing Systems, 37:2686--2710, 2024a

2024

[18] [38]

Autodroid: Llm-powered task automation in android

Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, and Yunxin Liu. Autodroid: Llm-powered task automation in android. In Proceedings of the 30th annual international conference on Mobile computing and networking, pages 543--557, 2024

2024

[19] [42]

Androidlab: Training and systematic benchmarking of android autonomous agents

Yifan Xu, Xiao Liu, Xueqiao Sun, Siyi Cheng, Hao Yu, Hanyu Lai, Shudan Zhang, Dan Zhang, Jie Tang, and Yuxiao Dong. Androidlab: Training and systematic benchmarking of android autonomous agents. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2144--2166, 2025b

2025

[20] [43]

Step-gui technical report, 2025

Haolong Yan, Jia Wang, Xin Huang, Yeqing Shen, Ziyang Meng, Zhimin Fan, Kaijun Tan, Jin Gao, Lieyu Shi, Mi Yang, Shiliang Yang, Zhirui Wang, Brian Li, Kang An, Chenyang Li, Lei Lei, Mengmeng Duan, Danxun Liang, Guodong Liu, Hang Cheng, Hao Wu, Jie Dong, Junhao Huang, Mei Chen, Renjie Yu, Shunshan Li, Xu Zhou, Yiting Dai, Yineng Deng, Yingdan Liang, Zelin ...

2025

[21] [47]

Appagent: Multimodal agents as smartphone users

Chi Zhang, Zhao Yang, Jiaxuan Liu, Yanda Li, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, and Gang Yu. Appagent: Multimodal agents as smartphone users. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pages 1--20, 2025

2025

[22] [50]

Moba: multifaceted memory-enhanced adaptive planning for efficient mobile task automation

Zichen Zhu, Hao Tang, Yansi Li, Dingye Liu, Hongshen Xu, Kunyao Lan, Danyang Zhang, Yixuan Jiang, Hao Zhou, Chenrun Wang, et al. Moba: multifaceted memory-enhanced adaptive planning for efficient mobile task automation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Lang...

2025

[23] [51]

Cognitive architectures for language agents. Transactions on Machine Learning Research.

[24] [52]

Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems.

[25] [53]

Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology.

[26] [54]

ExpeL: LLM agents are experiential learners. Proceedings of the AAAI Conference on Artificial Intelligence.

[27] [55]

MemoryBank: Enhancing large language models with long-term memory. Proceedings of the AAAI Conference on Artificial Intelligence.

[28] [56]

MemGPT: Towards LLMs as operating systems. 2023.

[29] [57]

A-MEM: Agentic Memory for LLM Agents. arXiv preprint arXiv:2502.12110.

[30] [58]

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory. arXiv preprint arXiv:2504.19413.

[31] [59]

Memory OS of AI agent. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025.

[32] [60]

How memory management impacts LLM agents: An empirical study of experience-following behavior. arXiv preprint arXiv:2505.16067.

[33] [61]

From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents. arXiv preprint arXiv:2505.19549.

[34] [62]

Hierarchical memory for high-efficiency long-term reasoning in LLM agents. arXiv preprint arXiv:2507.22925.

[35] [63]

General agentic memory via deep research. arXiv preprint arXiv:2511.18423.

[36] [64]

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory. arXiv preprint arXiv:2511.20857.

[37] [65]

SimpleMem: Efficient Lifelong Memory for LLM Agents. arXiv preprint arXiv:2601.02553.

[38] [66]

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents. arXiv preprint arXiv:2602.02474.

[39] [67]

GAM: Hierarchical Graph-based Agentic Memory for LLM Agents. arXiv preprint arXiv:2604.12285.

[40] [68]

MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents. arXiv preprint arXiv:2604.04853.

[41] [69]

HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents. arXiv preprint arXiv:2604.18349.

[42] [70]

Choosing how to remember: Adaptive memory structures for LLM agents. arXiv preprint arXiv:2602.14038.

[43] [71]

AppAgent: Multimodal agents as smartphone users. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 2025.

[44] [72]

AutoDroid: LLM-powered task automation in Android. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking.

[45] [73]

MobileGPT: Augmenting LLM with human-like app memory for mobile task automation. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking.

[46] [74]

MobA: Multifaceted memory-enhanced adaptive planning for efficient mobile task automation. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations), 2025.

[47] [75]

Mobile-Agent-E: Self-evolving mobile assistant for complex tasks. arXiv preprint arXiv:2501.11733.
