pith. machine review for the scientific record.

arxiv: 2605.07358 · v1 · submitted 2026-05-08 · 💻 cs.IR


A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

Wang Shu, Wenchuan Du, Xuemin Lin, Yaodong Su, Yingli Zhou, Yixiang Fang


Pith reviewed 2026-05-11 01:44 UTC · model grok-4.3

classification 💻 cs.IR
keywords: agent skills, LLM agents, skill lifecycle, tool coordination, reusable procedures, agent systems, survey

The pith

Agent skills, defined as reusable procedures, are key to scalable and maintainable LLM agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language model agents are shifting from simple responses to complex action-oriented tasks, but from-scratch reasoning for each task is inefficient and hard to maintain. The survey defines agent skills as reusable procedural artifacts that coordinate tools, memory, and context to provide reliable execution. Agents focus on high-level reasoning while skills handle the operational details, making them central to system scalability and robustness. The literature is organized into four stages of the skill lifecycle: representation, acquisition, retrieval, and evolution, with reviews of methods and applications. The paper ends by outlining challenges in quality control, interoperability, and long-term management.

Core claim

Agent skills are reusable procedural artifacts that coordinate tools, memory, and runtime context under task-specific constraints. They complement agents, which handle high-level reasoning and planning, by forming the operational layer for reliable, reusable, and composable execution. The survey structures the field around a four-stage lifecycle—representation, acquisition, retrieval, and evolution—and reviews methods, resources, and applications at each stage while noting open problems in quality control, safe updating, and capability management.
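The four-stage lifecycle reads concretely as a library of procedural artifacts. A minimal sketch, assuming a hypothetical `Skill` schema and `SkillLibrary` API (neither is defined in the paper):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    # Representation: the artifact packages a procedure plus the
    # metadata an agent needs to find and maintain it.
    name: str
    description: str
    procedure: Callable[[str], str]
    version: int = 1

class SkillLibrary:
    """Toy library mapping the survey's four lifecycle stages to methods."""

    def __init__(self):
        self._skills: dict[str, Skill] = {}

    def acquire(self, skill: Skill) -> None:
        # Acquisition: register a skill obtained from experts,
        # trajectories, tasks, or corpora.
        self._skills[skill.name] = skill

    def retrieve(self, query: str) -> list[Skill]:
        # Retrieval: naive keyword match standing in for dense retrieval.
        return [s for s in self._skills.values() if query in s.description]

    def evolve(self, name: str, new_procedure: Callable[[str], str]) -> Skill:
        # Evolution: replace the procedure and bump the version, so stale
        # variants can be audited or rolled back.
        old = self._skills[name]
        updated = Skill(old.name, old.description, new_procedure, old.version + 1)
        self._skills[name] = updated
        return updated
```

An agent would then call `retrieve` at plan time and execute the returned procedure, instead of re-deriving low-level tool calls from scratch.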

What carries the argument

The agent skill, defined as a reusable procedural artifact coordinating tools, memory, and context, serves as the operational layer complementing agent reasoning.

If this is right

  • Skills enable more efficient task execution by avoiding repeated low-level tool calls.
  • The lifecycle stages provide a structured way to develop and improve agent capabilities over time.
  • Composability of skills supports building complex workflows from simpler components.
  • Community resources like collected repositories accelerate progress across the field.
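The composability point amounts to chaining skills so that simpler components build complex workflows. A sketch with hypothetical toy skills (`fetch` and `summarize` are stand-ins, not tools from the paper):

```python
def compose(*skills):
    """Chain skills left to right: each skill's output feeds the next."""
    def composed(task_input):
        result = task_input
        for skill in skills:
            result = skill(result)
        return result
    return composed

# Two toy low-level skills.
def fetch(url):
    return "page:" + url      # pretend web fetch

def summarize(text):
    return text.upper()       # pretend summarizer

# A higher-level workflow built from simpler components.
brief = compose(fetch, summarize)
```

For example, `brief("example.org")` returns `"PAGE:EXAMPLE.ORG"`: the output of the fetch skill flows into the summarization skill without any agent-level re-planning.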

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adopting this skill-based approach could lead to standardized skill libraries similar to software libraries.
  • Future work might explore how skills evolve in multi-agent systems or over long deployment periods.
  • Testing the taxonomy on emerging agent frameworks could reveal needed refinements.

Load-bearing premise

The proposed definition of agent skills as reusable procedural artifacts, and the division of the literature into four lifecycle stages, together capture the essential challenges without overlooking significant approaches.

What would settle it

Discovery of a prominent agent technique or system that does not fit into any of the four stages of representation, acquisition, retrieval, or evolution, or that does not benefit from reusable skills, would undermine the framework.

Figures

Figures reproduced from arXiv: 2605.07358 by Wang Shu, Wenchuan Du, Xuemin Lin, Yaodong Su, Yingli Zhou, Yixiang Fang.

Figure 1: Historical evolution of skills, from embodied human survival and craftsmanship to engineering, industrial, digital, and … [figure image]
Figure 2: Growth of research on agent skills from April 2023 to April 2026. The figure shows the cumulative number of … [figure image]
Figure 3: The taxonomy for agent skills in this survey. [figure image]
Figure 4: Illustrative examples of agent skills. [figure image]
Figure 5: Overview of skill acquisition methods, grouped by the way in which a skill is obtained: ❶ human-derived acquisition, ❷ experience-derived acquisition, ❸ task-derived acquisition, and ❹ corpus-derived acquisition. Human-derived acquisition obtains skills directly from expert knowledge and manual curation. Experience-derived acquisition builds them from trajectories, exemplars, or past executions. Task-derived acquisition constr… [caption truncated]
Figure 6: The trend of the cumulative number of human-derived skills over time. [figure image]
Figure 7: Skill retrieval and selection. … applying dense retrieval to experiential lessons or structured reasoning memories rather than fully packaged executable skills. This makes dense retrieval the natural entry point when task formulations vary widely but the system still needs to reach reusable skills through a shared semantic layer. The same flexibility also explains why dense retrieval is rarely the whole stor… [caption truncated]
Figure 8: From human skill refinement to agent skill evolution. [figure image]
Figure 9: Skill evolution through staged refinement: updates revise skills, validation filters changes, and trusted skills are indexed. [figure image]
Figure 10: Application scenarios of agent skills. … latency, and execution cost, including dynamic model routing and workload-aware scheduling [6], [18]. Skill library evolution under non-stationarity: APIs deprecate, tool behavior shifts, and task distributions change over time [10], [22]. Skill libraries need lifecycle-level robustness: drift detection, compatibility checks, safe online updates, and versioned rollb… [caption truncated]
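The dense-retrieval entry point described under Figure 7 can be sketched by embedding skill descriptions and the task query into a shared space and ranking by cosine similarity. The letter-count "embedding" below is a toy stand-in for a real learned text encoder, and all names are hypothetical:

```python
import math

def embed(text: str) -> list[float]:
    # Toy embedding: 26-dim letter counts. A real system would use a
    # learned encoder; this only illustrates the shared semantic layer.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_skill(query: str, skill_descriptions: list[str]) -> str:
    # Rank packaged skills by similarity to the task formulation and
    # return the closest match.
    q = embed(query)
    return max(skill_descriptions, key=lambda d: cosine(q, embed(d)))
```

For instance, `retrieve_skill("read a csv file", ["parse csv tables", "send email message"])` selects the CSV skill even though the query and description share no exact phrasing.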
read the original abstract

Large language model (LLM)-based agents that reason, plan, and act through tools, memory, and structured interaction are emerging as a promising paradigm for automating complex workflows. Recent systems such as OpenClaw and Claude Code exemplify a broader shift from passive response generation to action-oriented task execution. Yet as agents move toward open-ended, real-world deployment, relying on from-scratch reasoning and low-level tool calls for every task becomes increasingly inefficient, error-prone, and hard to maintain. This survey examines this challenge through the lens of agent skills, which we define as reusable procedural artifacts that coordinate tools, memory, and runtime context under task-specific constraints. Under this view, agents and skills play complementary roles: agents handle high-level reasoning and planning, while skills form the operational layer that enables reliable, reusable, and composable execution. Skills are therefore central to the scalability, robustness, and maintainability of modern agent systems. We organize the literature around four stages of the agent skill lifecycle -- representation, acquisition, retrieval, and evolution -- and review representative methods, ecosystem resources, and application settings across each stage. We conclude by discussing open challenges in quality control, interoperability, safe updating, and long-term capability management. All related resources, including research papers, open-source data, and projects, are collected for the community at https://github.com/JayLZhou/Awesome-Agent-Skills.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper surveys LLM-based agents and defines agent skills as reusable procedural artifacts that coordinate tools, memory, and runtime context under task-specific constraints. It positions skills as complementary to high-level agent reasoning and planning, arguing they are central to scalability, robustness, and maintainability. The literature is organized around a four-stage skill lifecycle (representation, acquisition, retrieval, evolution), with reviews of methods, ecosystem resources, and applications in each stage, plus discussion of open challenges in quality control, interoperability, safe updating, and long-term capability management. A GitHub repository collects related papers, data, and projects.

Significance. If the taxonomy holds without major omissions, the survey would provide a practical organizing framework for an emerging sub-area of agent systems, helping researchers navigate techniques for reusable execution components. The curated resource collection adds immediate community value beyond the taxonomy itself.

major comments (1)
  1. [Abstract] Abstract and introduction: The four-stage lifecycle is presented as the central organizing lens, but the manuscript provides limited justification for why representation, acquisition, retrieval, and evolution are exhaustive or optimal compared to alternatives (e.g., adding explicit execution or evaluation stages). This choice is load-bearing for the survey's utility and should be defended with reference to gaps in prior taxonomies.
minor comments (3)
  1. [Abstract] Abstract: The examples 'OpenClaw' and 'Claude Code' are introduced without citations or brief descriptions; add references or short characterizations to ground the shift from passive to action-oriented agents.
  2. The GitHub link is highlighted in blue text; ensure the repository is complete, versioned, and includes DOIs or stable links for all collected resources in the final manuscript.
  3. The definition of skills as 'reusable procedural artifacts' is introduced early; a dedicated subsection comparing it to related concepts (tools, workflows, APIs) would reduce potential overlap with existing agent literature.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment and recommendation for minor revision. We address the major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract and introduction: The four-stage lifecycle is presented as the central organizing lens, but the manuscript provides limited justification for why representation, acquisition, retrieval, and evolution are exhaustive or optimal compared to alternatives (e.g., adding explicit execution or evaluation stages). This choice is load-bearing for the survey's utility and should be defended with reference to gaps in prior taxonomies.

    Authors: We agree that additional explicit justification would improve the manuscript. The four stages were chosen to capture the end-to-end lifecycle of skills viewed as reusable procedural artifacts that complement agent-level reasoning: representation formalizes skill structure for interoperability; acquisition encompasses creation via learning, synthesis, or curation; retrieval covers selection and invocation mechanisms during task execution; and evolution addresses adaptation, refinement, and long-term management. Execution is treated as an agent-runtime concern rather than a skill-lifecycle stage, while evaluation is subsumed under acquisition (initial validation) and evolution (ongoing quality control and safe updating). This framing fills a gap in prior taxonomies, which typically organize around agent architectures, planning algorithms, or tool ecosystems but do not isolate a dedicated, reusable skill layer with its own lifecycle stages. We will revise the introduction to add a dedicated paragraph comparing our taxonomy to alternatives and citing specific omissions in existing LLM-agent surveys. revision: yes

Circularity Check

0 steps flagged

No significant circularity in taxonomy survey

full rationale

This paper is a literature survey that defines agent skills as reusable procedural artifacts and organizes existing work into a four-stage lifecycle (representation, acquisition, retrieval, evolution) as an organizing lens. No equations, predictions, fitted parameters, or derivations appear anywhere in the manuscript. The central claims follow directly from the proposed definitional framing and are supported by review of external literature rather than any self-referential reduction or self-citation chain. The structure is self-contained as a taxonomy exercise with no load-bearing steps that collapse to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper introduces a domain-specific definition and taxonomy as its central framing; no mathematical free parameters, formal axioms, or new physical entities are used.

axioms (1)
  • domain assumption LLM-based agents that rely on from-scratch reasoning for every task become inefficient, error-prone, and hard to maintain in real-world deployment.
    This premise is stated in the abstract to motivate the need for agent skills.
invented entities (1)
  • Agent skills no independent evidence
    purpose: Reusable procedural artifacts that coordinate tools, memory, and runtime context under task-specific constraints
    Newly coined conceptual entity introduced to organize the survey; no independent falsifiable evidence is provided beyond the definition itself.

pith-pipeline@v0.9.0 · 5571 in / 1279 out tokens · 67338 ms · 2026-05-11T01:44:15.333735+00:00 · methodology


Reference graph

Works this paper leans on

137 extracted references · 137 canonical work pages · 36 internal anchors

  1. [1]

    Language Models are Few-Shot Learners

    T. B. Brownet al., “Language models are few-shot learners,” inAdvances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 1877–1901. [Online]. Available: https://arxiv.org/abs/2005.14165

  2. [2]

    GPT-4 Technical Report

    J. Achiamet al., “GPT-4 technical report,”arXiv preprint arXiv:2303.08774, 2023. [Online]. Available: https://arxiv.org/abs/ 2303.08774

  3. [3]

    Training language models to follow instructions with human feedback

    L. Ouyanget al., “Training language models to follow instructions with human feedback,” inAdvances in Neural Information Processing Systems, vol. 35. Curran Associates, Inc., 2022, pp. 27 730–27 744. [Online]. Available: https://arxiv.org/abs/2203.02155

  4. [4]

    ReAct: Synergizing Reasoning and Acting in Language Models

    S. Yaoet al., “ReAct: Synergizing reasoning and acting in language models,” inInternational Conference on Learning Representations (ICLR), 2023. [Online]. Available: https://arxiv.org/abs/2210.03629

  5. [5]

    HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

    Y . Shenet al., “HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face,” inAdvances in Neural Information Processing Systems, vol. 36. Curran Associates, Inc., 2023. [Online]. Available: https://arxiv.org/abs/2303.17580

  6. [6]

    MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

    S. Honget al., “MetaGPT: Meta programming for a multi- agent collaborative framework,” inInternational Conference on Learning Representations (ICLR), 2024. [Online]. Available: https: //arxiv.org/abs/2308.00352

  7. [7]

    Openclaw — the open-source personal ai assistant and autonomous agent,

    OpenClaw, “Openclaw — the open-source personal ai assistant and autonomous agent,” https://open-claw.org/, 2026, official website, ac- cessed April 21, 2026

  8. [8]

    Welcome - manus documentation,

    Manus, “Welcome - manus documentation,” https://manus.im/docs, 2026, official documentation, accessed April 21, 2026

  9. [9]

    Claude code overview,

    Anthropic, “Claude code overview,” https://docs.anthropic.com/en/ docs/claude-code/overview, 2026, official documentation, accessed April 21, 2026

  10. [10]

    Introducing the model context protocol,

    ——, “Introducing the model context protocol,” https://www.anthropic. com/news/model-context-protocol, 2024, anthropic Blog, November 2024

  11. [11]

    Function calling and other API updates,

    OpenAI, “Function calling and other API updates,” https://openai.com/ blog/function-calling-and-other-api-updates, 2023, openAI Blog, June 2023

  12. [12]

    Voyager: An Open-Ended Embodied Agent with Large Language Models

G. Wang et al., “Voyager: An open-ended embodied agent with large language models,” arXiv preprint arXiv:2305.16291, 2023. [Online]. Available: https://arxiv.org/abs/2305.16291

  13. [13]

    Large language models as tool makers

    T. Caiet al., “Large language models as tool makers,” inInternational Conference on Learning Representations (ICLR), 2024. [Online]. Available: https://arxiv.org/abs/2305.17126

  14. [14]

    CREATOR: Tool creation for disentangling abstract and concrete reasoning of large language models,

    C. Qianet al., “CREATOR: Tool creation for disentangling abstract and concrete reasoning of large language models,” inFindings of the Association for Computational Linguistics: EMNLP 2023. Singapore: Association for Computational Linguistics, 2023, pp. 6922–6939. [Online]. Available: https://aclanthology.org/2023.findings-emnlp.462/

  15. [15]

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    P. Lewiset al., “Retrieval-augmented generation for knowledge- intensive NLP tasks,” inAdvances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 9459–9474. [Online]. Available: https://arxiv.org/abs/2005.11401

  16. [16]

    Dense passage retrieval for open-domain question answering,

    V . Karpukhinet al., “Dense passage retrieval for open-domain question answering,” inProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics, 2020, pp. 6769–6781. [Online]. Available: https://aclanthology.org/2020.emnlp-main.550/

  17. [17]

    AnyTool: Self-reflective, hierarchical agents for large-scale API calls,

    Y . Duet al., “AnyTool: Self-reflective, hierarchical agents for large- scale API calls,”arXiv preprint arXiv:2402.04253, 2024. [Online]. Available: https://arxiv.org/abs/2402.04253

  18. [18]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    Q. Wuet al., “AutoGen: Enabling next-gen LLM applications via multi-agent conversation,”arXiv preprint arXiv:2308.08155, 2023. [Online]. Available: https://arxiv.org/abs/2308.08155

  19. [19]

    Reflexion: Language Agents with Verbal Reinforcement Learning

    N. Shinnet al., “Reflexion: Language agents with verbal reinforcement learning,” inAdvances in Neural Information Processing Systems, vol. 36. Curran Associates, Inc., 2023. [Online]. Available: https://arxiv.org/abs/2303.11366

  20. [21]

    Toolformer: Language Models Can Teach Themselves to Use Tools

    T. Schicket al., “Toolformer: Language models can teach themselves to use tools,” inAdvances in Neural Information Processing Systems, vol. 36. Curran Associates, Inc., 2023. [Online]. Available: https://arxiv.org/abs/2302.04761

  21. [22]

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

    Y . Qinet al., “ToolLLM: Facilitating large language models to master 16000+ real-world APIs,”arXiv preprint arXiv:2307.16789, 2023. [Online]. Available: https://arxiv.org/abs/2307.16789

  22. [24]

    Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models,

    X. Yanget al., “Buffer of thoughts: Thought-augmented reasoning with large language models,”Advances in Neural Information Processing Systems, 2024. [Online]. Available: https://arxiv.org/abs/2406.04271

  23. [28]

    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

    M. Ahnet al., “Do as i can, not as i say: Grounding language in robotic affordances,” inConference on Robot Learning (CoRL), 2022. [Online]. Available: https://arxiv.org/abs/2204.01691

  24. [29]

    Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents,

    Z. Wanget al., “Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents,” 2023. [Online]. Available: https://arxiv.org/abs/2302.01560

  25. [30]

    Generative Agents: Interactive Simulacra of Human Behavior

    J. S. Parket al., “Generative agents: Interactive simulacra of human behavior,” 2023. [Online]. Available: https://arxiv.org/abs/2304.03442

  26. [31]

    Ghost in the minecraft: Generally capable agents for open-world enviroments via large language mod- els with text-based knowledge and memory

    X. Zhuet al., “Ghost in the Minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory,”arXiv preprint arXiv:2305.17144, 2023. [Online]. Available: https://arxiv.org/abs/2305.17144

  27. [32]

    Reasoning with language model is planning with world model,

    S. Haoet al., “Reasoning with language model is planning with world model,” inProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 8154–8173

  28. [33]

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

    W. Yaoet al., “Retroformer: Retrospective large language agents with policy gradient optimization,”arXiv preprint arXiv:2308.02151, 2023

  29. [34]

    MemGPT: Towards LLMs as Operating Systems

    C. Packeret al., “Memgpt: Towards LLMs as operating systems,” arXiv preprint arXiv:2310.08560, 2023. [Online]. Available: https: //arxiv.org/abs/2310.08560

  30. [36]
  31. [37]

    Self-discover: Large language models self-compose reasoning structures,

P. Zhou et al., “Self-discover: Large language models self-compose reasoning structures,” Advances in Neural Information Processing Systems, vol. 37, pp. 126032–126058, 2024.

  32. [38]

    Optimizing generative ai by backpropagating language model feedback,

    M. Yuksekgonulet al., “Optimizing generative ai by backpropagating language model feedback,”Nature, vol. 639, no. 8055, pp. 609–616, 2025

  33. [39]

    Fincon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making,

    Y . Yuet al., “Fincon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making,”arXiv preprint arXiv:2407.06567, 2024. [Online]. Available: https://arxiv.org/abs/2407.06567

  34. [40]

M+: Extending MemoryLLM with Scalable Long-Term Memory

    Y . Wanget al., “M+: Extending memoryllm with scalable long-term memory,” 2025. [Online]. Available: https://arxiv.org/abs/2502.00592

  35. [41]

    Enhancing reasoning with collaboration and memory,

    J. Michelmanet al., “Enhancing reasoning with collaboration and memory,”arXiv preprint arXiv:2503.05944, 2025

  36. [42]

    Nemori: Self-organizing agent memory inspired by cognitive science,

    a. others, “Nemori: Self-organizing agent memory inspired by cognitive science,”arXiv preprint arXiv:2502.14828, 2025. [Online]. Available: https://arxiv.org/abs/2502.14828

  37. [43]

    Intrinsic memory agents: Heterogeneous multi-agent llm systems through structured contextual memory,

    ——, “Intrinsic memory agents: Heterogeneous multi-agent llm systems through structured contextual memory,”arXiv preprint arXiv:2506.19413, 2025. [Online]. Available: https://arxiv.org/abs/ 2506.19413

  38. [44]

ProcMem: Learning Reusable Procedural Memory from Experience via Non-Parametric PPO for LLM Agents

    Q. Miet al., “Procmem: Learning reusable procedural memory from experience via non-parametric ppo for llm agents,” 2026. [Online]. Available: https://arxiv.org/abs/2602.01869

  39. [45]

SkillCraft: Can LLM Agents Learn to Use Tools Skillfully?

    S. Chenet al., “Skillcraft: Can LLM agents learn to use tools skillfully?”arXiv preprint arXiv:2603.00718, 2026. [Online]. Available: https://arxiv.org/abs/2603.00718

  40. [46]

    Polyskill: Learning generalizable skills through polymorphic abstraction,

    a. others, “Polyskill: Learning generalizable skills through polymorphic abstraction,”International Conference on Learning Representations,

  41. [47]
  42. [49]

CUA-Skill: Develop Skills for Computer Using Agent

    T. Chenet al., “Cua-skill: Develop skills for computer using agent,” arXiv preprint arXiv:2601.21123, 2026

  43. [50]

    Eureka: Human-Level Reward Design via Coding Large Language Models

    Y . J. Maet al., “Eureka: Human-level reward design via coding large language models,” 2023. [Online]. Available: https://arxiv.org/ abs/2310.12931

  44. [51]

    DS - Agent : Automated Data Science by Empowering Large Language Models with Case - Based Reasoning

    X. Yueet al., “Ds-agent: Automated data science by empowering large language models with case-based reasoning,” 2024. [Online]. Available: https://arxiv.org/abs/2402.17453

  45. [52]

Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-Step

    X. Zhonget al., “Debug like a human: A large language model debugger via verifying runtime execution step-by-step,” 2024. [Online]. Available: https://arxiv.org/abs/2402.16906

  46. [53]

    Executable code actions elicit better LLM agents,

    X. Wanget al., “Executable code actions elicit better LLM agents,”arXiv preprint arXiv:2402.01030, 2024. [Online]. Available: https://arxiv.org/abs/2402.01030

  47. [54]

    SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

    J. Yanget al., “Swe-agent: Agent-computer interfaces enable automated software engineering,” 2024. [Online]. Available: https: //arxiv.org/abs/2405.15793

  48. [55]

    Toolcoder: Teach code generation models to use api search tools,

    K. Zhanget al., “Toolcoder: Teach code generation models to use api search tools,” 2023. [Online]. Available: https://arxiv.org/abs/2305. 04032

  49. [56]

    Evolving programmatic skill networks,

    H. Shiet al., “Evolving programmatic skill networks,” 2026. [Online]. Available: https://arxiv.org/abs/2601.03509

  50. [57]

    JARVIS-1: Open- world multi-task agents with memory-augmented multimodal lan- guage models,

    Z. Wanget al., “Jarvis-1: Open-world multi-task agents with memory-augmented multimodal language models,”arXiv preprint arXiv:2311.05997, 2023. [Online]. Available: https://arxiv.org/abs/ 2311.05997

  51. [59]

    Zheng, R

    [Online]. Available: https://arxiv.org/abs/2306.07863

  52. [61]

Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale

    H. Liet al., “Organizing, orchestrating, and benchmarking agent skills at ecosystem scale,”arXiv preprint arXiv:2603.02176, 2026. [Online]. Available: https://arxiv.org/abs/2603.02176

  53. [62]

    TPTU: Task planning and tool usage of large language model-based AI agents,

    J. Ruanet al., “Tptu: large language model-based ai agents for task planning and tool usage,”arXiv preprint arXiv:2308.03427, 2023

  54. [63]

Agents Thinking Fast and Slow: A Talker-Reasoner Architecture

    K. Christakopoulouet al., “Agents thinking fast and slow: A talker- reasoner architecture,”arXiv preprint arXiv:2410.08328, 2024

  55. [64]

    Llm-powered decentralized generative agents with adaptive hierarchical knowledge graph for cooperative planning,

    a. others, “Llm-powered decentralized generative agents with adaptive hierarchical knowledge graph for cooperative planning,” arXiv preprint arXiv:2502.05453, 2025. [Online]. Available: https: //arxiv.org/abs/2502.05453

  56. [65]

    Graphskill: Documentation-guided hierarchical retrieval-augmented coding for complex graph reasoning,

    F. Wanget al., “Graphskill: Documentation-guided hierarchical retrieval-augmented coding for complex graph reasoning,” 2026. [Online]. Available: https://arxiv.org/abs/2603.06620

  57. [66]

Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

    J. Qiuet al., “Alita: Generalist agent enabling scalable agentic rea- soning with minimal predefinition and maximal self-evolution,”arXiv preprint arXiv:2505.20286, 2025

  58. [67]

    Skillnet: Create, evaluate, and connect ai skills,

    Y . Lianget al., “Skillnet: Create, evaluate, and connect ai skills,”

  59. [68]

    SkillNet: Create, evaluate, and connect AI skills,

    [Online]. Available: https://arxiv.org/abs/2603.04448

  60. [69]

    Sok: Agentic skills – beyond tool use in llm agents,

    Y . Jianget al., “Sok: Agentic skills – beyond tool use in llm agents,”

  61. [70]

    SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

    [Online]. Available: https://arxiv.org/abs/2602.20867

  62. [71]

    Skills are the new apps – now it’s time for skill os,

    L. Chenet al., “Skills are the new apps – now it’s time for skill os,” 2026, preprints.org manuscript 202602.1096.v1. [Online]. Available: https://www.preprints.org/manuscript/202602.1096/v1

  63. [72]

Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents

    J. Liet al., “Agent hospital: A simulacrum of hospital with evolvable medical agents,”arXiv preprint arXiv:2405.02957, 2024. [Online]. Available: https://arxiv.org/abs/2405.02957

  64. [73]

Evermemos: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning

    C. Huet al., “Evermemos: A self-organizing memory operating system for structured long-horizon reasoning,” 2026. [Online]. Available: https://arxiv.org/abs/2601.02163

  65. [74]

    HyperMem: Hypergraph Memory for Long-Term Conversations

    L. Yueet al., “Hypermem: Hypergraph memory for long-term conversations,” 2026, accepted to ACL 2026 Main. [Online]. Available: https://arxiv.org/abs/2604.08256

  66. [75]

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

G. Zhang et al., “G-memory: Tracing hierarchical memory for multi-agent systems,” arXiv preprint arXiv:2506.07398, 2025. [Online]. Available: https://arxiv.org/abs/2506.07398

  67. [76]

AgentEvolver: Towards Efficient Self-Evolving Agent System

    a. others, “Agentevolver: Towards efficient self-evolving agent system,”arXiv preprint arXiv:2511.10395, 2025. [Online]. Available: https://arxiv.org/abs/2511.10395

  68. [77]

Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark

    Y . Caiet al., “Building self-evolving agents via experience-driven lifelong learning: A framework and benchmark,” 2025. [Online]. Available: https://arxiv.org/abs/2508.19005

  69. [78]

AutoRefine: From Trajectories to Reusable Expertise for Continual LLM Agent Refinement

    L. Qiuet al., “Autorefine: From trajectories to reusable expertise for continual llm agent refinement,” 2026. [Online]. Available: https://arxiv.org/abs/2601.22758

  70. [79]

    Cradle: Empowering foundation agents towards general computer control,

    W. Tanet al., “Cradle: Empowering foundation agents towards general computer control,”arXiv preprint arXiv:2403.03186, 2024. [Online]. Available: https://arxiv.org/abs/2403.03186

  71. [80]

    AppAgent: Multimodal agents as smartphone users,

    C. Zhanget al., “Appagent: Multimodal agents as smartphone users,”arXiv preprint arXiv:2312.13771, 2023. [Online]. Available: https://arxiv.org/abs/2312.13771

  72. [81]

    Autoguide: Automated generation and selection of state-aware guidelines for large language model agents

    Y . Fuet al., “Autoguide: Automated generation and selection of context-aware guidelines for large language model agents,” arXiv preprint arXiv:2403.08978, 2024. [Online]. Available: https: //arxiv.org/abs/2403.08978

  73. [82]

    WebArena: A Realistic Web Environment for Building Autonomous Agents

    S. Zhouet al., “WebArena: A realistic web environment for building autonomous agents,” inInternational Conference on Learning Representations (ICLR), 2024. [Online]. Available: https: //arxiv.org/abs/2307.13854

  74. [83]

    Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

    Y . Sunet al., “Don’t retrieve, navigate: Distilling enterprise knowledge into navigable agent skills for qa and rag,”arXiv preprint arXiv:2604.14572, Apr. 2026. [Online]. Available: https: //arxiv.org/abs/2604.14572

  75. [84]

AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes

    J. Qiuet al., “Agentdistill: Training-free agent distillation with gener- alizable mcp boxes,”arXiv preprint arXiv:2506.14728, 2025

  76. [85]

Reinforcement Learning for Self-Improving Agent with Skill Library

    J. Wanget al., “Reinforcement learning for self-improving agent with skill library,”arXiv preprint arXiv:2512.17102, 2025

  77. [86]

AutoSkill: Experience-Driven Lifelong Learning via Skill Self-Evolution

    Y . Yanget al., “Autoskill: Experience-driven lifelong learning via skill self-evolution,” 2026. [Online]. Available: https://arxiv.org/abs/ 2603.01145

  78. [87]

    MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

    H. Zhanget al., “Memskill: Learning and evolving memory skills for self-evolving agents,” 2026. [Online]. Available: https: //arxiv.org/abs/2602.02474

  79. [88]

ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

    S. Ouyanget al., “Reasoningbank: Scaling agent self-evolving with reasoning memory,” 2025. [Online]. Available: https://arxiv.org/abs/ 2509.25140

  80. [89]

    SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

    B. Zhenget al., “Skillweaver: Web agents can self-improve by discovering and honing skills,” 2025. [Online]. Available: https://arxiv.org/abs/2504.07079

Showing first 80 references.