MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

· 2026 · cs.AI · arXiv 2605.27366

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-centric agent framework that lets agents continuously improve their task-solving capability by creating, reusing, and refining skills under a unified lifecycle (creation, memory, management, evaluation, and refinement). Our framework enables agents to create skills on demand, store and reuse them across tasks, organize and select them efficiently, and evaluate them through unit tests and runtime feedback for continuous refinement. We further introduce skill-level memory that accumulates experience for each skill across tasks, enabling more effective reuse and adaptation over time. Experiments on SkillsBench provide initial evidence that lifecycle-managed skills can improve task success, efficiency, reuse, and cross-agent transfer, highlighting the importance of treating skills as long-lived, experience-aware, and testable assets.

representative citing papers

SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use

cs.AI · 2026-07-02 · unverdicted · novelty 6.0

SkillCoach introduces self-evolving rubrics derived from rollouts to evaluate and supervise four process dimensions of agentic skill-use separately from outcome success.

MUSE: A Unified Agentic Harness for MLLMs

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

MUSE is a unified agentic harness that improves off-the-shelf MLLMs on visual spatial planning, perception, multimodal reasoning, and fine-grained discrimination benchmarks through structured execution modules and verifier-guided repair without model retraining.

citing papers explorer

Showing 2 of 2 citing papers.

SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use cs.AI · 2026-07-02 · unverdicted · none · ref 19 · internal anchor
SkillCoach introduces self-evolving rubrics derived from rollouts to evaluate and supervise four process dimensions of agentic skill-use separately from outcome success.
MUSE: A Unified Agentic Harness for MLLMs cs.CV · 2026-06-02 · unverdicted · none · ref 25 · internal anchor
MUSE is a unified agentic harness that improves off-the-shelf MLLMs on visual spatial planning, perception, multimodal reasoning, and fine-grained discrimination benchmarks through structured execution modules and verifier-guided repair without model retraining.

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

fields

years

verdicts

representative citing papers

citing papers explorer