Building self-evolving agents via experience-driven lifelong learning: A framework and benchmark
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 2
citation-polarity summary: background 2
citing papers explorer
- Co-Evolving Skill Generation and Policy Optimization
  The framework estimates the context-dependent marginal utility of candidate skills via reward gaps in matched base vs. skill-augmented rollouts, to filter skills and co-train the policy as a generator.
- Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning
  PEEU enables a 7B MLLM to reach 30.6% accuracy on GUI task planning by autonomous exploration and hindsight experience synthesis, outperforming a 32B model through stronger high-level OOD generalization.
- Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents
  LifeSkill is a verifier-guided skill learning plus online internalization framework that raises average performance by 7 points over lifelong agent baselines on LifelongAgentBench.
- EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective
  EvoMemBench evaluates 15 memory methods for LLM agents and finds long-context baselines competitive, with no single memory approach working consistently across settings.
- GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0)
  GenericAgent outperforms other LLM agents on long-horizon tasks by maximizing context information density with fewer tokens via minimal tools, on-demand memory, trajectory-to-SOP evolution, and compression.
- Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
  ViLoMem is a dual-stream grow-and-refine memory system that separates visual and logical error patterns in MLLMs to improve pass@1 accuracy and reduce repeated mistakes across six multimodal benchmarks.
- Pushing the Limits of LLM Tool Calling via Experiential Knowledge Integration and Activation
  Instance-level experiential knowledge provides strong gains for LLM tool calling; parallel sampling activates it more effectively than deeper reasoning, and RL-based internalization outperforms SFT, yielding the KATE framework with consistent benchmark improvements.
- A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications
  A survey that defines agent skills as reusable procedural artifacts and reviews methods, resources, and applications across their representation, acquisition, retrieval, and evolution stages.
- Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents
- SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment