Building self-evolving agents via experience-driven lifelong learning: A framework and benchmark

Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark · 2025 · arXiv 2508.19005

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Co-Evolving Skill Generation and Policy Optimization

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Framework estimates context-dependent marginal utility of candidate skills via reward gaps in matched base vs. skill-augmented rollouts to filter skills and co-train policy as generator.

Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning

cs.CL · 2026-06-25 · unverdicted · novelty 6.0

PEEU enables a 7B MLLM to reach 30.6% accuracy on GUI task planning by autonomous exploration and hindsight experience synthesis, outperforming a 32B model through stronger high-level OOD generalization.

Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

cs.LG · 2026-06-03 · unverdicted · novelty 6.0

LifeSkill is a verifier-guided skill learning plus online internalization framework that raises average performance by 7 points over lifelong agent baselines on LifelongAgentBench.

EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

EvoMemBench evaluates 15 memory methods for LLM agents and finds long-context baselines competitive, with no single memory approach working consistently across settings.

GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0)

cs.CL · 2026-04-18 · unverdicted · novelty 6.0

GenericAgent outperforms other LLM agents on long-horizon tasks by maximizing context information density with fewer tokens via minimal tools, on-demand memory, trajectory-to-SOP evolution, and compression.

Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

cs.AI · 2025-11-26 · unverdicted · novelty 6.0

ViLoMem is a dual-stream grow-and-refine memory system that separates visual and logical error patterns in MLLMs to improve pass@1 accuracy and reduce repeated mistakes across six multimodal benchmarks.

Pushing the Limits of LLM Tool Calling via Experiential Knowledge Integration and Activation

cs.CL · 2026-06-09 · unverdicted · novelty 5.0

Instance-level experiential knowledge provides strong gains for LLM tool calling, parallel sampling activates it more effectively than deeper reasoning, and RL-based internalization outperforms SFT, yielding the KATE framework with consistent benchmark improvements.

A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

cs.IR · 2026-05-08 · unverdicted · novelty 3.0 · 3 refs

A survey that defines agent skills as reusable procedural artifacts and reviews methods, resources, and applications across their representation, acquisition, retrieval, and evolution stages.

Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents

cs.CL · 2026-04-22

SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment

cs.AI · 2026-04-10

citing papers explorer

Showing 10 of 10 citing papers.

Co-Evolving Skill Generation and Policy Optimization · cs.CL · 2026-06-07 · unverdicted · none · ref 69
Framework estimates context-dependent marginal utility of candidate skills via reward gaps in matched base vs. skill-augmented rollouts to filter skills and co-train policy as generator.
Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning · cs.CL · 2026-06-25 · unverdicted · none · ref 19
PEEU enables a 7B MLLM to reach 30.6% accuracy on GUI task planning by autonomous exploration and hindsight experience synthesis, outperforming a 32B model through stronger high-level OOD generalization.
Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents · cs.LG · 2026-06-03 · unverdicted · none · ref 1
LifeSkill is a verifier-guided skill learning plus online internalization framework that raises average performance by 7 points over lifelong agent baselines on LifelongAgentBench.
EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective · cs.CL · 2026-05-18 · unverdicted · none · ref 3
EvoMemBench evaluates 15 memory methods for LLM agents and finds long-context baselines competitive, with no single memory approach working consistently across settings.
GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0) · cs.CL · 2026-04-18 · unverdicted · none · ref 44
GenericAgent outperforms other LLM agents on long-horizon tasks by maximizing context information density with fewer tokens via minimal tools, on-demand memory, trajectory-to-SOP evolution, and compression.
Agentic Learner with Grow-and-Refine Multimodal Semantic Memory · cs.AI · 2025-11-26 · unverdicted · none · ref 6
ViLoMem is a dual-stream grow-and-refine memory system that separates visual and logical error patterns in MLLMs to improve pass@1 accuracy and reduce repeated mistakes across six multimodal benchmarks.
Pushing the Limits of LLM Tool Calling via Experiential Knowledge Integration and Activation · cs.CL · 2026-06-09 · unverdicted · none · ref 1
Instance-level experiential knowledge provides strong gains for LLM tool calling, parallel sampling activates it more effectively than deeper reasoning, and RL-based internalization outperforms SFT, yielding the KATE framework with consistent benchmark improvements.
A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications · cs.IR · 2026-05-08 · unverdicted · none · ref 75 · 3 links
A survey that defines agent skills as reusable procedural artifacts and reviews methods, resources, and applications across their representation, acquisition, retrieval, and evolution stages.
Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents · cs.CL · 2026-04-22 · unreviewed · ref 4
SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment · cs.AI · 2026-04-10 · unreviewed · ref 1
