hub Canonical reference

EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle

Rong Wu, Xiaoman Wang, Jianbiao Mei, Pinlong Cai, Daocheng Fu, Cheng Yang, Licheng Wen, Xuemeng Yang, Yufan Shen, Yuxin Wang, et al · 2025 · cs.CL · arXiv 2510.16079

Canonical reference. 73% of citing Pith papers cite this work as background.

55 Pith papers citing it

Background 73% of classified citations

open full Pith review browse 55 citing papers arXiv PDF

abstract

Current Large Language Model (LLM) agents show strong performance in tool use, but lack the crucial capability to systematically learn from their own experiences. While existing frameworks mainly focus on mitigating external knowledge gaps, they fail to address a more fundamental limitation: the inability to iteratively refine problem-solving strategies. In this work, we introduce EvolveR, a framework designed to enable agent to self-improve through a complete, closed-loop experience lifecycle. This lifecycle comprises two key stages: (1) Offline Self-Distillation, where the agent's interaction trajectories are synthesized into a structured repository of abstract, reusable strategic principles; (2) Online Interaction, where the agent interacts with tasks and actively retrieves distilled principles to guide its decision-making, accumulating a diverse set of behavioral trajectories. This loop employs a policy reinforcement mechanism to iteratively update the agent based on its performance. We demonstrate the effectiveness of EvolveR on complex multi-hop question-answering benchmarks, where it achieves superior performance over strong agentic baselines. Our work presents a comprehensive blueprint for agents that learn not only from external data but also from the consequences of their own actions, paving the way for more autonomous and continuously improving systems. Code is available at https://github.com/Edaizi/EvolveR.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 9 baseline 1 dataset 1

citation-polarity summary

background 8 baseline 1 unclear 1 use dataset 1

representative citing papers

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

cs.AI · 2026-06-08 · unverdicted · novelty 7.0

SkeMex distills agent trajectories into value-aware skills organized in general/task/action branches and evolves them via a closed-loop Read-Write-Assess-Govern process, outperforming prior memory agents on clinical tasks.

Co-Evolving Skill Generation and Policy Optimization

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Framework estimates context-dependent marginal utility of candidate skills via reward gaps in matched base vs. skill-augmented rollouts to filter skills and co-train policy as generator.

EvoRepair: Enhancing Vulnerability Repair Agents Through Experience-Based Self-Evolution

cs.SE · 2026-05-28 · unverdicted · novelty 7.0

EvoRepair is the first experience-based self-evolving agent framework for automated vulnerability repair, reporting 90.46% overall success on PATCHEVAL and SEC-bench benchmarks.

Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries

cs.AI · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

The paper diagnoses library drift in self-evolving LLM skill libraries and demonstrates a governance recipe raising pass@1 from 0.258 to 0.584 on MBPP+ hard-100.

EXG: Self-Evolving Agents with Experience Graphs

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

EXG is an experience graph framework for self-evolving LLM agents that supports online real-time growth and offline reuse to enhance solution quality and efficiency on code generation and reasoning benchmarks.

Test-Time Learning with an Evolving Library

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

EvoLib enables LLMs to accumulate, reuse, and evolve knowledge abstractions from inference trajectories at test time, yielding substantial gains on math reasoning, code generation, and agentic benchmarks without parameter updates or supervision.

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

cs.AI · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.

EvolveMem:Self-Evolving Memory Architecture via AutoResearch for LLM Agents

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-benchmark transfer.

Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

Evolving-RL jointly optimizes experience extraction and utilization in LLM agents via RL with separate evaluation signals, delivering up to 98.7% relative gains on out-of-distribution tasks in ALFWorld and Mind2Web.

MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

MAGE uses a four-subgraph co-evolutionary knowledge graph plus dual bandits to externalize and retrieve experience for stable self-evolution of frozen language-model agents, showing gains on nine diverse benchmarks.

RewardHarness: Self-Evolving Agentic Post-Training

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

SPIRAL: Self-Evolving Action-Conditioned Video Generation via Reflective Planning Agents

cs.CV · 2026-03-09 · unverdicted · novelty 7.0

SPIRAL is a closed-loop think-act-reflect framework using PlanAgent, VideoGenerator, and CriticAgent plus GRPO self-evolution to improve long-horizon action-conditioned video generation, with new dataset and benchmark showing gains over open-loop baselines.

COMFYCLAW: Self-Evolving Skill Harnesses for Image Generation Workflows

cs.AI · 2026-07-02 · unverdicted · novelty 6.0

COMFYCLAW introduces skill evolution via graph editing, automatic reversion, VLM verification, and distillation of runs into reusable Agent Skills, achieving higher average scores than a verifier-only baseline across benchmarks.

MDForge: Agentic Molecular Dynamics Pipeline Design under Sparse Simulator Feedback

cs.AI · 2026-06-11 · unverdicted · novelty 6.0

MDForge uses an LLM agent with multi-agent debate to densify sparse simulator feedback for automatic MD pipeline design, matching human experts on SAMPL benchmarks and identifying a lab-confirmed picomolar CB[7] binder.

APPO: Agentic Procedural Policy Optimization

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

APPO refines branching and credit assignment in agentic RL via a Branching Score and procedure-level scaling, improving baselines by nearly 4 points on 13 benchmarks.

EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts

cs.AI · 2026-06-03 · unverdicted · novelty 6.0

EpiEvolve achieves 0.629 accuracy in streaming COVID-19 forecasting by using episodic memory, reflection on delayed labels, and regime-aware retrieval, outperforming static LLMs (0.561) and CDC ensembles (0.325) while halving recovery lag after regime shifts.

Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection

cs.AI · 2026-06-01 · unverdicted · novelty 6.0

Traj-Evolve combines non-parametric experience retrieval and multi-agent RL with a leave-one-out unification strategy to outperform baselines on lung cancer prediction from up to five years of multimodal EHRs, including in never-smokers.

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

cs.SE · 2026-05-31 · unverdicted · novelty 6.0

BenchEvolver evolves coding problem solutions to generate harder, valid tasks, producing LiveCodeBench-Plus where frontier models score 27.5-62.6% and enabling RL gains on held-out tests.

Rethinking Memory as Continuously Evolving Connectivity

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

FluxMem evolves memory as a heterogeneous graph via three refinement stages and reports consistent state-of-the-art results on LoCoMo, Mind2Web, and GAIA benchmarks.

SKILLC: Learning Autonomous Skill Internalization in LLM Agents via Contrastive Credit Assignment

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

SkillC converts skill-helpfulness contrast into a policy learning signal via paired rollouts and dual-stream advantage estimation, outperforming prior internalization baselines by 5.5% and 4.4% on ALFWorld and WebShop without runtime skill access.

SetupX: Can LLM Agents Learn from Past Failures in Functionality-Correct Code Repository Setup?

cs.SE · 2026-05-25 · unverdicted · novelty 6.0

SetupX presents an experiential learning framework for LLM agents that reaches 92% pass rate on functionality-correct repository setup by transferring verified fixes across repositories via XPU representations, LIFO Docker snapshots, and Prosecutor-Judge verification.

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

cs.AI · 2026-05-22 · unverdicted · novelty 6.0 · 2 refs

SkillOpt introduces a controllable text-space optimizer that evolves agent skills via add/delete/replace edits accepted only on strict held-out validation improvement, reporting consistent gains across 52 model-benchmark-harness combinations.

From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

A systematic study across five domains finds model-generated skills yield average gains but non-uniform negative transfer, with a meta-skill improving extraction quality.

citing papers explorer

Showing 1 of 1 citing paper after filters.

A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications cs.IR · 2026-05-08 · unverdicted · none · ref 116 · 3 links · internal anchor
A survey that defines agent skills as reusable procedural artifacts and reviews methods, resources, and applications across their representation, acquisition, retrieval, and evolution stages.

EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer