Mem-π is a framework using a dedicated model and decision-content decoupled RL to generate context-specific guidance on demand for LLM agents, outperforming retrieval baselines by over 30% on web navigation.
Lifelonga- gentbench: Evaluating llm agents as lifelong learners.arXiv preprint arXiv:2505.11942
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
GenericAgent outperforms other LLM agents on long-horizon tasks by maximizing context information density with fewer tokens via minimal tools, on-demand memory, trajectory-to-SOP evolution, and compression.
LLMs corrupt an average of 25% of document content during long delegated editing workflows across 52 domains, even frontier models, and agentic tools do not mitigate the issue.
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
MINTEval benchmark shows current memory-augmented systems average 27.9% accuracy on long-horizon interference tasks, limited by retrieval and memory construction with degradation from intervening updates.
CLI agents trained with RL benefit from selective observation via σ-Reveal and structured credit assignment via A³ that leverages AST action sub-chains and trajectory margins.
Execution lineage models AI-native work as a DAG of computations with explicit dependencies, achieving perfect state preservation in controlled update tasks where loop-based agents introduce churn and contamination.
LLMA-Mem improves long-horizon performance in LLM multi-agent systems over baselines while reducing cost and shows non-monotonic scaling where memory-enabled smaller teams can beat larger ones.
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
citing papers explorer
-
Learning CLI Agents with Structured Action Credit under Selective Observation
CLI agents trained with RL benefit from selective observation via σ-Reveal and structured credit assignment via A³ that leverages AST action sub-chains and trajectory margins.