hub Canonical reference

A Survey on the Memory Mechanism of Large Language Model based Agents

Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai · 2024 · cs.AI · arXiv 2404.13501

Canonical reference. 92% of citing Pith papers cite this work as background.

37 Pith papers citing it

Background 92% of classified citations

open full Pith review browse 37 citing papers arXiv PDF

abstract

Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. Compared with original LLMs, LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems that need long-term and complex agent-environment interactions. The key component to support agent-environment interactions is the memory of the agents. While previous studies have proposed many promising memory mechanisms, they are scattered in different papers, and there lacks a systematical review to summarize and compare these works from a holistic perspective, failing to abstract common and effective designing patterns for inspiring future studies. To bridge this gap, in this paper, we propose a comprehensive survey on the memory mechanism of LLM-based agents. In specific, we first discuss ''what is'' and ''why do we need'' the memory in LLM-based agents. Then, we systematically review previous studies on how to design and evaluate the memory module. In addition, we also present many agent applications, where the memory module plays an important role. At last, we analyze the limitations of existing work and show important future directions. To keep up with the latest advances in this field, we create a repository at \url{https://github.com/nuster1128/LLM_Agent_Memory_Survey}.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 10 baseline 1 method 1

citation-polarity summary

background 11 baseline 1

representative citing papers

Belief Engine: Configurable and Inspectable Stance Dynamics in Multi-Agent LLM Deliberation

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

Belief Engine is a configurable belief-update mechanism for multi-agent LLM systems that uses structured argument extraction and log-odds stance updates to make evidence-grounded deliberation inspectable and controllable.

MEME: Multi-entity & Evolving Memory Evaluation

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

All tested LLM memory systems fail at dependency reasoning in multi-entity evolving scenarios, with only an expensive file-based setup showing partial recovery.

Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

Goal-Mem improves RAG memory retrieval in agentic LLMs by explicit goal decomposition and backward chaining via Natural Language Logic, outperforming nine baselines on multi-hop and implicit inference tasks.

Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

AEL: Agent Evolving Learning for Open-Ended Environments

cs.CL · 2026-04-23 · conditional · novelty 7.0

AEL uses a fast-timescale bandit for memory policy selection and slow-timescale LLM reflection for causal insights, achieving a Sharpe ratio of 2.13 on a 208-episode portfolio benchmark while showing that added mechanisms degrade performance.

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

cs.AI · 2026-03-24 · unverdicted · novelty 7.0

PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.

MemEvolve: Meta-Evolution of Agent Memory Systems

cs.CL · 2025-12-21 · unverdicted · novelty 7.0

MemEvolve jointly evolves agent experiential knowledge and memory architectures via a modular codebase, delivering up to 17% gains on agent benchmarks with cross-task and cross-model generalization.

An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications

cs.SE · 2025-09-23 · conditional · novelty 7.0

Empirical study of open-source AI agents shows testing effort concentrates on deterministic tools and workflows (over 70%) while the FM-based plan body gets under 5% and prompts appear in only 1% of tests.

Automated Design of Agentic Systems

cs.AI · 2024-08-15 · conditional · novelty 7.0

Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.

Mem-$\pi$: Adaptive Memory through Learning When and What to Generate

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Mem-π is a framework using a dedicated model and decision-content decoupled RL to generate context-specific guidance on demand for LLM agents, outperforming retrieval baselines by over 30% on web navigation.

MedExpMem: Adapting Experience Memory for Differential Diagnosis

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

MedExpMem lets VLM diagnostic agents store and retrieve experience from past failures as pairwise differential notes, producing up to 7% accuracy gains on a multi-subspecialty radiology benchmark.

Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

Solvita is an agentic evolution system using Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks updated by reinforcement learning on pass/fail and vulnerability signals to achieve SOTA code generation performance.

Grounded Continuation: A Linear-Time Runtime Verifier for LLM Conversations

cs.AI · 2026-05-13 · conditional · novelty 6.0

A hybrid LLM-symbolic verifier maintains a dependency graph over conversation turns classified into eight formal update operations, enabling linear-time groundedness checks and precise retraction propagation with a conflict-free guarantee.

Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.

CHAL: Council of Hierarchical Agentic Language

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.

SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

SkillLens organizes skills into policies-strategies-procedures-primitives layers, retrieves via degree-corrected random walk, and uses a verifier for local adaptation, yielding up to 6.31 pp gains on MuLocbench and raising ALFWorld success from 45% to 51.31%.

HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

HiGMem combines hierarchical event-turn memory with LLM-guided selection to retrieve concise relevant evidence from long dialogues, improving F1 scores and cutting retrieved turns by an order of magnitude on the LoCoMo10 benchmark.

TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.

AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society

cs.SI · 2025-02-12 · unverdicted · novelty 6.0

AgentSociety is a large-scale LLM agent-based social simulator validated on polarization, UBI, disasters, and sustainability issues with alignment to real experiments.

Retrieval-Augmented Generation for Natural Language Processing: A Survey

cs.CL · 2024-07-18 · accept · novelty 6.0

The survey organizes RAG methods via a taxonomy of query-based, logits-based, latent, and parametric fusion with comparisons on accessibility, efficiency, applications, and challenges.

From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work

cs.AI · 2026-05-07 · conditional · novelty 5.0

Execution lineage models AI-native work as a DAG of computations with explicit dependencies, achieving perfect state preservation in controlled update tasks where loop-based agents introduce churn and contamination.

MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

cs.CL · 2026-05-07 · unverdicted · novelty 5.0 · 2 refs

MemReranker applies multi-stage distillation to Qwen3-Reranker to produce reasoning-aware rerankers that outperform baselines on memory tasks with temporal and causal constraints.

GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory

cs.CL · 2026-05-03 · unverdicted · novelty 5.0

GRAVITY adds structured relational, temporal, and thematic memory anchors to conversational LLMs at generation time, delivering 7.5-10.1% average gains in LLM-judge accuracy across five host systems on LongMemEval and LoCoMo.

citing papers explorer

Showing 37 of 37 citing papers.

Belief Engine: Configurable and Inspectable Stance Dynamics in Multi-Agent LLM Deliberation cs.AI · 2026-05-14 · unverdicted · none · ref 7 · internal anchor
Belief Engine is a configurable belief-update mechanism for multi-agent LLM systems that uses structured argument extraction and log-odds stance updates to make evidence-grounded deliberation inspectable and controllable.
MEME: Multi-entity & Evolving Memory Evaluation cs.LG · 2026-05-12 · unverdicted · none · ref 17 · internal anchor
All tested LLM memory systems fail at dependency reasoning in multi-entity evolving scenarios, with only an expensive file-based setup showing partial recovery.
Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems cs.AI · 2026-05-12 · unverdicted · none · ref 40 · internal anchor
Goal-Mem improves RAG memory retrieval in agentic LLMs by explicit goal decomposition and backward chaining via Natural Language Logic, outperforming nine baselines on multi-hop and implicit inference tasks.
Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory cs.AI · 2026-05-11 · unverdicted · none · ref 54 · internal anchor
Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory cs.CL · 2026-05-01 · unverdicted · none · ref 186 · internal anchor
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
AEL: Agent Evolving Learning for Open-Ended Environments cs.CL · 2026-04-23 · conditional · none · ref 31 · internal anchor
AEL uses a fast-timescale bandit for memory policy selection and slow-timescale LLM reflection for causal insights, achieving a Sharpe ratio of 2.13 on a 208-episode portfolio benchmark while showing that added mechanisms degrade performance.
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments cs.AI · 2026-03-24 · unverdicted · none · ref 84 · internal anchor
PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.
MemEvolve: Meta-Evolution of Agent Memory Systems cs.CL · 2025-12-21 · unverdicted · none · ref 34 · internal anchor
MemEvolve jointly evolves agent experiential knowledge and memory architectures via a modular codebase, delivering up to 17% gains on agent benchmarks with cross-task and cross-model generalization.
An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications cs.SE · 2025-09-23 · conditional · none · ref 31 · internal anchor
Empirical study of open-source AI agents shows testing effort concentrates on deterministic tools and workflows (over 70%) while the FM-based plan body gets under 5% and prompts appear in only 1% of tests.
Automated Design of Agentic Systems cs.AI · 2024-08-15 · conditional · none · ref 236 · internal anchor
Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.
Mem-$\pi$: Adaptive Memory through Learning When and What to Generate cs.CL · 2026-05-20 · unverdicted · none · ref 58 · internal anchor
Mem-π is a framework using a dedicated model and decision-content decoupled RL to generate context-specific guidance on demand for LLM agents, outperforming retrieval baselines by over 30% on web navigation.
MedExpMem: Adapting Experience Memory for Differential Diagnosis cs.LG · 2026-05-20 · unverdicted · none · ref 11 · internal anchor
MedExpMem lets VLM diagnostic agents store and retrieve experience from past failures as pairwise differential notes, producing up to 7% accuracy gains on a multi-subspecialty radiology benchmark.
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution cs.AI · 2026-05-14 · unverdicted · none · ref 41 · internal anchor
Solvita is an agentic evolution system using Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks updated by reinforcement learning on pass/fail and vulnerability signals to achieve SOTA code generation performance.
Grounded Continuation: A Linear-Time Runtime Verifier for LLM Conversations cs.AI · 2026-05-13 · conditional · none · ref 26 · internal anchor
A hybrid LLM-symbolic verifier maintains a dependency graph over conversation turns classified into eight formal update operations, enabling linear-time groundedness checks and precise retraction propagation with a conflict-free guarantee.
Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery cs.LG · 2026-05-13 · unverdicted · none · ref 21 · internal anchor
Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.
CHAL: Council of Hierarchical Agentic Language cs.AI · 2026-05-12 · unverdicted · none · ref 184 · internal anchor
CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.
SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents cs.AI · 2026-05-08 · unverdicted · none · ref 35 · internal anchor
SkillLens organizes skills into policies-strategies-procedures-primitives layers, retrieves via degree-corrected random walk, and uses a verifier for local adaptation, yielding up to 6.31 pp gains on MuLocbench and raising ALFWorld success from 45% to 51.31%.
HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents cs.CL · 2026-04-20 · unverdicted · none · ref 1 · internal anchor
HiGMem combines hierarchical event-turn memory with LLM-guided selection to retrieve concise relevant evidence from long dialogues, improving F1 scores and cutting retrieved turns by an order of magnitude on the LoCoMo10 benchmark.
TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation cs.CL · 2026-04-09 · unverdicted · none · ref 85 · internal anchor
TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.
AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society cs.SI · 2025-02-12 · unverdicted · none · ref 114 · internal anchor
AgentSociety is a large-scale LLM agent-based social simulator validated on polarization, UBI, disasters, and sustainability issues with alignment to real experiments.
Retrieval-Augmented Generation for Natural Language Processing: A Survey cs.CL · 2024-07-18 · accept · none · ref 199 · internal anchor
The survey organizes RAG methods via a taxonomy of query-based, logits-based, latent, and parametric fusion with comparisons on accessibility, efficiency, applications, and challenges.
From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work cs.AI · 2026-05-07 · conditional · none · ref 3 · internal anchor
Execution lineage models AI-native work as a DAG of computations with explicit dependencies, achieving perfect state preservation in controlled update tasks where loop-based agents introduce churn and contamination.
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval cs.CL · 2026-05-07 · unverdicted · none · ref 1 · 2 links · internal anchor
MemReranker applies multi-stage distillation to Qwen3-Reranker to produce reasoning-aware rerankers that outperform baselines on memory tasks with temporal and causal constraints.
GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory cs.CL · 2026-05-03 · unverdicted · none · ref 26 · internal anchor
GRAVITY adds structured relational, temporal, and thematic memory anchors to conversational LLMs at generation time, delivering 7.5-10.1% average gains in LLM-judge accuracy across five host systems on LongMemEval and LoCoMo.
From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents cs.AI · 2026-04-25 · unverdicted · none · ref 33 · internal anchor
AdaPlan-H enables LLM agents to generate self-adaptive hierarchical plans that adjust detail level to task difficulty, improving success rates in multi-step tasks.
Transferable Expertise for Autonomous Agents via Real-World Case-Based Learning cs.AI · 2026-04-14 · unverdicted · none · ref 11 · internal anchor
A case-based learning framework extracts reusable knowledge from past tasks to improve LLM agents' structured performance on complex real-world tasks, outperforming standard prompting baselines especially as task complexity grows.
PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding q-bio.NC · 2025-06-18 · unverdicted · none · ref 46 · internal anchor
PaceLLM introduces brain-inspired persistent activity and cortical expert clustering to mitigate contextual decay and semantic fragmentation in LLMs, reporting 6% gains on LongBench multi-document QA and 12.5-17.5% on Infinite-Bench while reaching 200K-token NIAH performance.
From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs cs.IR · 2025-04-22 · unverdicted · none · ref 7 · internal anchor
The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.
CommitDistill: A Lightweight Knowledge-Centric Memory Layer for Software Repositories cs.SE · 2026-05-18 · unverdicted · none · ref 5 · internal anchor
CommitDistill is a deterministic, local-only prototype that extracts typed knowledge from git commits and evaluates retrieval performance against baselines on public repositories.
Memory as Metabolism: A Design for Companion Knowledge Systems cs.AI · 2026-04-13 · unverdicted · none · ref 51 · internal anchor
This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to update dominant interpretations in personal LLM wikis.
Measuring Stereotype and Deviation Biases in Large Language Models cs.CL · 2025-08-08 · unverdicted · none · ref 17 · internal anchor
Four advanced LLMs display significant stereotype bias and deviation bias when generating profiles tied to political affiliation, religion, and sexual orientation.
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence cs.AI · 2025-07-28 · accept · none · ref 282 · internal anchor
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review cs.AI · 2025-04-28 · accept · none · ref 149 · internal anchor
A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.
Multi-Agent Collaboration Mechanisms: A Survey of LLMs cs.AI · 2025-01-10 · unverdicted · none · ref 154 · internal anchor
The survey organizes LLM-based multi-agent collaboration mechanisms into a framework with dimensions of actors, types, structures, strategies, and coordination protocols, reviews applications across domains, and identifies challenges for future research.
Large Language Model-Brained GUI Agents: A Survey cs.AI · 2024-11-27 · unverdicted · none · ref 55 · internal anchor
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.
Large Language Model Agent: A Survey on Methodology, Applications and Challenges cs.CL · 2025-03-27 · accept · none · ref 8 · internal anchor
A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems cs.AI · 2025-03-31 · unverdicted · none · ref 67 · internal anchor
This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.

A Survey on the Memory Mechanism of Large Language Model based Agents

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer