Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.
hub Canonical reference
Enhancing large language model with self-controlled memory framework
Canonical reference. 80% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).
SMMBench is a benchmark evaluating multimodal agents on cross-source reasoning, conflict resolution, preference reasoning, and action prediction, showing current systems struggle with evidence distributed across heterogeneous sources.
MemLens benchmark shows long-context LVLMs lose accuracy with length while memory agents lose visual fidelity, with multi-session reasoning below 30% for most systems and neither approach solving the task alone.
EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-benchmark transfer.
HeLa-Mem is a graph-based memory architecture for LLM agents that applies Hebbian learning to episodic associations and distills hubs into semantic knowledge, yielding better results on long-context benchmarks with fewer tokens.
ProAct uses idle compute to anticipate user needs via dialogue history and memory, achieving 14.8% fewer turns, 11.7% less user effort, and 28.1% fewer hallucinations than reactive baselines on the new ProActEval benchmark.
HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
HingeMem segments dialogue memory via boundary-triggered hyperedges over four elements and applies query-adaptive retrieval, yielding ~20% relative gains and 68% lower QA token cost versus baselines on LOCOMO.
A-MEM is a dynamic memory system for LLM agents that builds and refines an interconnected network of notes with agent-driven linking and evolution, showing performance gains over prior memory methods on six models.
A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.
The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.
citing papers explorer
-
From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs
The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.