Creates the LoCoMo benchmark dataset for very long-term LLM conversational memory and shows that current models struggle with lengthy dialogues and long-range temporal dynamics.
arXiv preprint arXiv:2304.13343
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background (1)
citation-polarity summary
polarities: still indexing
representative citing papers
citing papers explorer

- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Creates the LoCoMo benchmark dataset for very long-term LLM conversational memory and shows that current models struggle with lengthy dialogues and long-range temporal dynamics.
- LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
  LongBench is the first bilingual multi-task benchmark for long-context understanding in LLMs, containing 21 datasets across 6 categories with average lengths of 6,711 words (English) and 13,386 characters (Chinese).
- HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents
  HeLa-Mem is a graph-based memory architecture for LLM agents that applies Hebbian learning to episodic associations and distills hubs into semantic knowledge, yielding better results on long-context benchmarks with fewer tokens.
- HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
  HAGE proposes a trainable weighted-graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
- HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues
  HingeMem segments dialogue memory via boundary-triggered hyperedges over four elements and applies query-adaptive retrieval, yielding ~20% relative gains and 68% lower QA token cost versus baselines on LoCoMo.
- A-MEM: Agentic Memory for LLM Agents
  A-MEM is a dynamic memory system for LLM agents that builds and refines an interconnected network of notes with agent-driven linking and evolution, showing performance gains over prior memory methods across six models.
- A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
  The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
- The Rise and Potential of Large Language Model Based Agents: A Survey
  The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
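Several of the citing papers above (HeLa-Mem, HAGE, A-MEM) build memory as a weighted graph whose edges are strengthened when memories are used together. As a minimal hypothetical sketch, not any paper's actual method, a Hebbian-style update over co-activated memory nodes might look like this; the class name, learning rate, and decay factor are illustrative assumptions:

```python
from collections import defaultdict


class MemoryGraph:
    """Toy associative-memory graph with Hebbian edge strengthening (illustrative only)."""

    def __init__(self, lr: float = 0.5, decay: float = 0.9):
        # (node_a, node_b) -> association strength in [0, 1)
        self.weights = defaultdict(float)
        self.lr = lr        # how strongly co-activation reinforces an edge
        self.decay = decay  # how quickly unused associations fade

    def _key(self, a: str, b: str) -> tuple:
        return tuple(sorted((a, b)))  # undirected edge

    def co_activate(self, a: str, b: str) -> None:
        # Hebbian rule: memories retrieved together grow a stronger link,
        # saturating toward 1.0 so repeated reinforcement has diminishing effect.
        k = self._key(a, b)
        self.weights[k] += self.lr * (1.0 - self.weights[k])

    def step(self) -> None:
        # Time step: every association decays unless re-reinforced.
        for k in self.weights:
            self.weights[k] *= self.decay

    def strength(self, a: str, b: str) -> float:
        return self.weights[self._key(a, b)]
```

For example, two co-activations of the same pair raise its strength from 0.5 to 0.75, and one decay step then scales it by 0.9; retrieval in such a system would prefer the strongest surviving edges.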