hub

A-MEM: Agentic Memory for LLM Agents

Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, Yongfeng Zhang · 2025 · cs.CL · arXiv 2502.12110

84 Pith papers cite this work. Polarity classification is still indexing.

84 Pith papers citing it

open full Pith review browse 84 citing papers arXiv PDF

abstract

While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems' fixed operations and structures limit their adaptability across diverse tasks. To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. Following the basic principles of the Zettelkasten method, we designed our memory system to create interconnected knowledge networks through dynamic indexing and linking. When a new memory is added, we generate a comprehensive note containing multiple structured attributes, including contextual descriptions, keywords, and tags. The system then analyzes historical memories to identify relevant connections, establishing links where meaningful similarities exist. Additionally, this process enables memory evolution - as new memories are integrated, they can trigger updates to the contextual representations and attributes of existing historical memories, allowing the memory network to continuously refine its understanding. Our approach combines the structured organization principles of Zettelkasten with the flexibility of agent-driven decision making, allowing for more adaptive and context-aware memory management. Empirical experiments on six foundation models show superior improvement against existing SOTA baselines. The source code for evaluating performance is available at https://github.com/WujiangXu/A-mem, while the source code of the agentic memory system is available at https://github.com/WujiangXu/A-mem-sys.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1

citation-polarity summary

background 1

claims ledger

abstract While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems' fixed operations and structures limit their adaptability across diverse tasks. To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. Following the basi

co-cited works

representative citing papers

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

cs.AI · 2026-05-13 · unverdicted · novelty 8.0

RealICU is a new benchmark using physician hindsight labels on MIMIC-IV ICU data that exposes LLM failures in long-horizon clinical assessment, acute problem detection, action recommendation, and red-flag identification.

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

cs.CR · 2026-05-09 · unverdicted · novelty 8.0

ShadowMerge poisons graph-based agent memory via relation-channel conflicts using an AIR pipeline, achieving 93.8% average attack success rate on Mem0 and three real-world datasets while bypassing existing defenses.

SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems

cs.SE · 2026-05-13 · unverdicted · novelty 7.0

SkillOps maintains LLM skill libraries via Skill Contracts and ecosystem graphs, raising ALFWorld task success to 79.5% as a standalone agent and improving retrieval baselines by up to 2.9 points with near-zero library-time LLM cost.

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.

Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

LLM-generated combinatorial solvers achieve highest correctness when the model formalizes problems for verified backends rather than attempting to optimize search, which often causes regressions.

Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

Goal-Mem improves RAG memory retrieval in agentic LLMs by explicit goal decomposition and backward chaining via Natural Language Logic, outperforming nine baselines on multi-hop and implicit inference tasks.

Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.

MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

MAGE uses a four-subgraph co-evolutionary knowledge graph plus dual bandits to externalize and retrieve experience for stable self-evolution of frozen language-model agents, showing gains on nine diverse benchmarks.

Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

Omni-Persona benchmark with 18 tasks shows open-source models have audio-visual grounding gaps, RLVR narrows them but leads to conservative outputs, and scale or recall alone fail as diagnostics.

Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

cs.CR · 2026-05-11 · unverdicted · novelty 7.0

Nautilus Compass is a black-box drift detector for production LLM agents that uses weighted cosine similarity on BGE-m3 embeddings of raw text against anchors, achieving 0.83 ROC AUC on real session traces while shipping as plugins and servers with an audit log.

Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles that achieves better few-shot multimodal time series classification than baselines on 12 benchmarks.

EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.

MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents

cs.RO · 2026-05-08 · unverdicted · novelty 7.0

MemCompiler introduces state-conditioned memory compilation that dynamically selects and compiles relevant memory into text and latent guidance, yielding up to 129% gains over no-memory baselines and 60% lower latency across multiple embodied benchmarks.

When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.

PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts

cs.CR · 2026-05-07 · unverdicted · novelty 7.0

PragLocker protects agent prompts as IP by building non-portable obfuscated versions that function only on the intended LLM through code-symbol semantic anchoring followed by target-model feedback noise injection.

Belief Memory: Agent Memory Under Partial Observability

cs.AI · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

BeliefMem is a probabilistic memory architecture for LLM agents that retains multiple candidate conclusions with probabilities updated by Noisy-OR, achieving superior average performance over deterministic baselines on LoCoMo and ALFWorld.

MEMAUDIT: An Exact Package-Oracle Evaluation Protocol for Budgeted Long-Term LLM Memory Writing

cs.AI · 2026-05-04 · unverdicted · novelty 7.0

MEMAUDIT is a new exact optimization protocol for evaluating budgeted LLM memory writing that uses package-oracle fixes and MILP solvers to separate representation quality, validity preservation, and selection effects.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

AEL: Agent Evolving Learning for Open-Ended Environments

cs.CL · 2026-04-23 · conditional · novelty 7.0

AEL uses a fast-timescale bandit for memory policy selection and slow-timescale LLM reflection for causal insights, achieving a Sharpe ratio of 2.13 on a 208-episode portfolio benchmark while showing that added mechanisms degrade performance.

From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Memora benchmark and FAMA metric show that LLMs and memory agents frequently reuse invalid memories and struggle to reconcile evolving information in long-term interactions.

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summarization.

vstash: Local-First Hybrid Retrieval with Adaptive Fusion for LLM Agents

cs.IR · 2026-04-16 · conditional · novelty 7.0

vstash shows that hybrid retrieval disagreements provide a free training signal to fine-tune 33M-parameter embeddings, yielding NDCG@10 gains up to 19.5% on NFCorpus and matching some larger models on three of five BEIR datasets.

citing papers explorer

Showing 50 of 84 citing papers.

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation cs.AI · 2026-05-13 · unverdicted · none · ref 33 · internal anchor
RealICU is a new benchmark using physician hindsight labels on MIMIC-IV ICU data that exposes LLM failures in long-horizon clinical assessment, acute problem detection, action recommendation, and red-flag identification.
MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare cs.AI · 2026-05-12 · conditional · none · ref 37 · internal anchor
MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty cs.CL · 2026-05-12 · unverdicted · none · ref 19 · internal anchor
Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.
ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts cs.CR · 2026-05-09 · unverdicted · none · ref 10 · internal anchor
ShadowMerge poisons graph-based agent memory via relation-channel conflicts using an AIR pipeline, achieving 93.8% average attack success rate on Mem0 and three real-world datasets while bypassing existing defenses.
SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems cs.SE · 2026-05-13 · unverdicted · none · ref 54 · internal anchor
SkillOps maintains LLM skill libraries via Skill Contracts and ecosystem graphs, raising ALFWorld task success to 79.5% as a standalone agent and improving retrieval baselines by up to 2.9 points with near-zero library-time LLM cost.
LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues cs.CL · 2026-05-12 · unverdicted · none · ref 101 · internal anchor
LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.
Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers cs.AI · 2026-05-12 · unverdicted · none · ref 17 · internal anchor
LLM-generated combinatorial solvers achieve highest correctness when the model formalizes problems for verified backends rather than attempting to optimize search, which often causes regressions.
Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems cs.AI · 2026-05-12 · unverdicted · none · ref 35 · internal anchor
Goal-Mem improves RAG memory retrieval in agentic LLMs by explicit goal decomposition and backward chaining via Natural Language Logic, outperforming nine baselines on multi-hop and implicit inference tasks.
Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory cs.AI · 2026-05-11 · unverdicted · none · ref 44 · internal anchor
Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs cs.AI · 2026-05-11 · unverdicted · none · ref 23 · internal anchor
MAGE uses a four-subgraph co-evolutionary knowledge graph plus dual bandits to externalize and retrieve experience for stable self-evolution of frozen language-model agents, showing gains on nine diverse benchmarks.
Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization cs.CV · 2026-05-11 · unverdicted · none · ref 19 · internal anchor
Omni-Persona benchmark with 18 tasks shows open-source models have audio-visual grounding gaps, RLVR narrows them but leads to conservative outputs, and scale or recall alone fail as diagnostics.
Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents cs.CR · 2026-05-11 · unverdicted · none · ref 2 · internal anchor
Nautilus Compass is a black-box drift detector for production LLM agents that uses weighted cosine similarity on BGE-m3 embeddings of raw text against anchors, achieving 0.83 ROC AUC on real session traces while shipping as plugins and servers with an audit log.
Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning cs.AI · 2026-05-10 · unverdicted · none · ref 54 · internal anchor
MarsTSC is a VLM-based agentic reasoning framework with a self-evolving knowledge bank and Generator-Reflector-Modifier roles that achieves better few-shot multimodal time series classification than baselines on 12 benchmarks.
EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium cs.AI · 2026-05-10 · unverdicted · none · ref 84 · internal anchor
EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.
MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents cs.RO · 2026-05-08 · unverdicted · none · ref 33 · internal anchor
MemCompiler introduces state-conditioned memory compilation that dynamically selects and compiles relevant memory into text and latent guidance, yielding up to 129% gains over no-memory baselines and 60% lower latency across multiple embodied benchmarks.
When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory cs.AI · 2026-05-08 · unverdicted · none · ref 42 · internal anchor
A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.
PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts cs.CR · 2026-05-07 · unverdicted · none · ref 112 · internal anchor
PragLocker protects agent prompts as IP by building non-portable obfuscated versions that function only on the intended LLM through code-symbol semantic anchoring followed by target-model feedback noise injection.
Belief Memory: Agent Memory Under Partial Observability cs.AI · 2026-05-07 · unverdicted · none · ref 16 · 2 links · internal anchor
BeliefMem is a probabilistic memory architecture for LLM agents that retains multiple candidate conclusions with probabilities updated by Noisy-OR, achieving superior average performance over deterministic baselines on LoCoMo and ALFWorld.
MEMAUDIT: An Exact Package-Oracle Evaluation Protocol for Budgeted Long-Term LLM Memory Writing cs.AI · 2026-05-04 · unverdicted · none · ref 13 · internal anchor
MEMAUDIT is a new exact optimization protocol for evaluating budgeted LLM memory writing that uses package-oracle fixes and MILP solvers to separate representation quality, validity preservation, and selection effects.
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory cs.CL · 2026-05-01 · unverdicted · none · ref 167 · internal anchor
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
AEL: Agent Evolving Learning for Open-Ended Environments cs.CL · 2026-04-23 · conditional · none · ref 29 · internal anchor
AEL uses a fast-timescale bandit for memory policy selection and slow-timescale LLM reflection for causal insights, achieving a Sharpe ratio of 2.13 on a 208-episode portfolio benchmark while showing that added mechanisms degrade performance.
From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents cs.CL · 2026-04-21 · unverdicted · none · ref 11 · internal anchor
Memora benchmark and FAMA metric show that LLMs and memory agents frequently reuse invalid memories and struggle to reconcile evolving information in long-term interactions.
Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents cs.AI · 2026-04-21 · unverdicted · none · ref 9 · internal anchor
Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summarization.
vstash: Local-First Hybrid Retrieval with Adaptive Fusion for LLM Agents cs.IR · 2026-04-16 · conditional · none · ref 14 · internal anchor
vstash shows that hybrid retrieval disagreements provide a free training signal to fine-tune 33M-parameter embeddings, yielding NDCG@10 gains up to 19.5% on NFCorpus and matching some larger models on three of five BEIR datasets.
When to Forget: A Memory Governance Primitive cs.AI · 2026-04-13 · unverdicted · none · ref 10 · internal anchor
Memory Worth converges almost surely to the conditional probability of task success given memory retrieval and correlates at rho=0.89 with ground-truth utility in controlled experiments.
ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents cs.AI · 2026-04-11 · unverdicted · none · ref 46 · internal anchor
ClawVM introduces a harness-managed virtual memory system for LLM agents that ensures deterministic residency and durability of state under token budgets by using typed pages and validated writeback.
PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent cs.AI · 2026-04-08 · unverdicted · none · ref 20 · internal anchor
PRIME enables agents to proactively reason in user-centric tasks by iteratively evolving structured memories from interaction trajectories without gradient-based training.
GRAB-ANNS: High-Throughput Indexing and Hybrid Search via GPU-Native Bucketing cs.DB · 2026-03-31 · unverdicted · none · ref 39 · internal anchor
GRAB-ANNS is a new GPU graph index that achieves up to 240x higher hybrid search throughput via bucket layouts and hybrid intra/inter-bucket edges.
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents cs.CR · 2024-10-03 · unverdicted · none · ref 150 · internal anchor
ASB is a new benchmark that tests 10 prompt injection attacks, memory poisoning, a novel Plan-of-Thought backdoor attack, and 11 defenses on LLM agents across 13 models, finding attack success rates up to 84.3% and limited defense effectiveness.
Cognifold: Always-On Proactive Memory via Cognitive Folding cs.AI · 2026-05-13 · unverdicted · none · ref 64 · internal anchor
Cognifold is a new proactive memory architecture that folds event streams into emergent cognitive structures by extending complementary learning systems theory with a prefrontal intent layer and graph topology self-organization.
Context Training with Active Information Seeking cs.CL · 2026-05-13 · unverdicted · none · ref 55 · internal anchor
Adding active search tools to LLM context optimization works only when combined with a multi-candidate search-based training procedure that prunes contexts, delivering gains across low-resource translation, health, and reasoning benchmarks.
MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning cs.AI · 2026-05-13 · unverdicted · none · ref 37 · internal anchor
MAP improves LLM agent reasoning by constructing a structured cognitive map of the environment before task execution, yielding performance gains on benchmarks like ARC-AGI-3 and superior training data via the new MAP-2K dataset.
Learning, Fast and Slow: Towards LLMs That Adapt Continually cs.LG · 2026-05-12 · unverdicted · none · ref 64 · internal anchor
Fast-Slow Training combines slow parameter updates with fast context optimization to achieve up to 3x better sample efficiency, higher performance, less forgetting, and preserved plasticity in continual LLM learning.
PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents cs.CL · 2026-05-12 · unverdicted · none · ref 25 · internal anchor
PRISM achieves higher accuracy than baselines on long-horizon agent tasks at an order-of-magnitude smaller context budget by combining hierarchical bundle search, query-sensitive costing, evidence compression, and adaptive intent routing over structured memory.
SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory cs.AI · 2026-05-12 · unverdicted · none · ref 235 · internal anchor
SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and long-term agent benchmarks.
Beyond Similarity Search: Tenure and the Case for Structured Belief State in LLM Memory cs.IR · 2026-05-11 · unverdicted · none · ref 11 · internal anchor
Tenure replaces similarity search with a structured belief store using scope isolation and alias-weighted BM25 retrieval, achieving 1.0 precision on 72 cases where cosine similarity scores 0.12.
Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning cs.LG · 2026-05-11 · unverdicted · none · ref 62 · internal anchor
SLIM dynamically optimizes active external skills in agentic RL via leave-one-skill-out marginal contribution estimates and three lifecycle operations, outperforming baselines by 7.1% on ALFWorld and SearchQA while showing some skills are internalized and others remain external.
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution cs.AI · 2026-05-11 · unverdicted · none · ref 5 · internal anchor
HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents cs.CR · 2026-05-10 · unverdicted · none · ref 46 · 2 links · internal anchor
MemPrivacy uses edge detection of sensitive spans and type-aware placeholders to enable cloud-side memory management for LLM agents without exposing private data, achieving under 1.6% utility loss.
The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory cs.LG · 2026-05-10 · unverdicted · none · ref 57 · internal anchor
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents cs.MA · 2026-05-09 · unverdicted · none · ref 95 · internal anchor
Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.
SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents cs.AI · 2026-05-08 · unverdicted · none · ref 26 · internal anchor
SkillLens organizes skills into policies-strategies-procedures-primitives layers, retrieves via degree-corrected random walk, and uses a verifier for local adaptation, yielding up to 6.31 pp gains on MuLocbench and raising ALFWorld success from 45% to 51.31%.
From History to State: Constant-Context Skill Learning for LLM Agents cs.AI · 2026-05-06 · unverdicted · none · ref 37 · internal anchor
Constant-context skill learning trains reusable task-family modules for LLM agents using a deterministic state block for progress tracking and subgoal rewards, achieving 89.6% unseen success on ALFWorld, 76.8% on WebShop, and 66.4% on SciWorld with Qwen3-8B while reducing prompt tokens 2-7x.
MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents cs.CL · 2026-05-01 · unverdicted · none · ref 20 · internal anchor
A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling math.OC · 2026-04-28 · unverdicted · none · ref 23 · internal anchor
Agora-Opt uses decentralized debate among LLM agent teams plus a read-write memory bank to produce more accurate optimization models from text than prior LLM methods.
Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents cs.AI · 2026-04-23 · unverdicted · none · ref 9 · internal anchor
Memanto delivers 89.8% and 87.1% accuracy on LongMemEval and LoCoMo benchmarks using typed semantic memory and information-theoretic retrieval, outperforming hybrid graph and vector systems with a single query and zero ingestion cost.
Self-Aware Vector Embeddings for Retrieval-Augmented Generation: A Neuroscience-Inspired Framework for Temporal, Confidence-Weighted, and Relational Knowledge cs.IR · 2026-04-22 · unverdicted · none · ref 8 · internal anchor
SmartVector augments embeddings with time, confidence, and relation signals plus a consolidation process, raising top-1 accuracy on versioned queries from 31% to 62% on a synthetic benchmark while cutting stale answers and calibration error.
Stateless Decision Memory for Enterprise AI Agents cs.AI · 2026-04-22 · unverdicted · none · ref 9 · internal anchor
Deterministic Projection Memory (DPM) delivers stateless, deterministic decision memory for enterprise AI agents that matches or exceeds summarization-based approaches at tight memory budgets while improving speed, determinism, and auditability.
To Know is to Construct: Schema-Constrained Generation for Agent Memory cs.CL · 2026-04-22 · unverdicted · none · ref 20 · internal anchor
SCG-MEM reformulates agent memory access as schema-constrained generation within dynamic cognitive schemas, using assimilation and accommodation for updates plus an associative graph for reasoning, and outperforms retrieval baselines on the LoCoMo benchmark.
HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents cs.CL · 2026-04-20 · unverdicted · none · ref 3 · internal anchor
HiGMem combines hierarchical event-turn memory with LLM-guided selection to retrieve concise relevant evidence from long dialogues, improving F1 scores and cutting retrieved turns by an order of magnitude on the LoCoMo10 benchmark.

A-MEM: Agentic Memory for LLM Agents

hub tools

citation-role summary

citation-polarity summary

claims ledger

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer