
A-MEM: Agentic Memory for LLM Agents

Hang Gao, Juntao Tan, Kai Mei, Wujiang Xu, Yongfeng Zhang, Zujie Liang · 2025 · cs.CL · arXiv 2502.12110

Canonical reference. 75% of classified citations from Pith papers cite this work as background.

213 Pith papers citing it



abstract

While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems' fixed operations and structures limit their adaptability across diverse tasks. To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. Following the basic principles of the Zettelkasten method, we designed our memory system to create interconnected knowledge networks through dynamic indexing and linking. When a new memory is added, we generate a comprehensive note containing multiple structured attributes, including contextual descriptions, keywords, and tags. The system then analyzes historical memories to identify relevant connections, establishing links where meaningful similarities exist. Additionally, this process enables memory evolution - as new memories are integrated, they can trigger updates to the contextual representations and attributes of existing historical memories, allowing the memory network to continuously refine its understanding. Our approach combines the structured organization principles of Zettelkasten with the flexibility of agent-driven decision making, allowing for more adaptive and context-aware memory management. Empirical experiments on six foundation models show superior improvement against existing SOTA baselines. The source code for evaluating performance is available at https://github.com/WujiangXu/A-mem, while the source code of the agentic memory system is available at https://github.com/WujiangXu/A-mem-sys.
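The three steps the abstract describes (building a structured note, linking it to related historical notes, and letting the new note update those older notes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Note`, `MemoryStore`, `embed`, and `link_threshold` are hypothetical, a bag-of-words cosine similarity stands in for an embedding model, and a fixed threshold stands in for the agent-driven (LLM) decisions about which links to create and how to evolve existing notes.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Note:
    """One memory note with the structured attributes named in the abstract."""
    note_id: int
    content: str
    keywords: set[str]
    tags: set[str]
    context: str
    links: set[int] = field(default_factory=set)


class MemoryStore:
    """Hypothetical in-memory store; a threshold replaces the LLM's judgment."""

    def __init__(self, link_threshold: float = 0.3) -> None:
        self.notes: dict[int, Note] = {}
        self.link_threshold = link_threshold

    def add(self, content: str, tags: set[str]) -> Note:
        # Step 1: note construction (keywords via a crude length heuristic).
        vec = embed(content)
        note = Note(
            note_id=len(self.notes),
            content=content,
            keywords={word for word in vec if len(word) > 3},
            tags=set(tags),
            context="Note about: " + ", ".join(sorted(tags)),
        )
        for other in self.notes.values():
            # Step 2: link generation where similarity is high enough.
            if cosine(vec, embed(other.content)) >= self.link_threshold:
                note.links.add(other.note_id)
                other.links.add(note.note_id)
                # Step 3: memory evolution, updating the older note's attributes.
                other.tags |= note.tags
                other.context += f" Related to note {note.note_id}."
        self.notes[note.note_id] = note
        return note


store = MemoryStore()
first = store.add("Agent used the weather API tool to plan a trip", {"tools"})
second = store.add("Agent used the weather API tool again for a second trip", {"travel"})
print(first.links, first.tags, first.context)
```

In this sketch, adding the second note both links it to the first and rewrites the first note's tags and context, which is the "evolution" behavior in miniature. A real system would replace `embed` with a learned embedding and the threshold test with a model-driven decision.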


citation-role summary

background 30 · baseline 6

citation-polarity summary

background 27 · baseline 6 · unclear 3




representative citing papers

Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models

cs.AI · 2026-06-09 · conditional · novelty 8.0

Memory augmentation in LLMs amplifies sycophancy up to 25x compared to in-context baselines due to lossy memory extraction, with two lightweight mitigations that reduce the effect while preserving recall.

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

cs.AI · 2026-05-13 · unverdicted · novelty 8.0

RealICU is a new benchmark using physician hindsight labels on MIMIC-IV ICU data that exposes LLM failures in long-horizon clinical assessment, acute problem detection, action recommendation, and red-flag identification.

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

cs.CR · 2026-05-09 · unverdicted · novelty 8.0 · 3 refs

ShadowMerge exploits relation-channel conflicts to poison graph-based agent memory, achieving 93.8% average attack success rate on Mem0 and real-world datasets while bypassing existing defenses.

AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

cs.CL · 2026-03-09 · unverdicted · novelty 8.0

AlpsBench supplies 2500 real-dialogue sequences with verified memories to benchmark LLM extraction, updating, retrieval, and utilization of personalized information.

Seek to Segment: Active Perception for Panoramic Referring Segmentation

cs.CV · 2026-07-02 · unverdicted · novelty 7.0

Introduces APRS task and PanoSeeker agent using VLM plus EgoSphere memory for active 360° search and segmentation, outperforming baselines on a new benchmark.

Self-GC: Self-Governing Context for Long-Horizon LLM Agents

cs.AI · 2026-07-01 · unverdicted · novelty 7.0

Self-GC governs agent context as indexed objects with planner-proposed actions, achieving 84.85% no-impact on future continuations on a hard set versus 54-70% for baselines.

MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery

cs.CL · 2026-06-23 · unverdicted · novelty 7.0

MEMPROBE is a benchmark for direct recovery of hidden user states from LLM agent memory, showing task success and memory recovery as distinct capabilities with moderate recovery scores around 0.6.

User as Engram: Internalizing Per-User Memory as Local Parametric Edits

cs.AI · 2026-06-17 · unverdicted · novelty 7.0

User facts are internalized as surgical local edits to a hash-keyed Engram memory table with reasoning skill held in a shared adapter, claimed to match LoRA recall, improve indirect reasoning 5.6x on average, and compose across users with 33,000x smaller footprint than per-user adapters.

RTSGameBench: An RTS Benchmark for Strategic Reasoning by Vision-Language Models

cs.AI · 2026-06-17 · unverdicted · novelty 7.0

RTSGameBench is a new extensible benchmark for VLMs using diverse RTS matchups, diagnostic mini-games targeting individual competencies, and a self-evolving query-to-game generator, with results showing poor VLM performance on tight coordination and large-scale tasks.

PreAct: Computer-Using Agents that Get Faster on Repeated Tasks

cs.AI · 2026-06-16 · unverdicted · novelty 7.0

PreAct compiles successful agent executions into verifiable state-machine programs for 8.5-13x faster replay on repeated tasks, with an independent evaluator check before storing each program.

MemTrace: Probing What Final Accuracy Misses in Long-Term Memory

cs.AI · 2026-06-15 · unverdicted · novelty 7.0

MemTrace shows that evidence utilization, not retrieval, is the dominant failure mode in LLM long-term memory systems across tested configurations.

Control-Plane Placement Shapes Forgetting: An Architectural Study of Agent Memory Across Thirteen System Configurations

cs.CL · 2026-06-14 · unverdicted · novelty 7.0

An empirical comparison of thirteen control-plane placements in agent memory pipelines identifies three regimes with complementary forgetting recovery on a new 385-case adversarial benchmark, with mutation-time placement achieving 91.7-93.2% overall.

Memory Beyond Recall: A Dual-Process Cognitive Memory System for Self-Evolving LLM Agents

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

DCPM reorganizes LLM agent memory into a cognitive hierarchy driven by a synchronous daytime belief writer and an asynchronous nighttime schema engine, reporting gains on cross-session inference benchmarks.

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

cs.AI · 2026-06-04 · unverdicted · novelty 7.0

The paper delivers the first systems characterization of agent memory, with a four-axis taxonomy, phase-aware profiler, evaluation of ten systems on two benchmarks, and ten design recommendations.

CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

CollabSim is a new CSCW-grounded simulation framework that enables controlled multi-agent experiments to measure collaborative competence in LLM agents.

Worth Remembering: Surprise-Gated Robot Episodic Memory

cs.RO · 2026-06-02 · unverdicted · novelty 7.0

Surprise-gated episodic memory using V-JEPA-2 improves robot QA by ≥12% over prior memory methods and outperforms supervised baselines on event segmentation.

AVTrack: Audio-Visual Tracking in Human-centric Complex Scenes

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

Introduces AVTrack dataset for audio-visual tracking in challenging human-centric scenes, demonstrating performance drops in existing methods.

ElasticMem: Latent Memory as a Learnable Resource for LLM Agents

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

ElasticMem enables LLM agents to learn adaptive latent memory retrieval and elastic budget allocation, improving QA accuracy by 24-26% and ALFWorld success by 27-66% over baselines with lower token cost.

HEART-Bench: Do LLM Agents Exhibit Human-like Psychology?

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

HEART-Bench evaluates LLM agents on psychological consistency using 11 Big-Five-grounded characters with 1,000 episodic memories each and 64 DIAMONDS-based decision scenarios, yielding 673 validated MCQs.

Personal Visual Memory from Explicit and Implicit Evidence

cs.CV · 2026-05-27 · unverdicted · novelty 7.0

VisualMem augments text memory with a visual module that resolves identity and durable user facts from images, outperforming prior systems on a new benchmark for explicit and implicit personal visual evidence.

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

cs.AI · 2026-05-26 · unverdicted · novelty 7.0

VitaBench 2.0 introduces a benchmark for long-term personalized and proactive agent behavior, with results indicating substantial gaps in current frontier LLMs.

Memory Architectures for Multi-Turn Text-to-SQL: A Benchmark and Empirical Study

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

EnterpriseMem-Bench shows stateless multi-turn Text-to-SQL accuracy drops to zero by turn 3, working memory is the main driver of gains, and additional memory components yield model- and dataset-dependent effects from +14 to -16 percentage points.

citing papers explorer

Showing 50 of 86 citing papers after filters.

Recalling Too Well: Sycophancy Evaluation and Mitigation in Memory-Augmented Models cs.AI · 2026-06-09 · conditional · none · ref 28 · internal anchor
Memory augmentation in LLMs amplifies sycophancy up to 25x compared to in-context baselines due to lossy memory extraction, with two lightweight mitigations that reduce the effect while preserving recall.
RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation cs.AI · 2026-05-13 · unverdicted · none · ref 33 · internal anchor
RealICU is a new benchmark using physician hindsight labels on MIMIC-IV ICU data that exposes LLM failures in long-horizon clinical assessment, acute problem detection, action recommendation, and red-flag identification.
MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare cs.AI · 2026-05-12 · conditional · none · ref 37 · internal anchor
MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
Self-GC: Self-Governing Context for Long-Horizon LLM Agents cs.AI · 2026-07-01 · unverdicted · none · ref 62 · internal anchor
Self-GC governs agent context as indexed objects with planner-proposed actions, achieving 84.85% no-impact on future continuations on a hard set versus 54-70% for baselines.
User as Engram: Internalizing Per-User Memory as Local Parametric Edits cs.AI · 2026-06-17 · unverdicted · none · ref 61 · internal anchor
User facts are internalized as surgical local edits to a hash-keyed Engram memory table with reasoning skill held in a shared adapter, claimed to match LoRA recall, improve indirect reasoning 5.6x on average, and compose across users with 33,000x smaller footprint than per-user adapters.
RTSGameBench: An RTS Benchmark for Strategic Reasoning by Vision-Language Models cs.AI · 2026-06-17 · unverdicted · none · ref 57 · internal anchor
RTSGameBench is a new extensible benchmark for VLMs using diverse RTS matchups, diagnostic mini-games targeting individual competencies, and a self-evolving query-to-game generator, with results showing poor VLM performance on tight coordination and large-scale tasks.
PreAct: Computer-Using Agents that Get Faster on Repeated Tasks cs.AI · 2026-06-16 · unverdicted · none · ref 55 · internal anchor
PreAct compiles successful agent executions into verifiable state-machine programs for 8.5-13x faster replay on repeated tasks, with an independent evaluator check before storing each program.
MemTrace: Probing What Final Accuracy Misses in Long-Term Memory cs.AI · 2026-06-15 · unverdicted · none · ref 25 · internal anchor
MemTrace shows that evidence utilization, not retrieval, is the dominant failure mode in LLM long-term memory systems across tested configurations.
Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads cs.AI · 2026-06-04 · unverdicted · none · ref 31 · internal anchor
The paper delivers the first systems characterization of agent memory, with a four-axis taxonomy, phase-aware profiler, evaluation of ten systems on two benchmarks, and ten design recommendations.
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions cs.AI · 2026-05-26 · unverdicted · none · ref 35 · internal anchor
VitaBench 2.0 introduces a benchmark for long-term personalized and proactive agent behavior, with results indicating substantial gaps in current frontier LLMs.
Personalize-then-Store: Benchmarking and Learning Personalized Memory for Long-horizon Agents cs.AI · 2026-05-25 · unverdicted · none · ref 4 · internal anchor
Introduces PerMemBench benchmark for personalized memory and shows session-level gating yields retention gains under perfect decisions but accurate gating is an open challenge.
EXG: Self-Evolving Agents with Experience Graphs cs.AI · 2026-05-18 · unverdicted · none · ref 37 · internal anchor
EXG is an experience graph framework for self-evolving LLM agents that supports online real-time growth and offline reuse to enhance solution quality and efficiency on code generation and reasoning benchmarks.
ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents cs.AI · 2026-05-13 · unverdicted · none · ref 1 · 2 links · internal anchor
ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.
Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers cs.AI · 2026-05-12 · unverdicted · none · ref 17 · internal anchor
LLM-generated combinatorial solvers achieve highest correctness when the model formalizes problems for verified backends rather than attempting to optimize search, which often causes regressions.
Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory cs.AI · 2026-05-11 · unverdicted · none · ref 44 · internal anchor
Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs cs.AI · 2026-05-11 · unverdicted · none · ref 23 · internal anchor
MAGE uses a four-subgraph co-evolutionary knowledge graph plus dual bandits to externalize and retrieve experience for stable self-evolution of frozen language-model agents, showing gains on nine diverse benchmarks.
EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium cs.AI · 2026-05-10 · unverdicted · none · ref 84 · internal anchor
EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.
When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory cs.AI · 2026-05-08 · unverdicted · none · ref 42 · internal anchor
A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.
Belief Memory: Agent Memory Under Partial Observability cs.AI · 2026-05-07 · unverdicted · none · ref 16 · 2 links · internal anchor
BeliefMem is a probabilistic memory architecture for LLM agents that retains multiple candidate conclusions with probabilities updated by Noisy-OR, achieving superior average performance over deterministic baselines on LoCoMo and ALFWorld.
MEMAUDIT: An Exact Package-Oracle Evaluation Protocol for Budgeted Long-Term LLM Memory Writing cs.AI · 2026-05-04 · unverdicted · none · ref 13 · internal anchor
MEMAUDIT is a new exact optimization protocol for evaluating budgeted LLM memory writing that uses package-oracle fixes and MILP solvers to separate representation quality, validity preservation, and selection effects.
Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents cs.AI · 2026-04-21 · unverdicted · none · ref 9 · internal anchor
Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summarization.
When to Forget: A Memory Governance Primitive cs.AI · 2026-04-13 · unverdicted · none · ref 10 · internal anchor
Memory Worth converges almost surely to the conditional probability of task success given memory retrieval and correlates at rho=0.89 with ground-truth utility in controlled experiments.
ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents cs.AI · 2026-04-11 · unverdicted · none · ref 46 · internal anchor
ClawVM introduces a harness-managed virtual memory system for LLM agents that ensures deterministic residency and durability of state under token budgets by using typed pages and validated writeback.
PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent cs.AI · 2026-04-08 · unverdicted · none · ref 20 · internal anchor
PRIME enables agents to proactively reason in user-centric tasks by iteratively evolving structured memories from interaction trajectories without gradient-based training.
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments cs.AI · 2026-03-24 · unverdicted · none · ref 71 · internal anchor
PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.
$How^{2}$: How to learn from procedural How-to questions cs.AI · 2025-10-13 · unverdicted · none · ref 3 · internal anchor
$How^{2}$ is a memory agent framework enabling agents to ask, store, and reuse answers to how-to questions at varying abstraction levels for better lifelong planning in environments like Plancraft.
Episodic-to-Semantic Consolidation Without Identity Drift cs.AI · 2026-07-02 · unverdicted · none · ref 14 · internal anchor
A deterministic episodic-to-semantic consolidation function with a structural lemma proving identity invariance, demonstrated in synthetic experiments on an embodied service agent.
Mastermind: Strategy-grounded Learning for Repository-Scale Vulnerability Reproduction cs.AI · 2026-07-02 · unverdicted · none · ref 10 · internal anchor
Mastermind's dual-loop planner learns transferable strategies via SFT and milestone GRPO, raising GPT-5.5 executor pass rate on 200 held-out CyberGym tasks from 60% to 84.5%.
AutoMem: Automated Learning of Memory as a Cognitive Skill cs.AI · 2026-07-01 · unverdicted · none · ref 14 · internal anchor
AutoMem automates memory structure revision and proficiency training in LLMs, delivering 2x-4x performance gains on long-horizon games without altering task-action behavior.
MetaPS: Adaptive Programmatic Strategy Selection for Market Agents cs.AI · 2026-06-21 · unverdicted · none · ref 131 · internal anchor
MetaPS trains models via simulation rollouts to select from programmatic strategy libraries for market agents, yielding better performance than fixed or direct LLM baselines across model sizes.
Nous: A Predictive World Model for Long-Term Agent Memory cs.AI · 2026-06-20 · unverdicted · none · ref 18 · internal anchor
Nous is a predictive world model for agent memory that maintains categorical probability distributions per entity-attribute pair, updates them with closed-form Bayesian posteriors on information-theoretic surprise, stores belief deltas, and achieves F1 scores of 63.50/55.32/58.57/62.50 on LoCoMo sin
GitOfThoughts: Version-Controlled Reasoning and Agent Memory You Can Replay, Diff, and Merge cs.AI · 2026-06-12 · unverdicted · none · ref 22 · internal anchor
GitOfThoughts stores agent reasoning as a git repo and shows memory from past problems improves accuracy only when new problems are nearly identical (cosine similarity >0.8), with self-consistency providing the main gain on novel tasks.
SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows cs.AI · 2026-06-06 · unverdicted · none · ref 48 · internal anchor
SKILL.nb uses selective formalization and gate-conditioned execution in auditable notebooks to improve durability of agent workflows, achieving 53.7% success on WebArena-Verified with 91.7% retention across re-executions.
Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents cs.AI · 2026-06-04 · unverdicted · none · ref 62 · internal anchor
MRAgent combines a Cue-Tag-Content associative graph with active reconstruction to enable dynamic memory access in LLM agents, reporting up to 23% gains on long-memory benchmarks with lower token costs.
AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning cs.AI · 2026-05-23 · unverdicted · none · ref 62 · internal anchor
AgentFugue introduces a plug-in shared reasoning hub trained with SFT and RL that enables peer agents to share intermediate reasoning, yielding gains on long-horizon tasks over strong baselines.
SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent cs.AI · 2026-05-23 · unverdicted · none · ref 38 · internal anchor
SAM is a standalone memory framework for long-horizon LLM agents that creates state-adaptive cues from interactions, preserves raw trajectories for intent-driven recall, and optimizes the module via expert supervision and RL, outperforming baselines on BrowseComp and related benchmarks.
State Contamination in Memory-Augmented LLM Agents cs.AI · 2026-05-16 · unverdicted · none · ref 23 · internal anchor
Toxic context can be laundered into memory summaries that stay below toxicity thresholds while still driving higher downstream toxicity in LLM agents compared to neutral baselines.
MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning cs.AI · 2026-05-13 · unverdicted · none · ref 37 · internal anchor
MAP improves LLM agent reasoning by constructing a structured cognitive map of the environment before task execution, yielding performance gains on benchmarks like ARC-AGI-3 and superior training data via the new MAP-2K dataset.
Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems cs.AI · 2026-05-12 · unverdicted · none · ref 35 · 2 links · internal anchor
Goal-Mem decomposes user goals into subgoals for targeted memory retrieval using Natural Language Logic, improving performance on multi-hop reasoning tasks in conversational agents.
SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory cs.AI · 2026-05-12 · unverdicted · none · ref 235 · internal anchor
SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and long-term agent benchmarks.
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution cs.AI · 2026-05-11 · unverdicted · none · ref 5 · internal anchor
HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning cs.AI · 2026-05-10 · unverdicted · none · ref 54 · 2 links · internal anchor
MarsTSC is a VLM agentic system with generator, reflector, and modifier roles that iteratively refines a knowledge bank to improve few-shot multimodal time series classification and produce human-readable explanations.
SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents cs.AI · 2026-05-08 · unverdicted · none · ref 26 · internal anchor
SkillLens organizes skills into policies-strategies-procedures-primitives layers, retrieves via degree-corrected random walk, and uses a verifier for local adaptation, yielding up to 6.31 pp gains on MuLocbench and raising ALFWorld success from 45% to 51.31%.
From History to State: Constant-Context Skill Learning for LLM Agents cs.AI · 2026-05-06 · unverdicted · none · ref 37 · internal anchor
Constant-context skill learning trains reusable task-family modules for LLM agents using a deterministic state block for progress tracking and subgoal rewards, achieving 89.6% unseen success on ALFWorld, 76.8% on WebShop, and 66.4% on SciWorld with Qwen3-8B while reducing prompt tokens 2-7x.
Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents cs.AI · 2026-04-23 · unverdicted · none · ref 9 · internal anchor
Memanto delivers 89.8% and 87.1% accuracy on LongMemEval and LoCoMo benchmarks using typed semantic memory and information-theoretic retrieval, outperforming hybrid graph and vector systems with a single query and zero ingestion cost.
Stateless Decision Memory for Enterprise AI Agents cs.AI · 2026-04-22 · unverdicted · none · ref 9 · internal anchor
Deterministic Projection Memory (DPM) delivers stateless, deterministic decision memory for enterprise AI agents that matches or exceeds summarization-based approaches at tight memory budgets while improving speed, determinism, and auditability.
Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration cs.AI · 2026-04-20 · unverdicted · none · ref 17 · internal anchor
LLM agents trained with a task-success reward on self-generated knowledge can spontaneously explore and adapt to new environments without any rewards or instructions at inference, yielding 20% gains on web tasks and allowing a 14B model to beat Gemini-2.5-Flash.
GAM: Hierarchical Graph-based Agentic Memory for LLM Agents cs.AI · 2026-04-14 · unverdicted · none · ref 27 · internal anchor
GAM decouples event-level memory encoding from topic-level consolidation in LLM agents using hierarchical graphs to reduce interference and improve long-term coherence and retrieval.
MEMENTO: Teaching LLMs to Manage Their Own Context cs.AI · 2026-04-10 · unverdicted · none · ref 30 · internal anchor
MEMENTO trains LLMs to segment reasoning into blocks, generate mementos as dense summaries, and reason forward using only mementos and KV states, cutting peak KV cache by ~2.5x while preserving benchmark accuracy.
ACF: A Collaborative Framework for Agent Covert Communication under Cognitive Asymmetry cs.AI · 2026-04-09 · unverdicted · none · ref 30 · internal anchor
ACF structurally decouples covert communication from semantic reasoning in agent networks using a shared steganographic configuration to maintain performance under cognitive asymmetry.
