SimpleMem: Efficient Lifelong Memory for LLM Agents

Cihang Xie; Huaxiu Yao; Jiaqi Liu; Mingyu Ding; Peng Xia; Siwei Han; Yaofeng Su; Zeyu Zheng

arxiv: 2601.02553 · v3 · pith:6W2BJYNYnew · submitted 2026-01-05 · 💻 cs.AI

Jiaqi Liu, Yaofeng Su, Peng Xia, Siwei Han, Zeyu Zheng, Cihang Xie, Mingyu Ding, Huaxiu Yao

Pith reviewed 2026-05-22 08:05 UTC · model grok-4.3

classification 💻 cs.AI

keywords LLM agents · memory compression · semantic synthesis · intent-aware retrieval · lifelong memory · token efficiency · context management

The pith

SimpleMem compresses unstructured LLM agent interactions into compact multi-view memory units via a three-stage semantic pipeline, preserving critical details while cutting token costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SimpleMem as a memory framework that replaces either full history retention or expensive iterative filtering with semantic lossless compression. Its pipeline first distills raw interactions into structured indexed units, then synthesizes related context on the fly to remove redundancy, and finally plans retrieval by inferring user intent to assemble only the needed context. If this holds, agents could sustain accurate performance across much longer sessions without the quadratic token blowup that currently limits complex environments. A sympathetic reader cares because lifelong memory is a bottleneck for any agent meant to operate over days or weeks rather than single turns.

Core claim

By distilling interactions through Semantic Structured Compression into compact multi-view indexed units, followed by intra-session Online Semantic Synthesis that merges related context into unified abstracts and Intent-Aware Retrieval Planning that infers search intent to set retrieval scope, the method produces memory representations that maintain task-critical information while dramatically lowering inference-time token use.

What carries the argument

The three-stage pipeline (Semantic Structured Compression into multi-view indexed units, Online Semantic Synthesis for intra-session abstraction, and Intent-Aware Retrieval Planning) that turns raw interaction histories into high-density, query-adaptive memory.
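
A minimal Python sketch of how the three stages might fit together is given here for orientation. It is not the authors' implementation: the names `MemoryUnit`, `MemoryStore`, and `retrieve`, the exact-match deduplication standing in for synthesis, and the retrieval depths (1 unit for LOW, 5 for HIGH) are assumptions made for illustration. The unit fields and the LOW/HIGH complexity labels follow the prompt fragments extracted further down this page, and the second example unit is invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MemoryUnit:
    """One compact, self-contained fact with several index 'views'."""
    content: str          # semantic view: normalized statement
    entities: list[str]   # lexical view: exact names and keywords
    timestamp: datetime   # temporal view: absolute time
    topic: str = ""


@dataclass
class MemoryStore:
    units: list[MemoryUnit] = field(default_factory=list)

    def add(self, unit: MemoryUnit) -> None:
        # Crude stand-in for synthesis: skip content that is already stored.
        if all(u.content != unit.content for u in self.units):
            self.units.append(unit)

    def retrieve(self, keywords: set[str], complexity: str) -> list[MemoryUnit]:
        # Hypothetical scope rule: LOW returns 1 unit, HIGH returns up to 5.
        k = 1 if complexity == "LOW" else 5
        wanted = {w.lower() for w in keywords}
        scored = [
            (len(wanted & {e.lower() for e in u.entities}), u)
            for u in self.units
        ]
        hits = [(score, u) for score, u in scored if score > 0]
        # Most entity overlap first; ties are broken by recency.
        hits.sort(key=lambda pair: (pair[0], pair[1].timestamp), reverse=True)
        return [u for _, u in hits[:k]]


store = MemoryStore()
meeting = MemoryUnit(
    "Alice agreed to meet Bob at the Starbucks on 5th Avenue on 2025-11-20T14:00:00.",
    ["Alice", "Bob", "Starbucks", "5th Avenue"],
    datetime(2025, 11, 20, 14, 0),
    "Meeting Planning",
)
store.add(meeting)
store.add(meeting)  # duplicate, not stored again
store.add(MemoryUnit("Bob moved the meeting to 15:00.", ["Bob"],
                     datetime(2025, 11, 21, 9, 0)))  # invented follow-up fact

low = store.retrieve({"bob"}, "LOW")
high = store.retrieve({"bob"}, "HIGH")
```

The point of the sketch is the shape of the data flow: facts are stored once, indexed under several views, and the retrieval scope is chosen per query rather than fixed.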

If this is right

Agents achieve an average 26.4% F1 gain on LoCoMo while consuming up to 30 times fewer tokens at inference time.
Memory size stays bounded even as interaction length grows, because redundancy is removed at synthesis time rather than stored.
Retrieval becomes more precise because intent inference dynamically limits scope instead of pulling broad context windows.
The same pipeline can be applied across sessions, turning episodic memory into a growing but compact lifelong store.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be combined with external knowledge bases by treating retrieved documents as additional input to the synthesis stage.
If the compression remains lossless at scale, similar pipelines might reduce context length requirements for other long-horizon reasoning tasks such as multi-turn planning or code maintenance.
Real-world deployment would still need safeguards against drift if the intent inference model itself hallucinates the wrong retrieval scope.

Load-bearing premise

The compression steps preserve every task-critical detail from the original unstructured interactions without any information loss that would affect downstream agent decisions.

What would settle it

An experiment that replays the same long interaction trace through SimpleMem and a full-history baseline, then measures whether the agent produces identical answers on questions that depend on a single early detail omitted from the compressed memory.
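
As a rough sketch of how such a replay comparison could be scored, the Python below asks the same questions of two answer functions and reports where they disagree. Everything here is hypothetical: `agreement_report`, the two stub agents, and the toy questions are placeholders, and a real run would wire in SimpleMem and a full-history baseline. The fallback sentence is the one quoted in the paper's answer prompt.

```python
def agreement_report(questions, full_agent, compressed_agent):
    """Return (agreement rate, list of disagreements) over the questions."""
    mismatches = []
    for question in questions:
        full_answer = full_agent(question)
        compressed_answer = compressed_agent(question)
        # Naive comparison: case-insensitive exact match after trimming.
        if full_answer.strip().lower() != compressed_answer.strip().lower():
            mismatches.append((question, full_answer, compressed_answer))
    if not questions:
        return 1.0, mismatches
    return 1 - len(mismatches) / len(questions), mismatches


FALLBACK = "I do not have enough information in my memory."

# Toy stand-ins: the compressed memory has lost one early detail.
full_history = {
    "Where will Alice meet Bob?": "Starbucks on 5th Avenue",
    "What is Alice's dog called?": "Rex",
}
compressed = {
    "Where will Alice meet Bob?": "Starbucks on 5th Avenue",
}

rate, mismatches = agreement_report(
    list(full_history),
    lambda q: full_history.get(q, FALLBACK),
    lambda q: compressed.get(q, FALLBACK),
)
```

Exact string matching is deliberately strict; a judged or F1-based comparison would be a reasonable substitute when answers are free-form.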

Original abstract

To support long-term interaction in complex environments, LLM agents require memory systems that manage historical experiences. Existing approaches either retain full interaction histories via passive context extension, leading to substantial redundancy, or rely on iterative reasoning to filter noise, incurring high token costs. To address this challenge, we introduce SimpleMem, an efficient memory framework based on semantic lossless compression. We propose a three-stage pipeline designed to maximize information density and token utilization: (1) Semantic Structured Compression, which distills unstructured interactions into compact, multi-view indexed memory units; (2) Online Semantic Synthesis, an intra-session process that instantly integrates related context into unified abstract representations to eliminate redundancy; and (3) Intent-Aware Retrieval Planning, which infers search intent to dynamically determine retrieval scope and construct precise context efficiently. Experiments on benchmark datasets show that our method consistently outperforms baseline approaches in accuracy, retrieval efficiency, and inference cost, achieving an average F1 improvement of 26.4% in LoCoMo while reducing inference-time token consumption by up to 30-fold, demonstrating a superior balance between performance and efficiency. Code is available at https://github.com/aiming-lab/SimpleMem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SimpleMem combines structured compression, online synthesis, and intent-aware planning into a practical pipeline for LLM agent memory, with reported efficiency gains, but the lossless compression claim lacks clear supporting metrics.

SimpleMem's headline is a three-stage pipeline for efficient lifelong memory in LLM agents that promises better accuracy and much lower token use through semantic compression. The main things to know are that it combines structured compression, intra-session synthesis, and intent-aware planning, and that it reports solid-looking gains on benchmarks.

What is actually new is the way these pieces fit together. Previous work either keeps everything or filters with heavy reasoning. Here, the authors distill interactions into compact multi-view memory units, then synthesize related content online to remove duplicates, and finally plan retrieval based on what the current intent seems to be. That integration looks like a practical advance for agents that need to remember over long periods without blowing up the context window.

The paper does well on the practical side. It identifies the redundancy problem clearly and offers a concrete way to address it. The results claim an average 26.4% F1 boost on LoCoMo and up to 30 times fewer tokens at inference. Having the GitHub link means the implementation is out there for checking.

Soft spots come in the evaluation of the compression step. The idea of semantic lossless compression is central, but little is shown to confirm that no important details are lost in turning raw interactions into those indexed units. Without metrics for preservation or ablations that test what happens when details are dropped, the accuracy improvements could be driven more by the retrieval planning than by the compression quality. The stress-test note captures this well. If the full experiments include those checks, it would help a lot, but based on the description, it's a gap that needs filling.

This kind of paper is for researchers and engineers who build LLM-based agents for tasks that span many turns or sessions. A reader looking for ways to make memory more scalable would get value from the pipeline description and the efficiency numbers.

It deserves a serious referee. The work is grounded in a real problem and offers a full system with code, even if some claims need more support. I recommend sending it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper introduces SimpleMem, a memory framework for LLM agents that employs semantic lossless compression via a three-stage pipeline: (1) Semantic Structured Compression to distill interactions into compact multi-view indexed units, (2) Online Semantic Synthesis for intra-session redundancy elimination through unified abstract representations, and (3) Intent-Aware Retrieval Planning to dynamically scope retrieval based on inferred intent. Experiments on benchmark datasets are reported to show consistent outperformance over baselines, with an average 26.4% F1 gain on LoCoMo and up to 30-fold reduction in inference-time token consumption.

Significance. If the experimental claims hold under rigorous validation, the work could meaningfully advance efficient lifelong memory for LLM agents by balancing information density with reduced token costs during inference. The public code release at the cited GitHub repository supports reproducibility and is a clear strength.

major comments (2)

[Experiments / Results] The central performance claims (26.4% F1 improvement on LoCoMo and 30-fold token reduction) rest on the three-stage pipeline producing memory units that preserve all task-critical details, yet no ablation, information-theoretic metric, or explicit verification of semantic lossless compression is supplied in the experimental section to rule out systematic omission of entities or relations.
[Experiments] The abstract and results report specific quantitative gains without describing the baseline implementations, dataset characteristics, number of runs, statistical tests, or error analysis; this leaves the robustness of the accuracy, efficiency, and cost comparisons difficult to evaluate.
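
One simple way the verification asked for in the first major comment could be operationalised is an entity-recall check between the raw interaction and the compressed memory units. The Python below is a sketch under stated assumptions: `entity_recall` is a hypothetical helper, entity extraction is assumed to have happened upstream, and matching is naive case-insensitive string equality rather than anything the paper specifies.

```python
def entity_recall(original_entities, compressed_entities):
    """Share of original entities that survive compression, plus the losses."""
    original = {e.strip().lower() for e in original_entities}
    compressed = {e.strip().lower() for e in compressed_entities}
    if not original:
        # Nothing to preserve, so nothing can have been lost.
        return 1.0, set()
    missing = original - compressed
    return 1 - len(missing) / len(original), missing


# Illustrative input only: the compressed units have dropped one entity.
recall, missing = entity_recall(
    ["Alice", "Bob", "Starbucks", "5th Avenue"],
    ["alice", "Bob", "Starbucks"],
)
```

A recall below 1.0 would not by itself show that a task-critical detail was lost, but the `missing` set gives a concrete list of omissions to inspect.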

minor comments (2)

[Introduction] The term 'semantic lossless compression' is used repeatedly but never formally defined or contrasted with lossy alternatives; a brief operational definition would improve clarity.
[Figures and Tables] Figure captions and table headers should explicitly state the evaluation metrics (e.g., F1, token count) and the exact baselines being compared.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the experimental validation and reporting.

Referee: [Experiments / Results] The central performance claims (26.4% F1 improvement on LoCoMo and 30-fold token reduction) rest on the three-stage pipeline producing memory units that preserve all task-critical details, yet no ablation, information-theoretic metric, or explicit verification of semantic lossless compression is supplied in the experimental section to rule out systematic omission of entities or relations.

Authors: We agree that explicit verification of information preservation would strengthen the claims. The Semantic Structured Compression stage is designed to retain task-critical details by extracting and indexing entities, relations, and temporal attributes into multi-view structures, while Online Semantic Synthesis unifies redundant intra-session content without discarding unique facts. However, we acknowledge the absence of dedicated ablations or metrics in the current experimental section. In the revised manuscript, we will add an ablation study isolating each pipeline stage and report an information-retention metric based on entity and relation overlap (via automated extraction) between original interactions and compressed memory units. This will directly address concerns about potential systematic omissions.

Revision: yes

Referee: [Experiments] The abstract and results report specific quantitative gains without describing the baseline implementations, dataset characteristics, number of runs, statistical tests, or error analysis; this leaves the robustness of the accuracy, efficiency, and cost comparisons difficult to evaluate.

Authors: We concur that greater transparency on experimental setup is required. The current manuscript provides high-level comparisons but lacks granular details on implementation and statistical rigor. In the revision, we will expand the Experiments section with: (i) precise descriptions of baseline adaptations (including prompt templates and memory management logic for methods such as MemGPT and full-context baselines), (ii) dataset statistics (e.g., number of sessions, average turns per session, and domain coverage for LoCoMo and other benchmarks), (iii) results averaged over multiple runs with standard deviations, (iv) statistical significance testing (paired t-tests with p-values), and (v) a categorized error analysis highlighting cases of retrieval failure versus compression-induced loss. These additions will improve evaluability without altering the core claims.

Revision: yes
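
For readers unfamiliar with the paired test mentioned in item (iv), the sketch below computes the paired t statistic from per-run scores using only the Python standard library. The function name `paired_t_statistic` is hypothetical and the scores are placeholder values, not results from the paper. Turning the statistic into a p-value needs a t-distribution, for example via `scipy.stats.ttest_rel`, which is left out to keep the block dependency-free.

```python
import math
import statistics


def paired_t_statistic(method_scores, baseline_scores):
    """Return (t statistic, degrees of freedom) for paired per-run scores."""
    if len(method_scores) != len(baseline_scores) or len(method_scores) < 2:
        raise ValueError("need two equally long samples with at least 2 runs")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


# Placeholder F1 scores (in percent) for three runs of each system.
t_stat, dof = paired_t_statistic([50, 60, 70], [40, 40, 40])
```

The test is paired because each run of the method is compared with the baseline on the same data split or seed, so only the per-run differences enter the statistic.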

Circularity Check

0 steps flagged

No significant circularity identified

The paper introduces SimpleMem as an empirical three-stage pipeline (Semantic Structured Compression, Online Semantic Synthesis, Intent-Aware Retrieval Planning) for lifelong memory in LLM agents and supports its claims through benchmark experiments reporting F1 gains and token reductions. No load-bearing mathematical derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text; the central claims rest on external experimental outcomes rather than any reduction of results to inputs by construction. The approach is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the domain assumption that semantic compression of dialogues can be performed losslessly for downstream agent tasks; no free parameters or invented entities are visible in the abstract.

axioms (1)

Domain assumption: Semantic structured compression preserves all task-critical information from unstructured interactions.
Invoked in the description of the first pipeline stage as the basis for compact memory units.

pith-pipeline@v0.9.0 · 5753 in / 1147 out tokens · 44616 ms · 2026-05-22T08:05:48.249230+00:00 · methodology

Forward citations

Cited by 27 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
cs.AI 2026-05 conditional novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for ...
MemGym: a Long-Horizon Memory Environment for LLM Agents
cs.CL 2026-05 unverdicted novelty 7.0

MemGym unifies agent gyms into a memory benchmark with isolated scoring across tool-use, research, coding, and computer-use regimes plus a lightweight reward model for tractable coding evaluation.
ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents
cs.AI 2026-05 conditional novelty 7.0

ClawForge supplies a generator that turns scenario templates into reproducible command-line tasks testing state conflict handling, where the strongest frontier model scores only 45.3 percent strict accuracy.
ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents
cs.AI 2026-05 unverdicted novelty 7.0

ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and tha...
EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents
cs.LG 2026-05 unverdicted novelty 7.0

EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-ben...
RewardHarness: Self-Evolving Agentic Post-Training
cs.AI 2026-05 unverdicted novelty 7.0

RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.
Latent Preference Modeling for Cross-Session Personalized Tool Calling
cs.CL 2026-04 unverdicted novelty 7.0

Introduces MPT benchmark and PRefine method that models user preferences as evolving hypotheses to improve personalized tool calling accuracy with 1.24% of full-history token cost.
SensorPersona: An LLM-Empowered System for Continual Persona Extraction from Longitudinal Mobile Sensor Streams
cs.CL 2026-03 unverdicted novelty 7.0

SensorPersona uses LLMs for hierarchical reasoning on longitudinal mobile sensor streams to continually extract stable personas, showing up to 31.4% higher recall and 85.7% win rate over baselines on a 20-user dataset.
Self-Evolving Multi-Agent Systems via Decentralized Memory
cs.MA 2026-05 unverdicted novelty 6.0

DecentMem is a decentralized dual-pool memory framework for self-evolving multi-agent systems that provides O(log T) regret guarantees and yields up to 23.8% accuracy gains over centralized baselines.
EvoIR-Agent: Self-Evolving Image Restoration Agentic System via Experience-Driven Learning
cs.CV 2026-05 unverdicted novelty 6.0

EvoIR-Agent formulates experience components into a hierarchical pool with a self-evolving update mechanism to improve performance and efficiency of training-free MLLM image restoration agents over prior paradigms.
Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents
cs.CL 2026-05 unverdicted novelty 6.0

Auto-Dreamer trains an offline memory consolidator via GRPO on agent performance to abstract cross-session patterns, outperforming baselines by 7 points on ScienceWorld with 12x smaller memory and generalizing to ALFW...
MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents
cs.CV 2026-05 conditional novelty 6.0

MementoGUI introduces a modular memory-control framework with working and episodic memory operators that improves long-horizon GUI agent performance over history-replay and text-only baselines.
DimMem: Dimensional Structuring for Efficient Long-Term Agent Memory
cs.CL 2026-05 unverdicted novelty 6.0

DimMem introduces a dimensional memory framework that structures memories as typed atomic units to improve retrieval efficiency and accuracy for long-term LLM agent tasks.
PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents
cs.CL 2026-05 unverdicted novelty 6.0

PRISM achieves higher accuracy than baselines on long-horizon agent tasks at an order-of-magnitude smaller context budget by combining hierarchical bundle search, query-sensitive costing, evidence compression, and ada...
SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs
cs.CL 2026-05 unverdicted novelty 6.0

SkillGraph represents skills as nodes in an evolving directed graph with typed dependency edges and updates the graph from RL trajectories to boost compositional task performance.
Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 6.0

SLIM dynamically optimizes active external skills in agentic RL via leave-one-skill-out marginal contribution estimates and three lifecycle operations, outperforming baselines by 7.1% on ALFWorld and SearchQA while sh...
SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
cs.AI 2026-05 unverdicted novelty 6.0

SkillMaster is a training framework that lets LLM agents autonomously propose, update, and apply skills, yielding 8.8% and 9.3% higher success rates on ALFWorld and WebShop than prior methods.
SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
cs.AI 2026-05 unverdicted novelty 6.0

SkillMaster enables LLM agents to autonomously develop skills via trajectory review, counterfactual evaluation, and DualAdv-GRPO training, boosting success rates by 8.8% on ALFWorld and 9.3% on WebShop.
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
cs.AI 2026-05 unverdicted novelty 6.0

Skill1 trains one policy to jointly evolve skill query generation, re-ranking, task solving, and distillation from a single task-success signal, with low-frequency trends crediting selection and high-frequency variati...
FileGram: Grounding Agent Personalization in File-System Behavioral Traces
cs.CV 2026-04 unverdicted novelty 6.0

FileGram grounds AI agent personalization in file-system behavioral traces via a data simulation engine, a diagnostic benchmark, and a bottom-up memory architecture.
Cognis: Context-Aware Memory for Conversational AI Agents
cs.CL 2026-03 unverdicted novelty 6.0

Cognis is a unified memory system for LLM agents that combines BM25 keyword matching with vector search, context-aware ingestion for version tracking, and reranking to achieve state-of-the-art results on LoCoMo and Lo...
HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling
cs.AI 2026-02 unverdicted novelty 6.0

HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines by 92.6% lower...
Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 5.0

SLIM dynamically optimizes the active external skill set in agentic RL via leave-one-skill-out marginal contribution estimates and lifecycle operations, delivering a 7.1% average gain over baselines on ALFWorld and Se...
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
cs.AI 2026-05 unverdicted novelty 5.0

Skill1 co-evolves skill selection, utilization, and distillation inside a single policy using only task-outcome reward, with low-frequency trends crediting selection and high-frequency variation crediting distillation...
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
cs.AI 2026-05 unverdicted novelty 5.0

Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency var...
Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems
cs.MA 2026-03 unverdicted novelty 5.0

LLMA-Mem improves long-horizon performance in LLM multi-agent systems over baselines while reducing cost and shows non-monotonic scaling where memory-enabled smaller teams can beat larger ones.
Hierarchical Long-Term Semantic Memory for LinkedIn's Hiring Agent
cs.IR 2026-04 unverdicted novelty 4.0

HLTM builds a hierarchical memory tree from longitudinal data to enable scalable, private, low-latency retrieval, delivering over 10% gains in answer correctness and retrieval F1 for LinkedIn's Hiring Assistant while ...

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · cited by 22 Pith papers

[1]

org/CorpusID:278960153

URL https://api.semanticscholar.org/CorpusID:278960153. Liskavetsky, A. et al. Compressor: Context-aware prompt compression for enhanced llm inference. arXiv preprint, 2025. Liu, J., Xiong, K., Xia, P., Zhou, Y., Ji, H., Feng, L., Han, S., Ding, M., and Yao, H. Agent0-vl: Exploring self-evolving agent for tool-integrated vision-language reasoning. arXi...

work page arXiv 2025
[2]

Alita: Generalist agent enabling scalable agentic reasoning with minimal predefinition and maximal self-evolution. arXiv preprint arXiv:2505.20286, 2025

URL https://api.semanticscholar.org/CorpusID:263909014. Qiu, J., Qi, X., Zhang, T., Juan, X., Guo, J., Lu, Y., Wang, Y., Yao, Z., Ren, Q., Jiang, X., et al. Alita: Generalist agent enabling scalable agentic reasoning with minimal predefinition and maximal self-evolution. arXiv preprint arXiv:2505.20286, 2025. Rasmussen, P., Paliychuk, P., Beauvais, T., ...

work page arXiv 2025
[3]

- Discard redundant confirmations unless they modify or finalize a decision

Information Filtering: - Discard social filler, acknowledgements, and conversational routines that introduce no new factual or semantic information. - Discard redundant confirmations unless they modify or finalize a decision. - If no informative content is present, output an empty list

work page
[4]

- Ensure each memory unit is interpretable without access to prior dialogue

Context Normalization: - Resolve all pronouns and implicit references into explicit entity names. - Ensure each memory unit is interpretable without access to prior dialogue

work page
[5]

tomorrow

Temporal Normalization: - Convert relative temporal expressions (e.g., "tomorrow", "last week") into absolute ISO 8601 timestamps using the window start time

work page
[6]

memory_units

Memory Unit Extraction: - Decompose complex utterances into minimal, indivisible factual statements. INPUT DIALOGUE: {dialogue_window} OUTPUT FORMAT (JSON): { "memory_units": [ { "content": "Alice agreed to meet Bob at the Starbucks on 5th Avenue on 2025-11-20T14:00:00.", "entities": ["Alice", "Bob", "Starbucks", "5th Avenue"], "topic": "Meeting Planning...

work page 2025
[7]

LOW" if the query can be answered via direct fact lookup or a single memory unit. - Assign

Query Complexity Estimation: - Assign "LOW" if the query can be answered via direct fact lookup or a single memory unit. - Assign "HIGH" if the query requires aggregation across multiple events, temporal comparison, or synthesis of patterns

work page
[8]

complexity

Retrieval Signals: - Lexical layer: extract exact keywords or entity names. - Temporal layer: infer absolute time ranges if relevant. - Semantic layer: rewrite the query into a declarative form suitable for semantic matching. OUTPUT FORMAT (JSON): { "complexity": "HIGH", "retrieval_rationale": "The query requires reasoning over multiple temporally separat...

work page 2025
[9]

- Use detailed memory units to ground the response with specific facts

Hierarchical Reasoning: - Use abstract representations to capture recurring patterns or general user preferences. - Use detailed memory units to ground the response with specific facts

work page
[10]

- Optionally reference abstract patterns when relevant

Conflict Handling: - If inconsistencies arise, prioritize the most recent memory unit. - Optionally reference abstract patterns when relevant

work page
[11]

- Ensure all statements respect the timestamps provided in memory

Temporal Consistency: - Ensure all statements respect the timestamps provided in memory.

work page
[12]

I do not have enough information in my memory

Faithfulness: - Base the answer strictly on the retrieved memory. - If required information is missing, respond with: "I do not have enough information in my memory." FINAL ANSWER: A.4. LongMemEval Evaluation Prompt For the LongMemEval benchmark, we employed gpt-4.1-mini as the judge to evaluate the correctness of the agent’s responses. The prompt strictl...

work page 2024
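
Two of the prompt rules extracted above, Temporal Normalization and Conflict Handling, are easy to illustrate in a few lines of Python. This is a sketch and not the paper's code: in SimpleMem these rules are instructions to an LLM, whereas the hypothetical `normalize_time` and `resolve_conflict` helpers below hard-code a tiny lookup table, and treating "last week" as exactly seven days earlier is a simplifying assumption. The second example fact is invented.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical lookup: only a handful of relative expressions are covered.
RELATIVE_OFFSETS = {
    "today": timedelta(days=0),
    "tomorrow": timedelta(days=1),
    "yesterday": timedelta(days=-1),
    "last week": timedelta(weeks=-1),
    "next week": timedelta(weeks=1),
}


def normalize_time(expression: str, window_start: datetime) -> Optional[str]:
    """Map a relative expression to an absolute ISO 8601 timestamp."""
    offset = RELATIVE_OFFSETS.get(expression.strip().lower())
    if offset is None:
        return None  # unknown expression: leave it unresolved
    return (window_start + offset).isoformat()


def resolve_conflict(units):
    """Given (iso_timestamp, content) pairs, keep the most recent one."""
    return max(units, key=lambda unit: datetime.fromisoformat(unit[0]))


window_start = datetime(2025, 11, 19, 9, 0, 0)
tomorrow = normalize_time("tomorrow", window_start)
last_week = normalize_time("Last week", window_start)
unknown = normalize_time("in a fortnight", window_start)

winner = resolve_conflict([
    ("2025-11-20T14:00:00", "Alice will meet Bob at 14:00."),
    ("2025-11-21T09:00:00", "The meeting was moved to 15:00."),
])
```

Resolving times against the window start is what makes each memory unit interpretable on its own, and it is also what lets the recency rule compare conflicting units directly.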
