arxiv: 2508.19828 · v5 · submitted 2025-08-27 · 💻 cs.CL · cs.MA

Recognition: 1 theorem link

Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Ercong Nie, Hinrich Schütze, Jeff Z. Pan, Jinhe Bi, Kristian Kersting, Sikuan Yan, Volker Tresp, Xiaowen Ma, Xiufeng Yang, Yunpu Ma, Zifeng Ding, Zonggen Li, Zuchao Huang

Pith reviewed 2026-05-13 04:51 UTC · model grok-4.3

classification 💻 cs.CL cs.MA

keywords: memory management · reinforcement learning · large language models · external memory · agentic systems · long-context reasoning · PPO · GRPO

The pith

Reinforcement learning trains language models to actively manage external memory through learned ADD, UPDATE, DELETE, and NOOP operations, using only 152 training question-answer pairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models are stateless and limited by context windows, which blocks reliable long-horizon reasoning. Most prior work adds an external memory bank but relies on fixed, hand-written rules for deciding what to keep or retrieve. Memory-R1 instead trains two agents with outcome-driven reinforcement learning: a Memory Manager that chooses among ADD, UPDATE, DELETE, and NOOP, and an Answer Agent that selects and reasons over the resulting entries. The entire system is optimized on question-answering rewards rather than imitation, allowing the policy to adapt without large amounts of labeled data. The result is a memory-management behavior that improves accuracy on long-context benchmarks while generalizing across model sizes and question styles.

Core claim

Memory-R1 introduces a reinforcement-learning framework in which a Memory Manager agent learns to execute structured memory operations (ADD, UPDATE, DELETE, NOOP) and an Answer Agent learns to pre-select relevant entries before generating responses. Both agents are fine-tuned with outcome-driven PPO and GRPO on question-answering success, so the policy for memory management emerges directly from task performance rather than from supervised imitation or heuristics.
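
"Outcome-driven" here means the training signal is a single scalar computed only from the final answer. A minimal sketch, assuming a normalized exact-match reward in the style of SQuAD evaluation; the function names are hypothetical and the paper's exact reward definition may differ. Under this reading, a Memory Manager rollout would be scored by the reward of the answer produced from the memory bank it edited.

```python
import re
import string

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    answer = answer.lower().translate(_PUNCT_TABLE)
    answer = re.sub(r"\b(a|an|the)\b", " ", answer)
    return " ".join(answer.split())


def outcome_reward(predicted: str, gold: str) -> float:
    """Binary terminal reward: 1.0 on a normalized exact match, otherwise 0.0."""
    return 1.0 if normalize(predicted) == normalize(gold) else 0.0


print(outcome_reward("The Eiffel Tower!", "eiffel tower"))  # 1.0
print(outcome_reward("Paris", "London"))  # 0.0
```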

What carries the argument

The Memory Manager agent, which outputs one of four discrete operations on an external memory bank and is trained end-to-end with outcome-based reinforcement learning.
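
Because the Memory Manager is a language model, its free-text output has to be mapped onto the four discrete operations somewhere. The sketch below assumes, purely for illustration, that the policy emits one JSON object per operation with hypothetical `op`, `id`, and `text` fields, and that any malformed output is treated as NOOP.

```python
import json

VALID_OPS = {"ADD", "UPDATE", "DELETE", "NOOP"}


def parse_operation(raw: str) -> dict:
    """Map one raw policy output to a memory operation, falling back to NOOP."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return {"op": "NOOP"}  # not valid JSON
    if not isinstance(action, dict):
        return {"op": "NOOP"}  # valid JSON but not an object
    op = action.get("op")
    if not isinstance(op, str) or op not in VALID_OPS:
        return {"op": "NOOP"}  # missing or unknown operation
    # UPDATE and DELETE must carry an integer entry ID.
    if op in {"UPDATE", "DELETE"} and not isinstance(action.get("id"), int):
        return {"op": "NOOP"}
    # ADD and UPDATE must carry the text to store.
    if op in {"ADD", "UPDATE"} and not isinstance(action.get("text"), str):
        return {"op": "NOOP"}
    return action


print(parse_operation('{"op": "UPDATE", "id": 1, "text": "User likes to play cricket with friends"}'))
print(parse_operation("I think we should add this"))  # {'op': 'NOOP'}
```

Falling back to NOOP keeps the memory bank unchanged on bad output; an alternative design would penalize malformed output through the reward instead.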

If this is right

The learned memory policy outperforms static and heuristic baselines on LoCoMo, MSC, and LongMemEval.
The same policy generalizes across diverse question types and model scales from 3B to 14B parameters.
Training succeeds with only 152 QA pairs, indicating that outcome-driven RL can replace large supervised memory datasets.
Memory management and answer generation can be jointly optimized inside a single reinforcement-learning loop.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same outcome-driven training could be applied to other agentic components such as tool selection or multi-step planning.
If the policy remains stable at much larger scales, models could maintain coherent state across thousands of turns without hand-crafted memory rules.
Extending the action space to include memory compression or summarization operations would test whether the framework scales to even longer histories.

Load-bearing premise

That a policy learned from reinforcement learning on a fixed set of 152 question-answer pairs will continue to produce useful memory decisions when task length, data distribution, or model scale changes.

What would settle it

A new test set containing interaction histories substantially longer than those in LoCoMo, MSC, or LongMemEval on which accuracy drops below the strongest heuristic baseline.
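
This criterion can be checked by bucketing test questions by interaction-history length and comparing the learned policy with the strongest heuristic baseline in each bucket. A minimal sketch, assuming per-question records of the form `(history_length_in_turns, rl_correct, baseline_correct)`; the record format, `bucket_size`, and function names are hypothetical.

```python
from collections import defaultdict


def accuracy_by_length(
    results: list[tuple[int, int, int]], bucket_size: int = 100
) -> dict[int, tuple[float, float]]:
    """Per-bucket accuracy of (learned policy, heuristic baseline), keyed by bucket start."""
    buckets: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for length, rl_ok, base_ok in results:
        buckets[length // bucket_size * bucket_size].append((rl_ok, base_ok))
    summary = {}
    for start, pairs in sorted(buckets.items()):
        n = len(pairs)
        summary[start] = (sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n)
    return summary


def failing_buckets(summary: dict[int, tuple[float, float]]) -> list[int]:
    """Length buckets in which the learned policy falls below the baseline."""
    return [start for start, (rl_acc, base_acc) in summary.items() if rl_acc < base_acc]


records = [(50, 1, 0), (60, 1, 1), (250, 0, 1), (260, 0, 1)]
summary = accuracy_by_length(records)
print(summary)  # {0: (1.0, 0.5), 200: (0.0, 1.0)}
print(failing_buckets(summary))  # [200]
```

With few questions per bucket, such gaps should be read alongside a significance test rather than at face value.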

Original abstract

Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).
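
GRPO, one of the two optimizers named in the abstract, replaces PPO's learned value function with a group baseline: several responses are sampled for the same prompt, and each response's advantage is its reward standardized against the group's mean and standard deviation. A minimal, library-independent sketch of that advantage computation; the function name is hypothetical, and this version uses the population standard deviation, a detail on which implementations differ.

```python
import statistics


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize each sampled response's reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # approximately [1.0, -1.0, 1.0, -1.0]
print(grpo_advantages([1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0]
```

The second call shows a property worth keeping in mind with binary outcome rewards: when every sampled answer for a question receives the same reward, that group contributes no learning signal.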

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

Memory-R1 shows a clean RL setup for learning discrete memory operations, but the 152-example training regime leaves the generalization claims looking fragile.

The paper's core move is to split the problem into a Memory Manager that picks among four explicit operations (ADD, UPDATE, DELETE, NOOP) and a separate Answer Agent that reasons over the resulting memory entries, then trains both end-to-end with outcome-only RL (PPO/GRPO). That combination is not in the static memory-bank papers they cite, so the framing is new. They also report that the same policy works across three different benchmarks and model sizes from 3B to 14B after training on only 152 QA pairs, which is the headline empirical claim.

Referee Report

3 major / 2 minor

Summary. The paper introduces Memory-R1, an RL-based framework with two specialized agents (Memory Manager performing ADD/UPDATE/DELETE/NOOP operations and Answer Agent for retrieval/reasoning) that are fine-tuned via outcome-driven PPO and GRPO on sparse rewards. It claims that training on only 152 QA pairs yields policies that outperform strong baselines and generalize across question types, three long-context benchmarks (LoCoMo, MSC, LongMemEval), and model scales from 3B to 14B.

Significance. If the empirical claims hold under rigorous controls, the work would demonstrate that adaptive, learned memory-management policies can be acquired efficiently in a low-data regime, offering a scalable alternative to heuristic or static external-memory pipelines for long-horizon LLM reasoning.

major comments (3)

[Abstract, §4] Abstract and §4 (Experiments): The headline claim that 152 training QA pairs suffice for generalization across three benchmarks and multiple scales is load-bearing, yet the provided text supplies no description of how the 152 pairs were sampled, whether they overlap with test distributions, or what regularization/diversity penalties were applied during RL to mitigate reward hacking; without these controls the observed gains could reflect memorization rather than transferable memory logic.
[§3] §3 (Method): The outcome-driven RL formulation for the Memory Manager relies on sparse terminal rewards, but no details are given on reward shaping, credit assignment across the discrete action sequence, or how the state (memory bank contents plus query) is encoded for the policy; this omission prevents assessment of whether the learned policies are robust or merely exploit training-specific patterns.
[§4] §4 (Experiments): No ablation isolating the contribution of the RL-trained Memory Manager versus a heuristic baseline or a non-RL fine-tuned variant is reported, nor are statistical significance tests, variance across random seeds, or OOD splits provided; these are required to substantiate that the performance lift stems from learned adaptive management rather than architectural differences.
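
The credit-assignment concern in the second major comment can be seen in the simplest Monte Carlo setting: with a single terminal reward and no discounting, every memory operation in an episode receives the same return, whether or not it affected the final answer. The sketch below computes discounted returns-to-go; it is a generic illustration, not the paper's estimator (PPO implementations commonly use a learned critic with generalized advantage estimation instead).

```python
def returns_to_go(rewards: list[float], gamma: float = 1.0) -> list[float]:
    """Discounted return from each step onward: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Four memory operations, reward only after the final answer is scored.
print(returns_to_go([0.0, 0.0, 0.0, 1.0]))  # [1.0, 1.0, 1.0, 1.0]
print(returns_to_go([0.0, 0.0, 0.0, 1.0], gamma=0.9))  # approximately [0.729, 0.81, 0.9, 1.0]
```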

minor comments (2)

[§3] Notation for the two agents and their interaction loop should be formalized with a diagram or pseudocode to clarify the turn-taking and memory-bank update mechanics.
[Abstract] The abstract states 'minimal supervision' while reporting 152 labeled QA pairs; a brief clarification of the supervision type (e.g., outcome-only vs. action-level) would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We sincerely thank the referee for their detailed and constructive feedback. We appreciate the opportunity to clarify key aspects of our work and will strengthen the manuscript accordingly by adding the requested details and analyses.

Referee: [Abstract, §4] Abstract and §4 (Experiments): The headline claim that 152 training QA pairs suffice for generalization across three benchmarks and multiple scales is load-bearing, yet the provided text supplies no description of how the 152 pairs were sampled, whether they overlap with test distributions, or what regularization/diversity penalties were applied during RL to mitigate reward hacking; without these controls the observed gains could reflect memorization rather than transferable memory logic.

Authors: We agree that these details are essential to substantiate the generalization claims. In the revised manuscript, we will add a dedicated subsection describing the sampling procedure for the 152 QA pairs (including source dataset and selection criteria), explicitly confirm zero overlap with test distributions across all benchmarks, and detail the regularization techniques and diversity penalties used in the RL objective to mitigate reward hacking. These additions will clarify that the performance gains arise from learned transferable memory policies. revision: yes
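
A contamination check of the kind promised here can start from exact matching of normalized question strings across splits. The sketch below is a hypothetical minimum with hypothetical function names: it catches verbatim reuse but not paraphrases or shared underlying conversations, which a full audit would also need to compare.

```python
import re


def normalize_question(q: str) -> str:
    """Lowercase and replace non-alphanumerics so formatting differences do not hide duplicates."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", q.lower()).split())


def overlapping_questions(train: list[str], test: list[str]) -> set[str]:
    """Normalized questions that appear in both splits."""
    return {normalize_question(q) for q in train} & {normalize_question(q) for q in test}


print(overlapping_questions(["When did John visit California?"],
                            ["when did john visit california", "Who is Mary?"]))
# {'when did john visit california'}
```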
Referee: [§3] §3 (Method): The outcome-driven RL formulation for the Memory Manager relies on sparse terminal rewards, but no details are given on reward shaping, credit assignment across the discrete action sequence, or how the state (memory bank contents plus query) is encoded for the policy; this omission prevents assessment of whether the learned policies are robust or merely exploit training-specific patterns.

Authors: We acknowledge this gap in methodological transparency. The revised §3 will explicitly describe any reward shaping beyond the sparse terminal signal, explain how PPO and GRPO assign credit across the sequence of discrete actions (ADD/UPDATE/DELETE/NOOP), and provide the precise state encoding for the policy (including representation of memory bank contents concatenated with the query). This will enable readers to better evaluate policy robustness. revision: yes
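
One plausible state encoding, offered only as an assumption about what the revised §3 might specify, serializes the current memory entries with their IDs and appends the incoming information, so that UPDATE and DELETE can refer to entries by ID. The function name and prompt wording are hypothetical.

```python
def encode_state(memories: dict[int, str], incoming: str) -> str:
    """Serialize memory entries (with IDs) plus new information into a single policy input."""
    lines = [f"[{entry_id}] {text}" for entry_id, text in sorted(memories.items())]
    memory_block = "\n".join(lines) if lines else "(empty)"
    return (
        "Current memory:\n"
        f"{memory_block}\n\n"
        "New information:\n"
        f"{incoming}\n\n"
        "Choose one operation: ADD, UPDATE <id>, DELETE <id>, or NOOP."
    )


print(encode_state({1: "User likes to play cricket", 0: "Name is John"}, "User dislikes cricket"))
```

Listing IDs explicitly makes the discrete action target well-defined; a bank that outgrows the context window would additionally need a rule for which entries to include.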
Referee: [§4] §4 (Experiments): No ablation isolating the contribution of the RL-trained Memory Manager versus a heuristic baseline or a non-RL fine-tuned variant is reported, nor are statistical significance tests, variance across random seeds, or OOD splits provided; these are required to substantiate that the performance lift stems from learned adaptive management rather than architectural differences.

Authors: We recognize the value of these controls for isolating the source of gains. In the revised manuscript, we will add ablations comparing the RL-trained Memory Manager to both heuristic memory baselines and non-RL fine-tuned variants. We will also include statistical significance tests, performance variance across multiple random seeds, and results on OOD splits to demonstrate that improvements derive from learned adaptive management rather than architectural factors. revision: yes
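
The two statistics promised here are straightforward to compute. A minimal standard-library sketch, assuming per-seed scalar scores and per-question 0/1 correctness for two systems on the same test questions; the function names are hypothetical, and the bootstrap shown is the simple one-sided paired variant commonly used in NLP evaluation.

```python
import random
import statistics


def seed_summary(scores: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of a metric across random seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_bootstrap_pvalue(a: list[int], b: list[int], n_resamples: int = 10_000,
                            seed: int = 0) -> float:
    """Fraction of resampled test sets on which system A fails to beat system B."""
    if len(a) != len(b) or not a:
        raise ValueError("need paired, non-empty per-question scores")
    rng = random.Random(seed)
    n = len(a)
    not_better = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample questions with replacement
        if sum(a[i] - b[i] for i in idx) <= 0:
            not_better += 1
    return not_better / n_resamples


print(seed_summary([0.60, 0.62, 0.64]))  # mean about 0.62, sample std about 0.02
rl = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
heuristic = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]
print(paired_bootstrap_pvalue(rl, heuristic, n_resamples=2_000))
```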

Circularity Check

0 steps flagged

No circularity: purely empirical RL training with no derivation chain

full rationale

The paper presents an empirical RL framework (PPO/GRPO on 152 QA pairs) for training Memory Manager and Answer Agent policies. No equations, first-principles derivations, or mathematical claims are advanced in the abstract or described structure; performance is measured directly on held-out benchmarks (LoCoMo, MSC, LongMemEval) across model scales. Because there is no claimed derivation that could reduce to fitted inputs or self-citations, the work is self-contained as an empirical study of outcome-driven RL fine-tuning and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The abstract provides no explicit free parameters, mathematical axioms, or new physical entities; the two agents are architectural components rather than postulated objects with independent evidence.

invented entities (2)

Memory Manager agent no independent evidence
purpose: Learns to execute ADD, UPDATE, DELETE, or NOOP on external memory
Introduced as one of the two core learned components; no independent falsifiable prediction supplied.
Answer Agent no independent evidence
purpose: Pre-selects relevant memory entries and performs final reasoning
Introduced as the second learned component; no independent falsifiable prediction supplied.

pith-pipeline@v0.9.0 · 5538 in / 1155 out tokens · 34323 ms · 2026-05-13T04:51:36.181289+00:00 · methodology

Forward citations

Cited by 34 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
cs.AI 2026-05 conditional novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for ...
R^2-Mem: Reflective Experience for Memory Search
cs.CL 2026-05 conditional novelty 7.0

R^2-Mem distills rubric-scored experiences from high- and low-quality search trajectories to guide LLM agents, raising F1 by up to 22.6% while cutting tokens 12.9% and iterations 20.2%.
LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues
cs.CL 2026-05 unverdicted novelty 7.0

LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.
DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning
cs.CL 2026-05 unverdicted novelty 7.0

DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.
MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents
cs.RO 2026-05 unverdicted novelty 7.0

MemCompiler introduces state-conditioned memory compilation that dynamically selects and compiles relevant memory into text and latent guidance, yielding up to 129% gains over no-memory baselines and 60% lower latency...
Belief Memory: Agent Memory Under Partial Observability
cs.AI 2026-05 unverdicted novelty 7.0

BeliefMem stores multiple candidate conclusions with probabilities in agent memory and updates them via Noisy-OR rules to preserve uncertainty under partial observability.
Belief Memory: Agent Memory Under Partial Observability
cs.AI 2026-05 unverdicted novelty 7.0

BeliefMem is a probabilistic memory architecture for LLM agents that retains multiple candidate conclusions with probabilities updated by Noisy-OR, achieving superior average performance over deterministic baselines o...
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
cs.CL 2026-05 unverdicted novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory
cs.CL 2026-04 unverdicted novelty 7.0

OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.
SAGER: Self-Evolving User Policy Skills for Recommendation Agent
cs.IR 2026-04 unverdicted novelty 7.0

SAGER equips LLM recommendation agents with per-user evolving policy skills via two-representation architecture, contrastive CoT diagnosis, and skill-augmented listwise reasoning, yielding SOTA gains orthogonal to mem...
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
cs.CL 2025-11 unverdicted novelty 7.0

Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.
SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory
cs.AI 2026-05 unverdicted novelty 6.0

SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and l...
Tree-based Credit Assignment for Multi-Agent Memory System
cs.MA 2026-05 unverdicted novelty 6.0

TreeMem assigns credit to agents in multi-agent memory systems by expanding outputs into a tree and using Monte Carlo averaging of final rewards to optimize each agent's policy.
What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis
cs.AI 2026-05 unverdicted novelty 6.0

Circuit analysis reveals that routing circuits for agent memory emerge at 0.6B parameters while content circuits emerge at 4B, with a shared grounding hub and an unsupervised diagnostic achieving 76.2% accuracy for lo...
What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis
cs.AI 2026-05 unverdicted novelty 6.0

In LLM agents, memory routing circuits emerge at 0.6B scale while content circuits appear only at 4B, and write/read operations recruit a pre-existing late-layer context hub instead of creating a new one, enabling a 7...
MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents
cs.CL 2026-05 unverdicted novelty 6.0

A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.
Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents
cs.CL 2026-04 unverdicted novelty 6.0

RSCB-MC is a risk-sensitive contextual bandit memory controller for LLM coding agents that chooses safe actions including abstention, achieving 60.5% proxy success with 0% false positives and low latency in 200-case v...
MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search
cs.IR 2026-04 unverdicted novelty 6.0

MemSearch-o1 uses reasoning-aligned memory growth from seed tokens, retracing via contribution functions, and path reorganization to mitigate memory dilution in LLM agentic search.
MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search
cs.IR 2026-04 unverdicted novelty 6.0

MemSearch-o1 mitigates memory dilution in agentic LLM search through reasoning-aligned token-level memory growth, retracing with a contribution function, and path reorganization, improving reasoning activation on benchmarks.
Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents
cs.AI 2026-04 conditional novelty 6.0

The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level compression as the missing diagonal.
POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch
cs.CV 2026-04 unverdicted novelty 6.0

POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.
Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards
cs.AI 2026-04 unverdicted novelty 6.0

Introduces MemHome benchmark and RL with multi-dimensional rewards for memory-driven smart home device control.
AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning
cs.CV 2026-04 unverdicted novelty 6.0

AnomalyAgent uses tool-augmented reinforcement learning with self-reflection to generate realistic industrial anomalies, achieving better metrics than zero-shot methods on MVTec-AD.
TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation
cs.CL 2026-04 unverdicted novelty 6.0

TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing t...
Decocted Experience Improves Test-Time Inference in LLM Agents
cs.AI 2026-04 unverdicted novelty 6.0

Decocted experience—extracting and organizing the essence from accumulated interactions—enables more effective context construction that improves test-time inference in LLM agents on math, web, and software tasks.
MemFactory: Unified Inference & Training Framework for Agent Memory
cs.CL 2026-03 unverdicted novelty 6.0

MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.
Reinforced Collaboration in Multi-Agent Flow Networks
cs.LG 2026-05 unverdicted novelty 5.0

MANGO optimizes multi-agent LLM workflows via flow networks, RL, and textual gradients, delivering up to 12.8% higher performance and 47.4% better efficiency while generalizing to new domains.
Intermediate Artifacts as First-Class Citizens: A Data Model for Durable Intermediate Artifacts in Agentic Systems
cs.AI 2026-05 unverdicted novelty 5.0

A systems-level data model for preserving typed, addressable, versioned, and dependency-aware intermediate artifacts in agentic AI systems to improve long-term inspectability and maintainability.
From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work
cs.AI 2026-05 conditional novelty 5.0

Execution lineage models AI-native work as a DAG of computations with explicit dependencies, achieving perfect state preservation in controlled update tasks where loop-based agents introduce churn and contamination.
Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction
cs.AI 2026-04 unverdicted novelty 5.0

Web2BigTable introduces a bi-level multi-agent system that achieves new state-of-the-art results on wide-coverage and deep web-to-table search benchmarks through orchestration, coordination, and closed-loop reflection.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
cs.SE 2026-04 accept novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
Improving Sparse Memory Finetuning
cs.LG 2026-04 unverdicted novelty 4.0

Sparse memory modules with KL-based surprising-token selection let retrofitted LLMs acquire new factual knowledge while largely preserving held-out capabilities.
A Brief Overview: Agentic Reinforcement Learning In Large Language Models
cs.AI 2026-04 unverdicted novelty 2.0

The paper surveys the conceptual foundations, methodological innovations, challenges, and future directions of agentic reinforcement learning frameworks that embed cognitive capabilities like meta-reasoning and self-r...
A Brief Overview: Agentic Reinforcement Learning In Large Language Models
cs.AI 2026-04 unverdicted novelty 2.0

This review synthesizes conceptual foundations, methods, challenges, and future directions for agentic reinforcement learning in large language models.
