pith. sign in

hub Canonical reference

Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Canonical reference. 100% of citing Pith papers cite this work as background.

74 Pith papers citing it
Background 100% of classified citations
abstract

Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).

hub tools

citation-role summary

background 10

citation-polarity summary

years

2026 71 2025 3

polarities

background 10

clear filters

representative citing papers

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

User as Engram: Internalizing Per-User Memory as Local Parametric Edits

cs.AI · 2026-06-17 · unverdicted · novelty 7.0

User facts are internalized as surgical local edits to a hash-keyed Engram memory table with reasoning skill held in a shared adapter, claimed to match LoRA recall, improve indirect reasoning 5.6x on average, and compose across users with 33,000x smaller footprint than per-user adapters.

Co-Evolving Skill Generation and Policy Optimization

cs.CL · 2026-06-07 · unverdicted · novelty 7.0

Framework estimates context-dependent marginal utility of candidate skills via reward gaps in matched base vs. skill-augmented rollouts to filter skills and co-train policy as generator.

MemTrain: Self-Supervised Context Memory Training

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

MemTrain introduces two coupled self-supervised proxy tasks on Wikipedia corpora to train general context-memory capabilities in LLMs, reporting gains of up to 17.67 points on long-text and search-based QA benchmarks over direct post-training.

ElasticMem: Latent Memory as a Learnable Resource for LLM Agents

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

ElasticMem enables LLM agents to learn adaptive latent memory retrieval and elastic budget allocation, improving QA accuracy by 24-26% and ALFWorld success by 27-66% over baselines with lower token cost.

R^2-Mem: Reflective Experience for Memory Search

cs.CL · 2026-05-13 · conditional · novelty 7.0

R^2-Mem distills rubric-scored experiences from high- and low-quality search trajectories to guide LLM agents, raising F1 by up to 22.6% while cutting tokens 12.9% and iterations 20.2%.

Belief Memory: Agent Memory Under Partial Observability

cs.AI · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

BeliefMem is a probabilistic memory architecture for LLM agents that retains multiple candidate conclusions with probabilities updated by Noisy-OR, achieving superior average performance over deterministic baselines on LoCoMo and ALFWorld.

SAGER: Self-Evolving User Policy Skills for Recommendation Agent

cs.IR · 2026-04-16 · unverdicted · novelty 7.0

SAGER equips LLM recommendation agents with per-user evolving policy skills via two-representation architecture, contrastive CoT diagnosis, and skill-augmented listwise reasoning, yielding SOTA gains orthogonal to memory accumulation.

ECHO: Prune to act, trace to learn with selective turn memory in agentic RL

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

ECHO is a selective turn-memory framework for agentic RL that compresses turns into indexed records, selects them for bounded contexts, and uses source indices to assign outcome credit to supporting evidence, reaching 43.4% accuracy on BrowseComp-Plus versus 28.9% for GRPO and 36.1% for SUPO.

Scaling Self-Evolving Agents via Parametric Memory

cs.AI · 2026-06-03 · unverdicted · novelty 6.0

TMEM lets LLM agents evolve their policy mid-episode by absorbing distilled supervision into online LoRA updates, outperforming summary and retrieval baselines on several long-context benchmarks.

citing papers explorer

Showing 7 of 7 citing papers after filters.

  • EvolveMem:Self-Evolving Memory Architecture via AutoResearch for LLM Agents cs.LG · 2026-05-13 · unverdicted · none · ref 36 · internal anchor

    EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-benchmark transfer.

  • ECHO: Prune to act, trace to learn with selective turn memory in agentic RL cs.LG · 2026-06-30 · unverdicted · none · ref 38 · internal anchor

    ECHO is a selective turn-memory framework for agentic RL that compresses turns into indexed records, selects them for bounded contexts, and uses source indices to assign outcome credit to supporting evidence, reaching 43.4% accuracy on BrowseComp-Plus versus 28.9% for GRPO and 36.1% for SUPO.

  • Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery cs.LG · 2026-05-13 · unverdicted · none · ref 18 · internal anchor

    Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.

  • Dynamic Mixture of Latent Memories for Self-Evolving Agents cs.LG · 2026-05-21 · unverdicted · none · ref 15 · internal anchor

    MoLEM achieves a 10.40% average accuracy improvement in continual learning tasks across math, science, and code by using dynamic latent memory experts with a frozen base model and stage-specific autoencoders for routing.

  • Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents cs.LG · 2026-05-20 · unverdicted · none · ref 23 · internal anchor

    Memory-R2 proposes LoGo-GRPO to fix unfair trajectory comparisons in RL training of memory-augmented LLM agents by combining global end-to-end rewards with local rerollouts from identical memory states.

  • Reinforced Collaboration in Multi-Agent Flow Networks cs.LG · 2026-05-13 · unverdicted · none · ref 50 · internal anchor

    MANGO optimizes multi-agent LLM workflows via flow networks, RL, and textual gradients, delivering up to 12.8% higher performance and 47.4% better efficiency while generalizing to new domains.

  • Improving Sparse Memory Finetuning cs.LG · 2026-04-06 · unverdicted · none · ref 11 · internal anchor

    Sparse memory modules with KL-based surprising-token selection let retrofitted LLMs acquire new factual knowledge while largely preserving held-out capabilities.