Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2) · Years: 2025 (2) · Verdicts: UNVERDICTED (2) · Representative citing papers: 2
Citing papers
- Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
  Memory-R1 uses PPO and GRPO to train a Memory Manager (ADD/UPDATE/DELETE/NOOP) and an Answer Agent that together outperform baselines on long-context QA benchmarks after training on only 152 examples.
- Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
  Mem0 improves long-term LLM conversational performance by up to 26% on LLM-as-Judge while cutting p95 latency by 91% and token costs by over 90% versus full-context baselines.