Memory-R1 uses PPO and GRPO to train a Memory Manager (ADD/UPDATE/DELETE/NOOP) and Answer Agent that together outperform baselines on long-context QA benchmarks after training on only 152 examples.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Memory-R1 uses PPO and GRPO to train a Memory Manager (ADD/UPDATE/DELETE/NOOP) and Answer Agent that together outperform baselines on long-context QA benchmarks after training on only 152 examples.