hub Canonical reference

MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

Zijian Zhou, Ao Qu, Zhaoxuan Wu, Sunghwan Kim, Alok Prakash, Daniela Rus · 2025 · cs.CL · arXiv 2506.15841

Canonical reference. 100% of citing Pith papers cite this work as background.

53 Pith papers citing it

Background 100% of classified citations

open full Pith review browse 53 citing papers arXiv PDF

abstract

Modern language agents must operate over long-horizon, multi-turn interactions, where they retrieve external information, adapt to observations, and answer interdependent queries. Yet, most LLM systems rely on full-context prompting, appending all past turns regardless of their relevance. This leads to unbounded memory growth, increased computational costs, and degraded reasoning performance on out-of-distribution input lengths. We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. This state integrates prior memory with new observations from the environment while strategically discarding irrelevant or redundant information. To support training in more realistic and compositional settings, we propose a simple yet effective and scalable approach to constructing multi-turn environments by composing existing datasets into arbitrarily complex task sequences. Experiments across three domains, including internal retrieval QA, open-domain web QA, and multi-turn web shopping, show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task, and generalizes beyond the training horizon. Our results demonstrate the promise of reasoning-driven memory consolidation as a scalable alternative to existing solutions for training long-horizon interactive agents, where both efficiency and performance are optimized.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 7

citation-polarity summary

background 7

representative citing papers

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.

Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

cs.AI · 2026-06-09 · unverdicted · novelty 7.0 · 2 refs

OSL-MR is a learning-augmented framework that casts memory retention as constrained stochastic optimization under partial observability and outperforms heuristic baselines on LoCoMo and LongMemEval.

MemTrain: Self-Supervised Context Memory Training

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

MemTrain introduces two coupled self-supervised proxy tasks on Wikipedia corpora to train general context-memory capabilities in LLMs, reporting gains of up to 17.67 points on long-text and search-based QA benchmarks over direct post-training.

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

cs.AI · 2026-05-26 · unverdicted · novelty 7.0

VitaBench 2.0 introduces a benchmark for long-term personalized and proactive agent behavior, with results indicating substantial gaps in current frontier LLMs.

Personalize-then-Store: Benchmarking and Learning Personalized Memory for Long-horizon Agents

cs.AI · 2026-05-25 · unverdicted · novelty 7.0

Introduces PerMemBench benchmark for personalized memory and shows session-level gating yields retention gains under perfect decisions but accurate gating is an open challenge.

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.

Belief Memory: Agent Memory Under Partial Observability

cs.AI · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

BeliefMem is a probabilistic memory architecture for LLM agents that retains multiple candidate conclusions with probabilities updated by Noisy-OR, achieving superior average performance over deterministic baselines on LoCoMo and ALFWorld.

MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents

cs.MA · 2026-05-05 · unverdicted · novelty 7.0

MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summarization.

LMEB: Long-horizon Memory Embedding Benchmark

cs.CL · 2026-03-13 · unverdicted · novelty 7.0

LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.

MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning

cs.CL · 2025-11-04 · unverdicted · novelty 7.0

MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, outperforming ReAct-style baselines with stable token counts.

AutoMem: Automated Learning of Memory as a Cognitive Skill

cs.AI · 2026-07-01 · unverdicted · novelty 6.0

AutoMem automates memory structure revision and proficiency training in LLMs, delivering 2x-4x performance gains on long-horizon games without altering task-action behavior.

ECHO: Prune to act, trace to learn with selective turn memory in agentic RL

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

ECHO is a selective turn-memory framework for agentic RL that compresses turns into indexed records, selects them for bounded contexts, and uses source indices to assign outcome credit to supporting evidence, reaching 43.4% accuracy on BrowseComp-Plus versus 28.9% for GRPO and 36.1% for SUPO.

Are We Ready For An Agent-Native Memory System?

cs.CL · 2026-06-23 · unverdicted · novelty 6.0

A four-module framework is used to benchmark 12 agent memory systems, showing no architecture dominates and that workload alignment plus localized maintenance drive performance and cost.

MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

cs.HC · 2026-06-18 · unverdicted · novelty 6.0

MemGUI-Agent uses Context-as-Action (ConAct) for proactive context management in long-horizon GUI tasks, trained on the MemGUI-3K dataset to achieve top 8B-model results on MemGUI-Bench and MobileWorld.

Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents

cs.AI · 2026-06-10 · unverdicted · novelty 6.0

HORMA builds a hierarchical memory structure from agent experiences and trains a lightweight RL navigator to retrieve minimal sufficient context, yielding better task performance with at most 22.17% of baseline token usage on ALFWorld, LoCoMo, and LongMemEval.

Scaling Self-Evolving Agents via Parametric Memory

cs.AI · 2026-06-03 · unverdicted · novelty 6.0

TMEM lets LLM agents evolve their policy mid-episode by absorbing distilled supervision into online LoRA updates, outperforming summary and retrieval baselines on several long-context benchmarks.

HMARS: A Hierarchical Multi-Agent Memory System for Long-Context Reasoning

cs.IR · 2026-06-03 · unverdicted · novelty 6.0

HMARS introduces a hierarchical multi-agent memory system that outperforms standard retrieval and other baselines on long-document and multi-turn reasoning tasks through improved evidence coverage.

Connecting the Dots: Benchmarking Reflective Memory in Long-Horizon Dialogue

cs.CL · 2026-05-31 · unverdicted · novelty 6.0

RefMem-Bench benchmarks reflective memory in dialogue with 26K instances across eight dimensions, and REMIND improves model accuracy via hierarchical evidence retrieval, grounding, and abstraction.

AgentOdyssey: Open-Ended Long-Horizon Text Game Generation for Test-Time Continual Learning Agents

cs.CL · 2026-05-29 · unverdicted · novelty 6.0

Introduces AgentOdyssey, a procedural generator of open-ended long-horizon text games, to evaluate test-time continual learning agents and diagnose limits in exploration, memory, and planning.

SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent

cs.AI · 2026-05-23 · unverdicted · novelty 6.0

SAM is a standalone memory framework for long-horizon LLM agents that creates state-adaptive cues from interactions, preserves raw trajectories for intent-driven recall, and optimizes the module via expert supervision and RL, outperforming baselines on BrowseComp and related benchmarks.

OnePred: Next-Query Prediction via Recursive Intent Memory in Multi-Turn Conversations

cs.CL · 2026-05-22 · unverdicted · novelty 6.0

OnePred maintains a recursively updated intent memory and uses two-stage RL to predict next queries, cutting token use by up to 22x while outperforming baselines on a new NQP-Bench dataset.

citing papers explorer

Showing 23 of 23 citing papers after filters.

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare cs.AI · 2026-05-12 · conditional · none · ref 45 · internal anchor
MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents cs.AI · 2026-06-09 · unverdicted · none · ref 17 · 2 links · internal anchor
OSL-MR is a learning-augmented framework that casts memory retention as constrained stochastic optimization under partial observability and outperforms heuristic baselines on LoCoMo and LongMemEval.
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions cs.AI · 2026-05-26 · unverdicted · none · ref 94 · internal anchor
VitaBench 2.0 introduces a benchmark for long-term personalized and proactive agent behavior, with results indicating substantial gaps in current frontier LLMs.
Personalize-then-Store: Benchmarking and Learning Personalized Memory for Long-horizon Agents cs.AI · 2026-05-25 · unverdicted · none · ref 2 · internal anchor
Introduces PerMemBench benchmark for personalized memory and shows session-level gating yields retention gains under perfect decisions but accurate gating is an open challenge.
Belief Memory: Agent Memory Under Partial Observability cs.AI · 2026-05-07 · unverdicted · none · ref 21 · 2 links · internal anchor
BeliefMem is a probabilistic memory architecture for LLM agents that retains multiple candidate conclusions with probabilities updated by Noisy-OR, achieving superior average performance over deterministic baselines on LoCoMo and ALFWorld.
Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents cs.AI · 2026-04-21 · unverdicted · none · ref 8 · internal anchor
Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summarization.
AutoMem: Automated Learning of Memory as a Cognitive Skill cs.AI · 2026-07-01 · unverdicted · none · ref 20 · internal anchor
AutoMem automates memory structure revision and proficiency training in LLMs, delivering 2x-4x performance gains on long-horizon games without altering task-action behavior.
Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents cs.AI · 2026-06-10 · unverdicted · none · ref 65 · internal anchor
HORMA builds a hierarchical memory structure from agent experiences and trains a lightweight RL navigator to retrieve minimal sufficient context, yielding better task performance with at most 22.17% of baseline token usage on ALFWorld, LoCoMo, and LongMemEval.
Scaling Self-Evolving Agents via Parametric Memory cs.AI · 2026-06-03 · unverdicted · none · ref 23 · internal anchor
TMEM lets LLM agents evolve their policy mid-episode by absorbing distilled supervision into online LoRA updates, outperforming summary and retrieval baselines on several long-context benchmarks.
SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent cs.AI · 2026-05-23 · unverdicted · none · ref 48 · internal anchor
SAM is a standalone memory framework for long-horizon LLM agents that creates state-adaptive cues from interactions, preserves raw trajectories for intent-driven recall, and optimizes the module via expert supervision and RL, outperforming baselines on BrowseComp and related benchmarks.
MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning cs.AI · 2026-05-13 · unverdicted · none · ref 49 · internal anchor
MAP improves LLM agent reasoning by constructing a structured cognitive map of the environment before task execution, yielding performance gains on benchmarks like ARC-AGI-3 and superior training data via the new MAP-2K dataset.
Stateless Decision Memory for Enterprise AI Agents cs.AI · 2026-04-22 · unverdicted · none · ref 8 · internal anchor
Deterministic Projection Memory (DPM) delivers stateless, deterministic decision memory for enterprise AI agents that matches or exceeds summarization-based approaches at tight memory budgets while improving speed, determinism, and auditability.
Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning cs.AI · 2026-04-13 · unverdicted · none · ref 2 · internal anchor
A lightweight RL policy called ContextCurator curates context for frozen LLM agents by reducing noise and keeping reasoning anchors, raising success rates on WebArena (36.4% to 41.2%) and DeepSearch (53.9% to 57.1%) while cutting token use substantially, with a 7B model matching GPT-4o performance.
MEMENTO: Teaching LLMs to Manage Their Own Context cs.AI · 2026-04-10 · unverdicted · none · ref 41 · internal anchor
MEMENTO trains LLMs to segment reasoning into blocks, generate mementos as dense summaries, and reason forward using only mementos and KV states, cutting peak KV cache by ~2.5x while preserving benchmark accuracy.
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents cs.AI · 2026-04-07 · unverdicted · none · ref 10 · internal anchor
STEP-HRL enables step-level learning in LLM agents via hierarchical task structure and local progress modules, outperforming baselines on ScienceWorld and ALFWorld while cutting token usage.
HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling cs.AI · 2026-02-15 · unverdicted · none · ref 48 · internal anchor
HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines by 92.6% lower computational cost on LOCOMO and LongMemEval benchmarks.
AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management cs.AI · 2025-12-11 · conditional · none · ref 66 · internal anchor
AgentProg reframes interaction history as a program with variables and control flow, plus a belief state for partial observability, achieving SOTA success rates on long-horizon GUI benchmarks while baselines degrade.
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey cs.AI · 2025-09-02 · accept · none · ref 138 · internal anchor
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning cs.AI · 2026-06-09 · unverdicted · none · ref 37 · internal anchor
ActiveMem proposes a heterogeneous distributed memory framework for LLM agents that separates planning from active memory management, reporting SOTA accuracy with lower overhead on BrowseComp-Plus and GAIA.
Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline cs.AI · 2026-06-03 · unverdicted · none · ref 56 · internal anchor
An agentic harness letting the LLM self-manage flat text-file storage via tool calls outperforms eight prior memory systems on cross-scenario generality across QA, chat, trajectory, stress-test, and long-horizon tasks.
Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents cs.AI · 2026-05-28 · unverdicted · none · ref 22 · internal anchor
MMPO introduces Belief Entropy as a self-supervised signal to provide fine-grained supervision for memory policies in LLM agents, outperforming outcome-based RL on long-horizon tasks up to 1.75M tokens.
TRUSTMEM: Learning Trustworthy Memory Consolidation for LLM Agents with Long-Term Memory cs.AI · 2026-06-23 · unverdicted · none · ref 46 · internal anchor
TrustMem introduces a verifier for memory update transitions and preference-guided RL to cut omission, corruption, and hallucination rates in LLM agent memory while reaching SOTA on MemoryAgentBench and HaluMem.
Rethinking Agentic Reinforcement Learning In Large Language Models cs.AI · 2026-04-30 · unverdicted · none · ref 131 · 3 links · internal anchor
The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.

MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer