MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, outperforming ReAct-style baselines with stable token counts.
Memoryllm: Towards self-updatable large language models.arXiv preprint arXiv:2402.04624
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
polarities
background 2representative citing papers
RefMem-Bench benchmarks reflective memory in dialogue with 26K instances across eight dimensions, and REMIND improves model accuracy via hierarchical evidence retrieval, grounding, and abstraction.
FluxMem evolves memory as a heterogeneous graph via three refinement stages and reports consistent state-of-the-art results on LoCoMo, Mind2Web, and GAIA benchmarks.
Language models show superior memory to humans on psych experiments but can be adjusted via prompting and compaction to forget more human-like, yielding better user simulators.
In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines by 92.6% lower computational cost on LOCOMO and LongMemEval benchmarks.
The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.
citing papers explorer
-
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning
MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, outperforming ReAct-style baselines with stable token counts.
-
Connecting the Dots: Benchmarking Reflective Memory in Long-Horizon Dialogue
RefMem-Bench benchmarks reflective memory in dialogue with 26K instances across eight dimensions, and REMIND improves model accuracy via hierarchical evidence retrieval, grounding, and abstraction.
-
Rethinking Memory as Continuously Evolving Connectivity
FluxMem evolves memory as a heterogeneous graph via three refinement stages and reports consistent state-of-the-art results on LoCoMo, Mind2Web, and GAIA benchmarks.
-
Simulating Human Memory with Language Models
Language models show superior memory to humans on psych experiments but can be adjusted via prompting and compaction to forget more human-like, yielding better user simulators.
- Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions