MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, outperforming ReAct-style baselines with stable token counts.
Memoryllm: Towards self-updatable large language models.arXiv preprint arXiv:2402.04624
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
polarities
background 2representative citing papers
MemoryAgentBench is a new multi-turn benchmark assessing four memory competencies in LLM agents—accurate retrieval, test-time learning, long-range understanding, and selective forgetting—showing that existing methods fall short.
RefMem-Bench benchmarks reflective memory in dialogue with 26K instances across eight dimensions, and REMIND improves model accuracy via hierarchical evidence retrieval, grounding, and abstraction.
FluxMem evolves memory as a heterogeneous graph via three refinement stages and reports consistent state-of-the-art results on LoCoMo, Mind2Web, and GAIA benchmarks.
Language models show superior memory to humans on psych experiments but can be adjusted via prompting and compaction to forget more human-like, yielding better user simulators.
In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines by 92.6% lower computational cost on LOCOMO and LongMemEval benchmarks.
The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.
citing papers explorer
No citing papers match the current filters.