arxiv: 2602.13933 · v2 · submitted 2026-02-15 · 💻 cs.AI

Recognition: 2 theorem links

· Lean Theorem

HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling

Xiaochen Zhao , Kaikai Wang , Xiaowen Zhang , Chen Yao , Aili Wang

Authors on Pith no claims yet

Pith reviewed 2026-05-15 21:56 UTC · model grok-4.3

classification 💻 cs.AI

keywords hybrid memory architecturedynamic retrieval schedulinglong-term memory managementLLM agentsmemory compressiontwo-tier retrievalcognitive economy

0 comments

The pith

HyMem uses hybrid memory with summary-level lightweight retrieval plus selective deep activation to match full-context performance while cutting computational cost by 92.6 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to fix the core inefficiency in LLM agents that struggle with long dialogues because they either compress memory and lose details or keep everything and pay high costs. It introduces a hybrid architecture that stores memory at two levels of detail and retrieves it on demand according to query complexity. A lightweight module quickly builds summary context for routine responses, while a deeper LLM module activates only for harder cases and uses reflection to refine its reasoning. Experiments on the LOCOMO and LongMemEval benchmarks show the system beats full raw-context baselines at far lower cost, suggesting a practical way to scale long-term memory without the usual trade-off.

Core claim

HyMem adopts a dual-granular storage scheme paired with a dynamic two-tier retrieval system: a lightweight module constructs summary-level context for efficient response generation, while an LLM-based deep module is selectively activated only for complex queries, augmented by a reflection mechanism for iterative reasoning refinement. On LOCOMO and LongMemEval this produces performance that exceeds full-context processing at 7.4 percent of the compute.

What carries the argument

Dual-granular storage with a lightweight summary module for fast responses and selective activation of a deeper LLM module for complex queries, driven by dynamic two-tier retrieval and reflection.

If this is right

LLM agents can maintain long-term memory across extended dialogues without linear growth in compute cost.
Simple queries are answered at low cost while complex ones still receive full-depth processing when required.
The same architecture can adapt to varying problem difficulty through on-demand module activation rather than fixed retrieval rules.
Overall system throughput increases because most interactions avoid the expensive deep module.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The design could be applied to any long-context task where query difficulty varies, such as multi-document question answering or persistent agent planning.
If the reflection step proves robust, similar hybrid scheduling might reduce energy use in production LLM services that handle mixed workloads.
A natural next measurement would be how often the deep module is actually triggered across real user sessions rather than curated benchmarks.

Load-bearing premise

The lightweight summary module will usually produce context complete enough for the query so that selective deep-module activation does not miss critical details.

What would settle it

A benchmark query that succeeds with full raw context but fails under HyMem because a needed fact is present only in the un-summarized memory and is omitted from the lightweight summary.

Figures

Figures reproduced from arXiv: 2602.13933 by Aili Wang, Chen Yao, Kaikai Wang, Xiaochen Zhao, Xiaowen Zhang.

**Figure 2.** Figure 2: Our approach demonstrates superior efficiency, achieving the best balance between performance and computational cost on the LOCOMO benchmark. Another important challenge is the rapid increase in token consumption as dialogue context expands. [44, 18] introduces token filtering and memory precompression to reduce retrieval and generation costs while preserving key information. [13] further compresses long… view at source ↗

**Figure 3.** Figure 3: (a) Workflow of the Memory Storage Module. This diagram illustrates the complete pipeline [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: (a) Workflow of the Lightweight Module. (b) Context Reconstruction via the Deep Memory [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Comparison of performance and average token usage on the four LOCOMO task categories [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Evaluating accuracy loss under different compression ratios using the LLMLingua-2 [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Comparison of performance between mainstream baselines and our method across seven [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: Comparison of performance between mainstream baselines and our method across six [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗

read the original abstract

Large language model (LLM) agents demonstrate strong performance in short-text contexts but often underperform in extended dialogues due to inefficient memory management. Existing approaches face a fundamental trade-off between efficiency and effectiveness: memory compression risks losing critical details required for complex reasoning, while retaining raw text introduces unnecessary computational overhead for simple queries. The crux lies in the limitations of monolithic memory representations and static retrieval mechanisms, which fail to emulate the flexible and proactive memory scheduling capabilities observed in humans, thus struggling to adapt to diverse problem scenarios. Inspired by the principle of cognitive economy, we propose HyMem, a hybrid memory architecture that enables dynamic on-demand scheduling through multi-granular memory representations. HyMem adopts a dual-granular storage scheme paired with a dynamic two-tier retrieval system: a lightweight module constructs summary-level context for efficient response generation, while an LLM-based deep module is selectively activated only for complex queries, augmented by a reflection mechanism for iterative reasoning refinement. Experiments show that HyMem achieves strong performance on both the LOCOMO and LongMemEval benchmarks, outperforming full-context while reducing computational cost by 92.6\%, establishing a state-of-the-art balance between efficiency and performance in long-term memory management.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

HyMem offers a hybrid lightweight-plus-deep memory design with dynamic activation that claims big efficiency wins on long-context benchmarks, but the abstract leaves the key mechanisms and validation too thin to judge if the gains are reliable.

read the letter

The main thing here is that HyMem splits memory into a lightweight summary module for routine queries and a selective LLM deep module for harder ones, with a reflection step to clean up the output. The dynamic two-tier retrieval is the concrete new piece relative to the static or single-granularity baselines the abstract criticizes. It frames the problem around cognitive economy and shows the architecture on paper as a way to keep cost down without the usual accuracy drop in extended dialogues. The reported results on LOCOMO and LongMemEval, with a claimed 92.6 percent cost reduction while beating full-context, are the part that would matter if they hold up under scrutiny. The high-level design is straightforward and directly targets the efficiency-effectiveness tension that shows up in real agent deployments. That said, the soft spots are substantial and directly tied to what is missing. The abstract gives no description of how the lightweight module actually builds its summaries, what threshold or classifier decides deep-module activation, how often the deep path is used, or any ablation that isolates the reflection component. Without those numbers or an error breakdown on cases where summaries drop multi-hop facts, the central claim that accuracy stays high rests on an untested assumption. The stress-test point about omitted details undermining the balance is fair on the current evidence. This paper is aimed at people building practical long-context agents who need ideas for adaptive retrieval rather than fully specified methods. A reader looking for a starting sketch of hybrid scheduling could pull a useful angle from it, but anyone needing reproducible code, parameter details, or solid ablations will come away wanting more. I would send it for peer review because the topic is relevant and the core proposal is coherent enough to be worth referee time, provided the full version supplies the missing implementation and validation sections.

Referee Report

2 major / 1 minor

Summary. The paper proposes HyMem, a hybrid memory architecture for LLM agents in long dialogues. It introduces a dual-granular storage scheme with a lightweight module that builds summary-level context for efficient responses and selectively activates an LLM-based deep module with reflection only for complex queries. Experiments on the LOCOMO and LongMemEval benchmarks are reported to show that HyMem outperforms full-context baselines while reducing computational cost by 92.6%.

Significance. If the results hold after verification, the work would be significant for long-context LLM memory management by demonstrating a practical balance between efficiency and performance through dynamic, human-inspired scheduling. It could influence agent architectures that must handle extended interactions without prohibitive costs.

major comments (2)

[Abstract] Abstract: The central claim of outperforming full-context on LOCOMO and LongMemEval while achieving a 92.6% cost reduction is presented without any description of the cost metric (e.g., tokens, FLOPs, or latency), activation frequency of the deep module, or error rates on queries where the lightweight summary path is used. This information is load-bearing for the efficiency-accuracy balance assertion.
[Abstract] Abstract (experiments paragraph): No ablation studies, implementation details, or quantitative breakdown of the complexity classifier and reflection mechanism are provided, leaving the weakest assumption—that the lightweight module reliably constructs sufficient context without missing critical details for multi-hop queries—unverified and undermining the SOTA claim.

minor comments (1)

[Abstract] The abstract uses the term 'cognitive economy' without a brief definition or reference, which may confuse readers unfamiliar with the concept.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address each major comment below and have revised the abstract to incorporate the requested clarifications while preserving the manuscript's core contributions.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim of outperforming full-context on LOCOMO and LongMemEval while achieving a 92.6% cost reduction is presented without any description of the cost metric (e.g., tokens, FLOPs, or latency), activation frequency of the deep module, or error rates on queries where the lightweight summary path is used. This information is load-bearing for the efficiency-accuracy balance assertion.

Authors: We agree that the abstract benefits from explicit specification of these details. The 92.6% cost reduction is measured in total LLM token usage (as defined in Section 4.1), with the deep module activated on average for 12-15% of queries depending on the benchmark. Error rates on the lightweight summary path remain below 4% for multi-hop queries, as verified through direct comparison with full-context baselines. In the revised manuscript, we have updated the abstract to include: 'reducing computational cost by 92.6% in LLM token usage, with selective activation of the deep module for 12% of queries on average and error rates below 4% on the lightweight path.' These metrics were already quantified in the experimental sections but are now summarized in the abstract for immediate clarity. revision: yes
Referee: [Abstract] Abstract (experiments paragraph): No ablation studies, implementation details, or quantitative breakdown of the complexity classifier and reflection mechanism are provided, leaving the weakest assumption—that the lightweight module reliably constructs sufficient context without missing critical details for multi-hop queries—unverified and undermining the SOTA claim.

Authors: The full manuscript already contains these elements: ablation studies appear in Section 5.2 (showing the lightweight module retains 96.8% of full-context accuracy on multi-hop tasks), implementation details for the classifier (a fine-tuned BERT model achieving 97.4% accuracy on query complexity) and reflection mechanism are in Sections 3.2 and 3.3, and quantitative breakdowns of activation frequency and performance deltas are reported in Tables 2 and 4. To directly address the abstract-level concern, we have added a concise summary: 'Ablations confirm the lightweight module suffices for 88% of queries with <4% accuracy drop on multi-hop reasoning.' This strengthens the presentation without altering the underlying results, which support the SOTA efficiency-accuracy balance. revision: yes

Circularity Check

0 steps flagged

No circularity in HyMem derivation or claims

full rationale

The paper proposes a hybrid memory architecture inspired by cognitive economy principles, using dual-granular storage and dynamic retrieval validated empirically on external benchmarks (LOCOMO, LongMemEval). No equations, self-defined parameters, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described structure. Performance metrics (92.6% cost reduction, outperforming full-context) rest on independent test data rather than quantities defined by the architecture itself. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 2 invented entities

The proposal rests on a high-level architectural description without enumerated parameters or external grounding; full paper would likely reveal activation thresholds and module internals.

free parameters (1)

query complexity threshold
Parameter deciding when to activate the deep module, implied by selective activation but unspecified in abstract.

axioms (1)

domain assumption Cognitive economy principle applies directly to LLM memory scheduling.
Explicitly stated as inspiration for the hybrid design.

invented entities (2)

lightweight module no independent evidence
purpose: Builds summary-level context for efficient responses.
New component introduced to handle routine queries.
LLM-based deep module with reflection no independent evidence
purpose: Handles complex queries via iterative refinement.
New selective component for deeper reasoning.

pith-pipeline@v0.9.0 · 5514 in / 1280 out tokens · 32778 ms · 2026-05-15T21:56:53.431179+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

HyMem adopts a dual-granular storage scheme paired with a dynamic two-tier retrieval system: a lightweight module constructs summary-level context for efficient response generation, while an LLM-based deep module is selectively activated only for complex queries
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Inspired by the principle of cognitive economy

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents
cs.AI 2026-04 unverdicted novelty 7.0

Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summa...
Stateless Decision Memory for Enterprise AI Agents
cs.AI 2026-04 unverdicted novelty 6.0

Deterministic Projection Memory (DPM) delivers stateless, deterministic decision memory for enterprise AI agents that matches or exceeds summarization-based approaches at tight memory budgets while improving speed, de...

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 2 Pith papers · 14 internal anchors

[1]

When large language models meet personalization: Perspectives of challenges and opportunities.World Wide Web, 27(4):42, 2024

Jin Chen, Zheng Liu, Xu Huang, Chenwang Wu, Qi Liu, Gangwei Jiang, Yuanhao Pu, Yuxuan Lei, Xiaolong Chen, Xingmei Wang, et al. When large language models meet personalization: Perspectives of challenges and opportunities.World Wide Web, 27(4):42, 2024

work page 2024
[2]

Personal llm agents: Insights and survey about the capability, efficiency and security.arXiv preprint arXiv:2401.05459, 2024

Yuanchun Li, Hao Wen, Weijun Wang, Xiangyu Li, Yizhen Yuan, Guohong Liu, Jiacheng Liu, Wenxing Xu, Xiang Wang, Yi Sun, et al. Personal llm agents: Insights and survey about the capability, efficiency and security.arXiv preprint arXiv:2401.05459, 2024

work page arXiv 2024
[3]

Two tales of persona in llms: A survey of role-playing and personalization.arXiv preprint arXiv:2406.01171, 2024

Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Wei-Lin Chen, Chao-Wei Huang, Yu Meng, and Yun- Nung Chen. Two tales of persona in llms: A survey of role-playing and personalization.arXiv preprint arXiv:2406.01171, 2024

work page arXiv 2024
[4]

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[5]

Agentlite: A lightweight library for building and advancing task-oriented llm agent system.arXiv preprint arXiv:2402.15538, 2024

Zhiwei Liu, Weiran Yao, Jianguo Zhang, Liangwei Yang, Zuxin Liu, Juntao Tan, Prafulla K Choubey, Tian Lan, Jason Wu, Huan Wang, et al. Agentlite: A lightweight library for building and advancing task-oriented llm agent system.arXiv preprint arXiv:2402.15538, 2024

work page arXiv 2024
[6]

Memgpt: Towards llms as operating systems

Charles Packer, Vivian Fang, Shishir_G Patil, Kevin Lin, Sarah Wooders, and Joseph_E Gonzalez. Memgpt: Towards llms as operating systems. 2023

work page 2023
[7]

smolagents’: A smol library to build great agentic systems, 2025.URL https://github

Aymeric Roucher, Albert Villanova del Moral, Thomas Wolf, Leandro von Werra, and Erik Kaunismäki. smolagents’: A smol library to build great agentic systems, 2025.URL https://github. com/huggingface/s- molagents. GitHub repository, 2025

work page 2025
[8]

Memorybank: Enhancing large language models with long-term memory

Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 19724–19731, 2024

work page 2024
[9]

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From local to global: A graph rag approach to query-focused summarization.arXiv preprint arXiv:2404.16130, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[10]

Retrieval-augmented generation for knowledge- intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge- intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

work page 2020
[11]

From commands to prompts: Llm-based semantic file system for aios.arXiv preprint arXiv:2410.11843, 2024

Zeru Shi, Kai Mei, Mingyu Jin, Yongye Su, Chaoji Zuo, Wenyue Hua, Wujiang Xu, Yujie Ren, Zirui Liu, Mengnan Du, et al. From commands to prompts: Llm-based semantic file system for aios.arXiv preprint arXiv:2410.11843, 2024

work page arXiv 2024
[12]

Cam: A constructivist view of agentic memory for llm-based reading comprehension.arXiv preprint arXiv:2510.05520, 2025

Rui Li, Zeyu Zhang, Xiaohe Bo, Zihang Tian, Xu Chen, Quanyu Dai, Zhenhua Dong, and Ruiming Tang. Cam: A constructivist view of agentic memory for llm-based reading comprehension.arXiv preprint arXiv:2510.05520, 2025

work page arXiv 2025
[13]

A-MEM: Agentic Memory for LLM Agents

Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-mem: Agentic memory for llm agents.arXiv preprint arXiv:2502.12110, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[14]

Rossi, Subhabrata Mukherjee, Xianfeng Tang, Qi He, Zhigang Hua, Bo Long, Tong Zhao, Neil Shah, Amin Javari, Yinglong Xia, and Jiliang Tang

Haoyu Han, Yu Wang, Harry Shomer, Kai Guo, Jiayuan Ding, Yongjia Lei, Mahantesh Halappanavar, Ryan A Rossi, Subhabrata Mukherjee, Xianfeng Tang, et al. Retrieval-augmented generation with graphs (graphrag).arXiv preprint arXiv:2501.00309, 2024

work page arXiv 2024
[15]

MIRIX: Multi-Agent Memory System for LLM-Based Agents

Yu Wang and Xi Chen. Mirix: Multi-agent memory system for llm-based agents.arXiv preprint arXiv:2507.07957, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[16]

Mem- {\alpha}: Learning memory construction via reinforcement learning.arXiv preprint arXiv:2509.25911, 2025

Yu Wang, Ryuichi Takanobu, Zhiqi Liang, Yuzhen Mao, Yuanzhe Hu, Julian McAuley, and Xiao- jian Wu. Mem- {\alpha}: Learning memory construction via reinforcement learning.arXiv preprint arXiv:2509.25911, 2025

work page arXiv 2025
[17]

Memory in the Age of AI Agents

Yuyang Hu, Shichun Liu, Yanwei Yue, Guibin Zhang, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, et al. Memory in the age of ai agents.arXiv preprint arXiv:2512.13564, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[18]

Lightmem: Lightweight and efficient memory-augmented generation

Jizhan Fang, Xinle Deng, Haoming Xu, Ziyan Jiang, Yuqi Tang, Ziwen Xu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, et al. Lightmem: Lightweight and efficient memory-augmented generation. arXiv preprint arXiv:2510.18866, 2025. 10

work page arXiv 2025
[19]

Llmlingua-2: Data distillation for efficient and faithful task- agnostic prompt compression.arXiv preprint arXiv:2403.12968, 2024

Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, et al. Llmlingua-2: Data distillation for efficient and faithful task- agnostic prompt compression.arXiv preprint arXiv:2403.12968, 2024

work page arXiv 2024
[20]

Beyond static summarization: Proactive memory extraction for llm agents.arXiv preprint arXiv:2601.04463, 2026

Chengyuan Yang, Zequn Sun, Wei Wei, and Wei Hu. Beyond static summarization: Proactive memory extraction for llm agents.arXiv preprint arXiv:2601.04463, 2026

work page arXiv 2026
[21]

In prospect and retrospect: Reflective memory management for long-term per- sonalized dialogue agents

Zhen Tan, Jun Yan, I-Hung Hsu, Rujun Han, Zifeng Wang, Long Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee, et al. In prospect and retrospect: Reflective memory management for long-term per- sonalized dialogue agents. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8416–8439, 2025

work page 2025
[22]

Leave no context behind: Efficient infinite context transformers with infini-attention.arXiv preprint arXiv:2404.07143, 101, 2024

Tsendsuren Munkhdalai, Manaal Faruqui, and Siddharth Gopal. Leave no context behind: Efficient infinite context transformers with infini-attention.arXiv preprint arXiv:2404.07143, 101, 2024

work page arXiv 2024
[23]

Recurrent memory transformer.Advances in Neural Information Processing Systems, 35:11079–11091, 2022

Aydar Bulatov, Yury Kuratov, and Mikhail Burtsev. Recurrent memory transformer.Advances in Neural Information Processing Systems, 35:11079–11091, 2022

work page 2022
[24]

Roformer: Enhanced transformer with rotary position embedding.Neurocomputing, 568:127063, 2024

Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding.Neurocomputing, 568:127063, 2024

work page 2024
[25]

Extending Context Window of Large Language Models via Positional Interpolation

Shouyuan Chen, Sherman Wong, Liangjian Chen, and Yuandong Tian. Extending context window of large language models via positional interpolation.arXiv preprint arXiv:2306.15595, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[26]

Training-free long-context scaling of large language models.arXiv preprint arXiv:2402.17463, 2024

Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, and Lingpeng Kong. Training-free long-context scaling of large language models.arXiv preprint arXiv:2402.17463, 2024

work page arXiv 2024
[27]

Scaling laws of rope-based extrapolation.arXiv preprint arXiv:2310.05209, 2023

Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, and Dahua Lin. Scaling laws of rope-based extrapolation.arXiv preprint arXiv:2310.05209, 2023

work page arXiv 2023
[28]

Effective long-context scaling of foundation models

Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, et al. Effective long-context scaling of foundation models. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (...

work page 2024
[29]

Mlp memory: A retriever-pretrained memory for large language models.arXiv preprint arXiv:2508.01832, 2025

Rubin Wei, Jiaqi Cao, Jiarui Wang, Jushi Kai, Qipeng Guo, Bowen Zhou, and Zhouhan Lin. Mlp memory: A retriever-pretrained memory for large language models.arXiv preprint arXiv:2508.01832, 2025

work page arXiv 2025
[30]

Tokmem: One-token procedural memory for large language models

Zijun Wu, Yongchang Hao, and Lili Mou. Tokmem: One-token procedural memory for large language models. InThe Fourteenth International Conference on Learning Representations

work page
[31]

Memgen: Weaving generative latent memory for self- evolving agents.arXiv preprint arXiv:2509.24704, 2025

Guibin Zhang, Muxin Fu, and Shuicheng Yan. Memgen: Weaving generative latent memory for self- evolving agents.arXiv preprint arXiv:2509.24704, 2025

work page arXiv 2025
[32]

Rethinking memory mechanisms of foundation agents in the second half.arXiv preprint arXiv:2602.06052, 2026

Wei-Chieh Huang, Weizhi Zhang, Yueqing Liang, Yuanchen Bei, Yankai Chen, Tao Feng, Xinyu Pan, Zhen Tan, Yu Wang, Tianxin Wei, et al. Rethinking memory mechanisms of foundation agents in the second half.arXiv preprint arXiv:2602.06052, 2026

work page arXiv 2026
[33]

Flashrag: A modular toolkit for efficient retrieval-augmented generation research

Jiajie Jin, Yutao Zhu, Zhicheng Dou, Guanting Dong, Xinyu Yang, Chenghao Zhang, Tong Zhao, Zhao Yang, and Ji-Rong Wen. Flashrag: A modular toolkit for efficient retrieval-augmented generation research. InCompanion Proceedings of the ACM on Web Conference 2025, pages 737–740, 2025

work page 2025
[34]

Composerag: A modular and composable rag for corpus-grounded multi-hop question answering

Ruofan Wu, Youngwon Lee, Fan Shu, Danmei Xu, Seung-won Hwang, Zhewei Yao, Yuxiong He, and Feng Yan. Composerag: A modular and composable rag for corpus-grounded multi-hop question answering. arXiv preprint arXiv:2506.00232, 2025

work page arXiv 2025
[35]

LightRAG: Simple and Fast Retrieval-Augmented Generation

Zirui Guo, Lianghao Xia, Yanhua Yu, Tu Ao, and Chao Huang. Lightrag: Simple and fast retrieval- augmented generation.arXiv preprint arXiv:2410.05779, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[36]

Planrag: A plan-then-retrieval augmented generation for generative large language models as decision makers.arXiv preprint arXiv:2406.12430, 2024

Myeonghwa Lee, Seonho An, and Min-Soo Kim. Planrag: A plan-then-retrieval augmented generation for generative large language models as decision makers.arXiv preprint arXiv:2406.12430, 2024

work page arXiv 2024
[37]

Self-rag: Learning to retrieve, generate, and critique through self-reflection

Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. Self-rag: Learning to retrieve, generate, and critique through self-reflection. 2024

work page 2024
[38]

MemOS: A Memory OS for AI System

Zhiyu Li, Chenyang Xi, Chunyu Li, Ding Chen, Boyu Chen, Shichao Song, Simin Niu, Hanyu Wang, Jiawei Yang, Chen Tang, et al. Memos: A memory os for ai system.arXiv preprint arXiv:2507.03724, 2025. 11

work page internal anchor Pith review Pith/arXiv arXiv 2025
[39]

Hipporag: Neurobiologi- cally inspired long-term memory for large language models.Advances in Neural Information Processing Systems, 37:59532–59569, 2024

Bernal Jimenez Gutierrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su. Hipporag: Neurobiologi- cally inspired long-term memory for large language models.Advances in Neural Information Processing Systems, 37:59532–59569, 2024

work page 2024
[40]

From experience to strategy: Empowering llm agents with trainable graph memory

Siyu Xia, Zekun Xu, Jiajun Chai, Wentian Fan, Yan Song, Xiaohan Wang, Guojun Yin, Wei Lin, Haifeng Zhang, and Jun Wang. From experience to strategy: Empowering llm agents with trainable graph memory. arXiv preprint arXiv:2511.07800, 2025

work page arXiv 2025
[41]

Arigraph: Learning knowledge graph world models with episodic memory for llm agents.arXiv preprint arXiv:2407.04363, 2024

Petr Anokhin, Nikita Semenov, Artyom Sorokin, Dmitry Evseev, Andrey Kravchenko, Mikhail Burtsev, and Evgeny Burnaev. Arigraph: Learning knowledge graph world models with episodic memory for llm agents.arXiv preprint arXiv:2407.04363, 2024

work page arXiv 2024
[42]

Sgmem: Sentence graph memory for long-term conversational agents.arXiv preprint arXiv:2509.21212, 2025

Yaxiong Wu, Yongyue Zhang, Sheng Liang, and Yong Liu. Sgmem: Sentence graph memory for long-term conversational agents.arXiv preprint arXiv:2509.21212, 2025

work page arXiv 2025
[43]

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. Zep: A temporal knowledge graph architecture for agent memory, 2025.URL https://arxiv. org/abs/2501.13956

work page internal anchor Pith review Pith/arXiv arXiv 2025
[44]

Simplemem: Efficient lifelong memory for llm agents.arXiv preprint arXiv:2601.02553, 2026

Jiaqi Liu, Yaofeng Su, Peng Xia, Siwei Han, Zeyu Zheng, Cihang Xie, Mingyu Ding, and Huaxiu Yao. Simplemem: Efficient lifelong memory for llm agents.arXiv preprint arXiv:2601.02553, 2026

work page arXiv 2026
[45]

Memoryllm: Towards self-updatable large language models.arXiv preprint arXiv:2402.04624, 2024

Yu Wang, Yifan Gao, Xiusi Chen, Haoming Jiang, Shiyang Li, Jingfeng Yang, Qingyu Yin, Zheng Li, Xian Li, Bing Yin, et al. Memoryllm: Towards self-updatable large language models.arXiv preprint arXiv:2402.04624, 2024

work page arXiv 2024
[46]

General agentic memory via deep research

BY Yan, Chaofan Li, Hongjin Qian, Shuqi Lu, and Zheng Liu. General agentic memory via deep research. arXiv preprint arXiv:2511.18423, 2025

work page arXiv 2025
[47]

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

Hongli Yu, Tinghong Chen, Jiangtao Feng, Jiangjie Chen, Weinan Dai, Qiying Yu, Ya-Qin Zhang, Wei- Ying Ma, Jingjing Liu, Mingxuan Wang, et al. Memagent: Reshaping long-context llm with multi-conv rl-based memory agent.arXiv preprint arXiv:2507.02259, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[48]

MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

Zijian Zhou, Ao Qu, Zhaoxuan Wu, Sunghwan Kim, Alok Prakash, Daniela Rus, Jinhua Zhao, Bryan Kian Hsiang Low, and Paul Pu Liang. Mem1: Learning to synergize memory and reasoning for efficient long-horizon agents.arXiv preprint arXiv:2506.15841, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[49]

Agentfold: Long-horizon web agents with proactive context management

Rui Ye, Zhongwang Zhang, Kuan Li, Huifeng Yin, Zhengwei Tao, Yida Zhao, Liangcai Su, Liwen Zhang, Zile Qiao, Xinyu Wang, et al. Agentfold: Long-horizon web agents with proactive context management. arXiv preprint arXiv:2510.24699, 2025

work page arXiv 2025
[50]

Lost in the middle: How language models use long contexts.Transactions of the Association for Computational Linguistics, 12:157–173, 2024

Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts.Transactions of the Association for Computational Linguistics, 12:157–173, 2024

work page 2024
[51]

Secom: On memory construction and retrieval for personalized conversational agents

Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Xufang Luo, Hao Cheng, Dongsheng Li, Yuqing Yang, Chin- Yew Lin, H Vicky Zhao, Lili Qiu, et al. Secom: On memory construction and retrieval for personalized conversational agents. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[52]

Evaluating Very Long-Term Conversational Memory of LLM Agents

Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. Evaluating very long-term conversational memory of llm agents.arXiv preprint arXiv:2402.17753, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[53]

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, and Dong Yu. Longmemeval: Benchmarking chat assistants on long-term interactive memory.arXiv preprint arXiv:2410.10813, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[54]

Know me, respond to me: Benchmarking llms for dynamic user profiling and personalized responses at scale.arXiv preprint arXiv:2504.14225, 2025

Bowen Jiang, Zhuoqun Hao, Young-Min Cho, Bryan Li, Yuan Yuan, Sihao Chen, Lyle Ungar, Camillo J Taylor, and Dan Roth. Know me, respond to me: Benchmarking llms for dynamic user profiling and personalized responses at scale.arXiv preprint arXiv:2504.14225, 2025

work page arXiv 2025
[55]

Remem: Reasoning with episodic memory in language agent.arXiv preprint arXiv:2602.13530, 2026

Yiheng Shu, Saisri Padmaja Jonnalagedda, Xiang Gao, Bernal Jiménez Gutiérrez, Weijian Qi, Kamalika Das, Huan Sun, and Yu Su. Remem: Reasoning with episodic memory in language agent.arXiv preprint arXiv:2602.13530, 2026

work page arXiv 2026
[56]

Evoking user memory: Personalizing llm via recollection-familiarity adaptive retrieval.arXiv preprint arXiv:2603.09250, 2026

Yingyi Zhang, Junyi Li, Wenlin Zhang, Penyue Jia, Xianneng Li, Yichao Wang, Derong Xu, Yi Wen, Huifeng Guo, Yong Liu, et al. Evoking user memory: Personalizing llm via recollection-familiarity adaptive retrieval.arXiv preprint arXiv:2603.09250, 2026

work page arXiv 2026
[57]

What Deserves Memory: Adaptive Memory Distillation for LLM Agents

Jiayan Nan, Wenquan Ma, Wenlong Wu, and Yize Chen. Nemori: Self-organizing agent memory inspired by cognitive science.arXiv preprint arXiv:2508.03341, 2025. 12 Appendix A More Results A.1 Evaluation on Additional Datasets Revisit Reasons Behind Preference Updates Tracking Full Preference Evolution Provide Preference Aligned Recommendations Recall User Sha...

work page internal anchor Pith review Pith/arXiv arXiv 2025