Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline
Pith reviewed 2026-06-28 06:46 UTC · model grok-4.3
The pith
An agentic harness where the LLM actively manages its own flat text-file storage via tool calls achieves the best cross-scenario ranking among evaluated memory systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The agentic harness self-manages flat text-file storage via tool calls and achieves the best cross-task ranking, suggesting that memory performance hinges on giving the agent active control over storage and retrieval rather than on a passive store behind a fixed pipeline. This insight is instantiated in AutoMEM, an agentic memory harness with a self-managed tool interface that achieves the best cross-scenario generality among the systems evaluated.
What carries the argument
The agentic harness for search problems, which self-manages flat text-file storage via tool calls to give the agent active control over memory operations.
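To make "self-manages flat text-file storage via tool calls" concrete, here is a minimal sketch of what such a tool surface could look like. It is not the paper's harness or AutoMEM's interface: the class name `FlatFileMemory`, the tool names (`write`, `append`, `read`, `search`), and the `{"tool": ..., "args": ...}` call format are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


class FlatFileMemory:
    """Hypothetical flat text-file store exposed to an agent as tools."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        # Keep only the final path component so a tool call cannot escape root.
        return self.root / Path(name).name

    def write(self, name: str, text: str) -> str:
        self._path(name).write_text(text, encoding="utf-8")
        return f"wrote {name}"

    def append(self, name: str, text: str) -> str:
        with self._path(name).open("a", encoding="utf-8") as f:
            f.write(text)
        return f"appended to {name}"

    def read(self, name: str) -> str:
        path = self._path(name)
        return path.read_text(encoding="utf-8") if path.exists() else ""

    def search(self, query: str) -> list[str]:
        # Case-insensitive substring match over every line of every file.
        hits = []
        for path in sorted(self.root.glob("*")):
            if path.is_file():
                for line in path.read_text(encoding="utf-8").splitlines():
                    if query.lower() in line.lower():
                        hits.append(f"{path.name}: {line}")
        return hits

    def dispatch(self, call: dict) -> object:
        # Assumed call format: {"tool": "write", "args": {...}}.
        tools = {
            "write": self.write,
            "append": self.append,
            "read": self.read,
            "search": self.search,
        }
        return tools[call["tool"]](**call["args"])


memory = FlatFileMemory(Path(tempfile.mkdtemp()))
memory.dispatch({"tool": "write", "args": {"name": "user.txt", "text": "Prefers metric units\n"}})
memory.dispatch({"tool": "append", "args": {"name": "user.txt", "text": "Lives in Oslo\n"}})
hits = memory.dispatch({"tool": "search", "args": {"query": "oslo"}})
```

In this sketch the model, not a fixed pipeline, decides when to write, what to name a file, and what to search for, which is the distinction the core claim draws between active control and a passive store.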
Load-bearing premise
The five chosen scenarios represent the heterogeneous trajectories that agents encounter in real deployments.
What would settle it
Running the same set of systems on a sixth scenario outside the original five, such as multi-agent collaborative tasks, and checking whether the agentic harness still produces the highest average rank.
Original abstract
LLM agents accumulate histories that outgrow their context windows, motivating a growing literature on memory systems. Yet most existing designs are tuned to a single scenario (multi-session chat or a single trajectory format), and there is little evidence that they generalize across the heterogeneous trajectories agents encounter in deployment. We revisit eight memory systems plus an agentic harness for search problems, on five scenarios: single-turn QA, multi-session chat, agentic-trajectory QA, memory stress tests, and long-horizon agentic tasks. The harness, which self-manages flat text-file storage via tool calls, achieves the best cross-task ranking, suggesting that memory performance hinges on giving the agent active control over storage and retrieval rather than on a passive store behind a fixed pipeline. We instantiate this insight in AutoMEM, an agentic memory harness with a self-managed tool interface that achieves the best cross-scenario generality among the systems we evaluate.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates eight memory systems plus an agentic harness for search problems across five scenarios (single-turn QA, multi-session chat, agentic-trajectory QA, memory stress tests, long-horizon agentic tasks). It reports that the harness, which lets the agent self-manage flat text-file storage via tool calls, achieves the best cross-task ranking; concludes that memory performance hinges on active agent control over storage and retrieval rather than on passive stores behind fixed pipelines; and instantiates that conclusion in AutoMEM.
Significance. If the empirical ranking is robust, the work supplies a multi-scenario diagnostic framework and a strong baseline for LLM-agent memory design, with the multi-system evaluation and the agentic harness serving as a reproducible point of comparison. It shifts focus from single-scenario tuning to cross-scenario generality.
major comments (1)
- [Abstract] The claim that the harness's top cross-task ranking licenses the design implication (active control is superior to passive pipelines) depends on the assumption that the five scenarios adequately sample deployment trajectories. The manuscript supplies no diversity metric, coverage argument, or sensitivity analysis showing that the scenarios are heterogeneous rather than correlated on dimensions such as context length or retrieval noise.
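One simple form the requested heterogeneity check could take is a pairwise rank correlation between scenarios: if two scenarios order the systems almost identically, they contribute little independent evidence. The sketch below computes Spearman's rho with the standard library only. The scenario names and scores are made-up placeholders, not results from the paper, and the paper does not state that it uses this metric.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    # Assumes neither input is constant; a constant input would divide by zero.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the rank-transformed values.
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder scores: one list per scenario, one entry per system (same order).
scores = {
    "scenario_a": [0.61, 0.55, 0.70, 0.48],
    "scenario_b": [0.59, 0.52, 0.74, 0.50],
    "scenario_c": [0.40, 0.66, 0.45, 0.71],
}

correlations = {
    (a, b): spearman(scores[a], scores[b]) for a, b in combinations(scores, 2)
}
```

With these placeholder numbers, `scenario_a` and `scenario_b` rank the systems identically while `scenario_c` largely reverses the order, so only the third scenario adds a genuinely different view of the systems.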
minor comments (2)
- Clarify the precise interface and state-management protocol of the agentic harness versus the eight baseline systems in the methods section.
- Specify how the cross-task ranking is aggregated (e.g., mean rank, weighted sum) and whether statistical significance or error bars accompany the reported ordering.
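For reference, the mean-rank option named in the last comment can be sketched as follows. This is one possible aggregation, not necessarily the one the paper uses; the system names, scenario names, and scores are made-up placeholders, and the choice to give tied systems their mean rank is an assumption.

```python
from statistics import mean


def ranks_best_first(scores):
    """Map system -> rank (1 = highest score); tied systems share the mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for name in ordered[i : j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


def mean_rank(table):
    """table: scenario -> {system: score}. Returns system -> mean rank across scenarios."""
    per_scenario = [ranks_best_first(scores) for scores in table.values()]
    systems = per_scenario[0].keys()
    return {s: mean(r[s] for r in per_scenario) for s in systems}


# Placeholder scores; higher is assumed to be better in every scenario.
table = {
    "scenario_a": {"system_x": 0.70, "system_y": 0.61, "system_z": 0.61},
    "scenario_b": {"system_x": 0.52, "system_y": 0.74, "system_z": 0.50},
}
result = mean_rank(table)
best = min(result, key=result.get)
```

Mean rank ignores the size of score gaps, so a narrow win and a large win count the same; that is one reason the comment also asks for error bars or significance tests alongside the ordering.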
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting a key assumption in the abstract. We respond to the major comment below.
- Referee: [Abstract] The claim that the harness's top cross-task ranking licenses the design implication (active control is superior to passive pipelines) depends on the assumption that the five scenarios adequately sample deployment trajectories. The manuscript supplies no diversity metric, coverage argument, or sensitivity analysis showing that the scenarios are heterogeneous rather than correlated on dimensions such as context length or retrieval noise.
  Authors: The five scenarios were selected to represent distinct regimes of agent deployment: single-turn retrieval, persistent multi-session interaction, sequential trajectory reasoning, robustness under stress (noise and length), and extended planning horizons. They differ correspondingly in context length and interaction structure, as detailed in Section 3. We agree that no quantitative diversity metric, coverage argument, or sensitivity analysis is supplied. In revision we will add a short discussion of scenario heterogeneity along the noted dimensions and qualify the abstract's claim to indicate that the design implication is drawn from the evaluated set of scenarios, not from a sample claimed to cover all deployments.
  Revision: yes
Circularity Check
No circularity: empirical ranking claim is self-contained
The paper reports an empirical comparison of nine memory systems (eight baselines plus the proposed harness) across five fixed scenarios and bases its design suggestion on the observed cross-scenario ranking. No equations, fitted parameters, or first-principles derivations appear in the provided text. No self-citations are used to justify uniqueness theorems, ansatzes, or load-bearing premises. The representativeness of the five scenarios is an external validity assumption rather than a circular reduction of any claimed derivation to its own inputs. The evaluation therefore stands as an independent empirical result against the stated benchmarks.
Axiom & Free-Parameter Ledger
invented entities (1)
- AutoMEM: no independent evidence