Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents

Boyi Liu; Hao-Lun Hsu; Nikki Lijing Kuang; Yuxiong He; Zhewei Yao

arXiv: 2606.11680 · v1 · pith:PYTYWA33new · submitted 2026-06-10 · cs.AI · cs.CL · cs.LG

Pith reviewed 2026-06-27 10:05 UTC · model grok-4.3

classification: cs.AI · cs.CL · cs.LG

keywords: hierarchical memory, LLM agents, memory navigation, reinforcement learning retrieval, context efficiency, long-horizon tasks, working memory organization

The pith

HORMA organizes LLM agent experiences into a file-system hierarchy and retrieves via RL navigation to cut token use while preserving task performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

LLM agents lose effectiveness on long tasks because every new observation must be packed into an ever-growing context window, which degrades reasoning and raises cost. HORMA addresses this by first building a hierarchical memory that stores summarized entities linked to their raw trajectories, then training a lightweight navigation agent to traverse that hierarchy and pull only the minimal sufficient context. The construction step refines the hierarchy by separating failures due to absent facts from failures due to overloaded or misleading context. The abstract reports that this two-stage process improves task performance under constrained context budgets on ALFWorld, LoCoMo, and LongMemEval, while using at most 22.17 percent of baseline token usage on long-conversation tasks. The design therefore decouples memory capacity from immediate context length.

Core claim

HORMA decomposes agent working memory into a construction module that iteratively builds a file-system-like hierarchy of summarized entities linked to raw trajectories and a navigation module that trains a reinforcement-learning policy to traverse the hierarchy and return the smallest context sufficient for the current step. On long-horizon benchmarks this yields higher task success under fixed context budgets and reduces token consumption to at most 22.17 percent of baseline usage in extended conversation settings while generalizing to unseen tasks.

What carries the argument

The hierarchical file-system-like memory structure whose construction module distinguishes missing-information failures from overloaded-context failures and whose navigation module uses a trained RL policy to select minimal traversals.
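
The structure described above can be illustrated with a minimal Python sketch. Everything named here (`MemoryNode`, `keyword_policy`, `navigate`, the whitespace token count) is hypothetical and not taken from the paper. In particular, a keyword-overlap rule stands in for the RL-trained navigator, only to show where a learned policy would plug in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class MemoryNode:
    """One entry in a file-system-like memory (hypothetical layout)."""
    name: str
    summary: str                           # short text the navigator reads
    raw_trajectory: Optional[str] = None   # full detail, kept on leaves
    children: Dict[str, "MemoryNode"] = field(default_factory=dict)

    def add(self, child: "MemoryNode") -> "MemoryNode":
        self.children[child.name] = child
        return child


def count_tokens(text: str) -> int:
    # Crude whitespace count; a real system would use the model tokenizer.
    return len(text.split())


Policy = Callable[[MemoryNode], Optional[str]]


def keyword_policy(query: str) -> Policy:
    """Stand-in for the learned navigator: pick the child whose summary
    shares the most words with the query, or None if nothing overlaps."""
    words = set(query.lower().split())

    def choose(node: MemoryNode) -> Optional[str]:
        best, best_score = None, 0
        for name, child in node.children.items():
            score = len(words & set(child.summary.lower().split()))
            if score > best_score:
                best, best_score = name, score
        return best

    return choose


def navigate(root: MemoryNode, choose: Policy, budget: int) -> Tuple[List[str], str]:
    """Descend from the root; return the path taken and the context to use.
    The raw trajectory is returned only if it fits the token budget,
    otherwise the node's summary is returned instead."""
    node, path = root, [root.name]
    while node.children:
        name = choose(node)
        if name is None:
            break
        node = node.children[name]
        path.append(name)
    text = node.raw_trajectory or node.summary
    if count_tokens(text) > budget:
        text = node.summary
    return path, text


# Toy memory with made-up contents, for illustration only.
root = MemoryNode("root", "all episodes")
kitchen = root.add(MemoryNode("kitchen", "kitchen tasks mug sink"))
root.add(MemoryNode("bedroom", "bedroom tasks lamp desk"))
kitchen.add(MemoryNode("find_mug", "found mug on counter",
                       raw_trajectory="go to counter 1 ; take mug 1 from counter 1"))
kitchen.add(MemoryNode("clean_plate", "cleaned plate at sink",
                       raw_trajectory="take plate 1 ; go to sink 1 ; clean plate 1"))

path, context = navigate(root, keyword_policy("where is the mug in the kitchen"), budget=50)
print(path, context)
```

The point of the sketch is the separation of concerns: the tree holds summaries and raw detail side by side, and only the `choose` function would need to be replaced by a trained policy.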

If this is right

Task success rises under fixed context budgets on ALFWorld, LoCoMo, and LongMemEval.
Token consumption drops to at most 22.17 percent of baseline in long-conversation settings.
Efficiency-performance trade-offs improve relative to compression or similarity-retrieval baselines.
Performance on unseen tasks remains competitive without retraining the navigation policy.
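
Read as a ratio, the token figure above puts a lower bound on the savings. The symbols $T_{\text{base}}$ and $T_{\text{HORMA}}$ are introduced here for illustration and do not appear in the source:

```latex
\frac{T_{\text{HORMA}}}{T_{\text{base}}} \le 0.2217
\quad\Longrightarrow\quad
1 - \frac{T_{\text{HORMA}}}{T_{\text{base}}} \ge 0.7783
```

That is, a reduction of at least 77.83 percent in tokens, in the long-conversation setting only.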

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same hierarchy-plus-navigation pattern could be tested on long-document QA or multi-turn tool use where context budgets are similarly constrained.
Replacing the RL navigator with a cheaper heuristic traversal would test whether learned selection is necessary or whether static rules suffice.
Extending the hierarchy depth on tasks longer than those in the current benchmarks would reveal whether the refinement loop scales or saturates.

Load-bearing premise

The construction module can reliably tell apart failures caused by missing information from failures caused by misleading or overloaded context.

What would settle it

A controlled run on ALFWorld in which the construction module is replaced by random or uniform hierarchy updates and task success remains unchanged or improves would falsify the value of the failure-type distinction.

Figures

Figures reproduced from arXiv: 2606.11680 by Boyi Liu, Hao-Lun Hsu, Nikki Lijing Kuang, Yuxiong He, Zhewei Yao.

**Figure 2.** Efficiency-Performance Trade-offs Across Benchmarks: (a) Comparison of average interac...

**Figure 3.** Analysis of Retrieval Reliability and Skill Acquisition: (a) Hashed bars as the gain when...

Original abstract

Large language model (LLM) agents struggle with long-horizon tasks due to their inherent statelessness, requiring all task-relevant information to be encoded in growing input contexts. The resulting degraded reasoning quality, increased inference cost, and higher latency necessitate efficient working memory mechanisms. However, existing approaches either rely on lossy compression or similarity-based retrieval, which often fail to capture temporal structure and causal dependencies required for multi-step agentic tasks. In this work, we present HORMA, a Hierarchical Organize-and-Retrieve Memory Agent that organizes experience into a file-system-like hierarchical structure, where summarized entities are linked to the corresponding raw trajectories, enabling efficient access without losing detailed information. HORMA decomposes working memory into two stages: structured memory construction and navigation-based retrieval. The construction module iteratively refines how experiences are structured by distinguishing between failures caused by missing information and those caused by misleading or overloaded context. The navigation module retrieves task-relevant context by traversing the hierarchy using a lightweight agent trained with reinforcement learning to select minimal yet sufficient context, thereby reducing latency along the critical execution path. Across ALFWorld, LoCoMo, and LongMemEval, HORMA improves task performance under constrained context budgets while requiring at most 22.17% of the baseline token usage in long conversation tasks. Compared to existing methods, it consistently achieves better efficiency-performance trade-offs and generalizes effectively to unseen tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HORMA's file-system hierarchy plus RL navigator is a concrete attempt at structured memory for agents, but the abstract gives almost no experimental backing, so the efficiency claims stay provisional.

The paper's main contribution is a two-part memory system: a construction module that builds and refines a hierarchical structure (summaries linked to raw trajectories, organized like directories) by separating missing-info failures from overloaded-context ones, and a navigation module that trains a lightweight RL agent to traverse the hierarchy and pull minimal sufficient context. This setup is distinct from the lossy compression and pure similarity retrieval baselines mentioned.

It handles the long-horizon statelessness problem directly by keeping raw details accessible while controlling what enters the LLM prompt. The reported outcome would be useful if the numbers check out: at most 22.17% of baseline token use on long-conversation tasks, better performance under tight context budgets on ALFWorld, LoCoMo, and LongMemEval, and some generalization to unseen tasks.

The soft spot is the complete absence of experimental detail in the abstract: no description of baselines, statistical tests, ablations, or how the RL reward is defined. The failure-type distinction in construction sounds plausible on paper but could be brittle in practice, and without seeing the actual runs it's impossible to judge whether the token savings come from the hierarchy or from other unstated choices. The free parameter in the navigator reward also raises reproducibility questions.

This is for people working on practical agent memory and context management rather than pure theory. The architecture is grounded enough and the problem is real enough that it deserves a serious referee to examine the full experiments and code.

Referee Report

3 major / 1 minor

Summary. The paper introduces HORMA, a Hierarchical Organize-and-Retrieve Memory Agent for LLM-based agents tackling long-horizon tasks. It decomposes working memory into a construction module that builds a file-system-like hierarchy of summarized entities linked to raw trajectories, iteratively refined by distinguishing missing-information failures from misleading/overloaded-context failures, and a navigation module that uses an RL-trained lightweight agent to traverse the hierarchy and retrieve minimal sufficient context. Experiments on ALFWorld, LoCoMo, and LongMemEval are reported to show improved task performance under constrained context budgets, with token usage at most 22.17% of baseline in long conversations, better efficiency-performance trade-offs than existing methods, and effective generalization to unseen tasks.

Significance. If the empirical results hold under rigorous validation, HORMA offers a concrete advance over lossy compression or similarity-based retrieval by preserving temporal and causal structure while enabling efficient access. The hierarchical organization plus RL navigation could reduce inference costs and latency for agentic systems, with the reported token reduction and generalization claims representing potentially high-impact outcomes if reproducible.

major comments (3)

[Abstract] The central claims of performance gains, 22.17% token usage, and superior efficiency-performance trade-offs are stated without any reference to experimental details such as baselines, number of trials, statistical tests, or ablation studies. This absence prevents assessment of whether the data support the claims.
[Construction module] The iterative refinement process relies on accurately distinguishing failures due to missing information from failures due to misleading or overloaded context, yet no concrete mechanism (heuristic, prompt template, or learned classifier) is specified for making this distinction, which is load-bearing for the claimed structure quality.
[Navigation module] The RL training for the navigator is described as selecting minimal yet sufficient context, but the reward function, state representation, and training details are not provided, leaving the reported efficiency gains dependent on an unspecified optimization process.

minor comments (1)

[Abstract] The abstract mentions generalization to unseen tasks but does not specify how the train/test split was constructed or what constitutes an unseen task.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight areas where additional detail will improve clarity and reproducibility. We address each major comment below and will incorporate the necessary revisions.

Referee: [Abstract] The central claims of performance gains, 22.17% token usage, and superior efficiency-performance trade-offs are stated without any reference to experimental details such as baselines, number of trials, statistical tests, or ablation studies. This absence prevents assessment of whether the data support the claims.

Authors: We agree that the abstract would benefit from explicit references to the experimental setup. In the revised version we will add a concise clause noting the baselines (standard full-context agents, similarity-based retrieval, and compression baselines), that results are reported as averages over multiple trials, and that ablation studies together with statistical comparisons appear in the experimental section. This will allow readers to assess the claims without expanding the abstract beyond its length limit.

revision: yes

Referee: [Construction module] The iterative refinement process relies on accurately distinguishing failures due to missing information from failures due to misleading or overloaded context, yet no concrete mechanism (heuristic, prompt template, or learned classifier) is specified for making this distinction, which is load-bearing for the claimed structure quality.

Authors: The referee is correct that the current manuscript does not specify the concrete mechanism used to distinguish the two failure modes. We will revise the construction-module section to include the exact decision procedure: a lightweight heuristic that checks for entity coverage in the generated summary against the raw trajectory, followed by an LLM prompt that classifies residual failures as missing-information versus overloaded-context. The full prompt template and pseudocode will be added to the appendix.

revision: yes
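
To make the two-step procedure in this simulated response concrete, here is a minimal Python sketch. It is an illustration under loud assumptions, not the authors' procedure. The entity pattern (a word followed by an index, as in ALFWorld-style object names such as `mug 1`), the `min_coverage` threshold of 0.8, the label strings, and the `llm_classify` callable are all hypothetical.

```python
import re
from typing import Callable, Set


def extract_entities(text: str) -> Set[str]:
    # Hypothetical extractor: treats "<word> <number>" (e.g. "mug 1") as an entity.
    return set(re.findall(r"[a-z]+ \d+", text.lower()))


def entity_coverage(summary: str, trajectory: str) -> float:
    """Fraction of trajectory entities that also appear in the summary."""
    wanted = extract_entities(trajectory)
    if not wanted:
        return 1.0  # nothing to cover
    return len(wanted & extract_entities(summary)) / len(wanted)


def triage_failure(
    summary: str,
    trajectory: str,
    llm_classify: Callable[[str, str], str],
    min_coverage: float = 0.8,  # assumed threshold, not from the paper
) -> str:
    """Step 1: cheap coverage check. Step 2: defer residual cases to an
    LLM-backed classifier supplied by the caller."""
    if entity_coverage(summary, trajectory) < min_coverage:
        return "missing_information"
    return llm_classify(summary, trajectory)


# Stub standing in for the LLM prompt; a real one would call a model.
def stub_classifier(summary: str, trajectory: str) -> str:
    return "overloaded_context"


traj = "go to counter 1 ; take mug 1 from counter 1"
print(triage_failure("found a mug", traj, stub_classifier))
print(triage_failure("found mug 1 on counter 1", traj, stub_classifier))
```

Keeping the LLM call behind a plain callable means the cheap heuristic can be tested in isolation from any model.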
Referee: [Navigation module] The RL training for the navigator is described as selecting minimal yet sufficient context, but the reward function, state representation, and training details are not provided, leaving the reported efficiency gains dependent on an unspecified optimization process.

Authors: We acknowledge that the navigation-module description omits the RL implementation details required for reproducibility. In the revision we will add a subsection that defines the state (current hierarchy node embedding plus task embedding), the reward function (task-success indicator minus normalized token cost plus a sufficiency bonus derived from downstream agent performance), and the training procedure (PPO with the specific hyperparameters and episode budget). These additions will make the source of the efficiency gains explicit.

revision: yes
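
The reward shape named in this simulated response (success indicator, minus normalized token cost, plus a sufficiency bonus) could look roughly like the sketch below. The function name, the weights `cost_weight` and `bonus`, and the clipping of the cost at 1.0 are assumptions made for illustration. The source does not give coefficients or say how sufficiency is measured.

```python
def navigator_reward(
    success: bool,
    tokens_used: int,
    token_budget: int,
    sufficient: bool,
    cost_weight: float = 0.5,  # assumed weight on the token penalty
    bonus: float = 0.25,       # assumed size of the sufficiency bonus
) -> float:
    """Reward = success indicator - weighted normalized token cost + bonus.

    `sufficient` is a placeholder for whatever downstream signal marks the
    retrieved context as adequate; how it is computed is not specified.
    """
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    # Normalize cost to [0, 1]; anything over budget is clipped to 1.
    cost = min(tokens_used / token_budget, 1.0)
    return float(success) - cost_weight * cost + (bonus if sufficient else 0.0)


# A successful, sufficient retrieval that used half the budget.
print(navigator_reward(True, 500, 1000, True))
# A failed retrieval that blew past the budget.
print(navigator_reward(False, 2000, 1000, False))
```

The two weights are exactly the kind of free parameters the ledger below flags: different choices trade task success against token use differently.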

Circularity Check

0 steps flagged

No significant circularity detected

The paper describes an empirical agent architecture (HORMA) with a construction module for hierarchical structuring and an RL-trained navigation module for retrieval. No equations, derivations, or self-citations appear in the provided abstract or description that reduce performance claims (e.g., 22.17% token usage or benchmark gains) to quantities defined by the same fitted parameters or inputs. The method is presented as an independent pipeline evaluated on external benchmarks (ALFWorld, LoCoMo, LongMemEval), with no load-bearing steps matching the enumerated circularity patterns. This is the expected non-finding for a systems/empirical paper without mathematical self-reference.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review; ledger reflects mechanisms stated at high level. The RL navigator and iterative refinement step imply reward design choices and failure-classification criteria that function as free parameters or domain assumptions.

free parameters (1)

RL reward function for navigator
Lightweight agent is trained with reinforcement learning to select minimal yet sufficient context; reward shaping parameters are not specified.

axioms (1)

Domain assumption: a hierarchical structure can capture the temporal structure and causal dependencies required for multi-step tasks.
Invoked to justify why file-system-like organization enables efficient access without loss of detail.


2026
[55]

Memsearcher: Training LLMs to reason, search and manage memory via end-to-end reinforcement learning.arXiv preprint arXiv:2511.02805, 2025

Qianhao Yuan, Jie Lou, Zichao Li, Jiawei Chen, Yaojie Lu, Hongyu Lin, Le Sun, Debing Zhang, and Xianpei Han. Memsearcher: Training LLMs to reason, search and manage memory via end-to-end reinforcement learning.arXiv preprint arXiv:2511.02805, 2025

Pith/arXiv arXiv 2025
[56]

Optimizing generative ai by backpropagating language model feedback.Nature, 639(8055):609–616, 2025

Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Pan Lu, Zhi Huang, Carlos Guestrin, and James Zou. Optimizing generative ai by backpropagating language model feedback.Nature, 639(8055):609–616, 2025

2025
[57]

On the structural memory of LLM agents.arXiv preprint arXiv:2412.15266, 2024

Ruihong Zeng, Jinyuan Fang, Siwei Liu, and Zaiqiao Meng. On the structural memory of LLM agents.arXiv preprint arXiv:2412.15266, 2024

arXiv 2024
[58]

Large language models are semi-parametric reinforcement learning agents

Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, and Kai Yu. Large language models are semi-parametric reinforcement learning agents. InThirty-seventh Conference on Neural Information Processing Systems, 2023

2023
[59]

Memskill: Learning and evolving memory skills for self-evolving agents.arXiv preprint arXiv:2602.02474, 2026

Haozhen Zhang, Quanyu Long, Jianzhu Bao, Tao Feng, Weizhi Zhang, Haodong Yue, and Wenya Wang. Memskill: Learning and evolving memory skills for self-evolving agents.arXiv preprint arXiv:2602.02474, 2026

Pith/arXiv arXiv 2026
[60]

MemRL: Self-evolving agents via runtime reinforcement learning on episodic memory.arXiv preprint arXiv:2601.03192, 2026

Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, et al. MemRL: Self-evolving agents via runtime reinforcement learning on episodic memory.arXiv preprint arXiv:2601.03192, 2026

Pith/arXiv arXiv 2026
[61]

Mem- ory as action: Autonomous context curation for long-horizon agentic tasks.arXiv preprint arXiv:2510.12635, 2025

Yuxiang Zhang, Jiangming Shu, Ye Ma, Xueyuan Lin, Shangxi Wu, and Jitao Sang. Mem- ory as action: Autonomous context curation for long-horizon agentic tasks.arXiv preprint arXiv:2510.12635, 2025

Pith/arXiv arXiv 2025
[62]

Learn to memorize: Optimizing LLM-based agents with adaptive memory framework.arXiv preprint arXiv:2508.16629, 2025

Zeyu Zhang, Quanyu Dai, Rui Li, Xiaohe Bo, Xu Chen, and Zhenhua Dong. Learn to memorize: Optimizing LLM-based agents with adaptive memory framework.arXiv preprint arXiv:2508.16629, 2025

arXiv 2025
[63]

Memorybank: Enhancing large language models with long-term memory

Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory. InProceedings of the AAAI Conference on Artificial Intelligence, pages 19724–19731, 2024

2024
[64]

Memento: Fine-tuning LLM agents without fine-tuning LLMs.arXiv preprint arXiv:2508.16153, 2025

Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, and Jun Wang. Memento: Fine-tuning LLM agents without fine-tuning LLMs.arXiv preprint arXiv:2508.16153, 2025

arXiv 2025
[65]

Mem1: Learning to synergize memory and reasoning for efficient long-horizon agents.arXiv preprint arXiv:2506.15841, 2025

Zijian Zhou, Ao Qu, Zhaoxuan Wu, Sunghwan Kim, Alok Prakash, Daniela Rus, Jinhua Zhao, Bryan Kian Hsiang Low, and Paul Pu Liang. Mem1: Learning to synergize memory and reasoning for efficient long-horizon agents.arXiv preprint arXiv:2506.15841, 2025

Pith/arXiv arXiv 2025
[66]

Linearrag: Linear graph retrieval augmented generation on large-scale corpora

Luyao Zhuang, Shengyuan Chen, Yilin Xiao, Huachi Zhou, Yujing Zhang, Hao Chen, Qinggang Zhang, and Xiao Huang. Linearrag: Linear graph retrieval augmented generation on large-scale corpora. InThe Fourteenth International Conference on Learning Representations, 2026. 14 A Dataset and Experiment Setup Table 4 summarizes the benchmarks and dataset splits use...

2026
[67]

take (object) from (receptacle)
[68]

move (object) to (receptacle)
[69]

toggle (object) (receptacle)
[70]

clean (object) with (receptacle)
[71]

heat (object) with (receptacle)
[72]

cool (object) with (receptacle)
[73]

think: (your thought) , where (object) refers to manipulable objects and (receptacle) refers to receptacles or locations in the environment
[74]

If the environment output: Nothing happens, that means the previous action is invalid and you should try more options
[75]

Before taking a new object, make sure you have placed down any object you are currently holding

You can only hold one object at a time. Before taking a new object, make sure you have placed down any object you are currently holding
[76]

You should not assume or anticipate the feedback
[77]

Even if you have planned multiple steps ahead, you should only execute one action at a time, which aligns with subgoal
[78]

Do not proceed with any further exploration or actions until you receive the feedback from the environment after your action
[79]

better planning

Do not keep thinking. Your response should use one of the following formats: <your next action> think: <your thoughts>. Here are two examples. <example1> <example2> Here is the task <task>. 21 D.2: LoCoMo Answer Prompt Based on the following conversation, answer the question with a short, precise answer. Use the date/time information derived or concluded ...

2023

[1] Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, and Lingpeng Kong. Why does the effective context length of LLMs fall short? In Proceedings of the International Conference on Learning Representations, 2025.

[2] Anthropic. The Claude 3 model family: Opus, Sonnet, Haiku. https://www.anthropic.com/news/claude-3-family, 2024. Accessed: 2026-05-12.

[3] Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready AI agents with scalable long-term memory. In Proceedings of the European Conference on Artificial Intelligence (ECAI), 2025.

[4] Yiming Du, Baojun Wang, Yifan Xiang, Zhaowei Wang, Wenyu Huang, Boyang Xue, Bin Liang, Xingshan Zeng, Fei Mi, Haoli Bai, Lifeng Shang, Jeff Z. Pan, Yuxin Jiang, and Kam-Fai Wong. Memory-t1: Reinforcement learning for temporal reasoning in multi-session agents. In International Conference on Learning Representations, 2026.

[5] Daya Guo, Dejian Yang, Haowei Zhang, et al. Deepseek-r1 incentivizes reasoning in LLMs through reinforcement learning. Nature, 645:633–638, 2025.

[6] Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, and Ping Luo. Hiagent: Hierarchical working memory management for solving long-horizon agent tasks with large language model. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, pages 32779–32798, July 2025.

[7] Wei-Chieh Huang, Weizhi Zhang, Yueqing Liang, Yuanchen Bei, Yankai Chen, et al. Rethinking memory mechanisms of foundation agents in the second half: A survey. arXiv preprint arXiv:2602.06052, 2026.

[8] Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. LLMLingua: Compressing prompts for accelerated inference of large language models. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13358–13376, Singapore, December 2023. Association f...

[9] Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Pap...

[10] Zhuohang Jiang, Pangjing Wu, Ziran Liang, Peter Q. Chen, Xu Yuan, Ye Jia, Jiancheng Tu, Chen Li, Peter H. F. Ng, and Qing Li. Hibench: Benchmarking LLMs capability on hierarchical structure reasoning. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’25). ACM, 2025.

[11] Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan O Arik, Dong Wang, Hamed Zamani, and Jiawei Han. Search-r1: Training LLMs to reason and leverage search engines with reinforcement learning. In Second Conference on Language Modeling, 2025.

[12] Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, and Yongfeng Zhang. Disentangling memory and reasoning ability in large language models. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguist...

[13] Jiazheng Kang, Mingming Ji, Zhe Zhao, and Ting Bai. Memory os of AI agent. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025.

[14] Minki Kang, Wei-Ning Chen, Dongge Han, Huseyin A. Inan, Lukas Wutschitz, Yanzhi Chen, Robert Sim, and Saravan Rajmohan. Acon: Optimizing context compression for long-horizon LLM agents. arXiv preprint arXiv:2510.00615, 2025.

[15] Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu, editors, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online,...

[16] Philippe Laban, Hiroaki Hayashi, Yingbo Zhou, and Jennifer Neville. LLMs get lost in multi-turn conversation. In The Fourteenth International Conference on Learning Representations, 2026.

[17] Andrew Kyle Lampinen, Martin Engelcke, Yuxuan Li, Arslan Chaudhry, and James L. McClelland. Latent learning: Episodic memory complements parametric learning by enabling flexible reuse of experiences. arXiv preprint arXiv:2509.16189, 2025.

[18] Yucheng Li, Bo Dong, Chenghua Lin, and Frank Guerin. Compressing context to enhance inference efficiency of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.

[19] Zhuofeng Li, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie, Yi Lu, Yuyang Bai, Shangbin Feng, Hangxiao Zhu, Ming Zhong, Yuyu Zhang, Jianwen Xie, Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Dongfu Jiang, and Yu Zhang. Beyond semantic similarity: Rethinking retrieval for agentic search via direct corpus interaction. arXiv preprint arXiv:2605.05242, 2026.

[20] Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024.

[21] Miao Lu, Weiwei Sun, Weihua Du, Zhan Ling, Xuesong Yao, Kang Liu, and Jiecao Chen. Scaling LLM multi-turn RL with end-to-end summarization-based context management. arXiv preprint arXiv:2510.06727, 2025.

[22] Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. Evaluating very long-term conversational memory of LLM agents. In Annual Meeting of the Association for Computational Linguistics (ACL), 2024.

[23] Qirui Mi, Zhijian Ma, Mengyue Yang, Haoxuan Li, Yisen Wang, et al. Skill-pro: Learning reusable skills from experience via non-parametric ppo for LLM agents. arXiv preprint arXiv:2602.01869, 2026.

[24] Charles Packer, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, and Joseph E. Gonzalez. Memgpt: Towards LLMs as operating systems. arXiv preprint arXiv:2310.08560, 2023.

[25] Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Xufang Luo, Hao Cheng, Dongsheng Li, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, and Jianfeng Gao. Secom: On memory construction and retrieval for personalized conversational agents. In The Thirteenth International Conference on Learning Representations, 2025.

[26] Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

[27] Mathieu Ravaut, Aixin Sun, Nancy Chen, and Shafiq Joty. On context utilization in summarization with large language models. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2764–2781, Bangkok, Thailand, August 2024. Associatio...

[28] Rana Salama, Jason Cai, Michelle Yuan, Anna Currey, Monica Sunkara, Yi Zhang, and Yassine Benajiba. MemInsight: Autonomous memory augmentation for LLM agents. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 33136–33152,...

[29] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[30] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.

[31] Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. Alfworld: Aligning text and embodied environments for interactive learning. International Conference on Learning Representations, 2021.

[32] Wonbin Son and Hyungjoon Kim. Content-based file classification and organization system using LLMs. Electronics, 15(7):1524, 2026.

[33] Changzhi Sun, Xiangyu Chen, Jixiang Luo, Dell Zhang, and Xuelong Li. Beyond heuristics: A decision-theoretic framework for agent memory management. arXiv preprint arXiv:2512.21567, 2025.

[34] Weiwei Sun, Miao Lu, Zhan Ling, Kang Liu, Xuesong Yao, Yiming Yang, and Jiecao Chen. Scaling long-horizon LLM agent via context-folding. arXiv preprint arXiv:2510.11967, 2025.

[35] Hui-Ze Tan, Xiao-Wen Yang, Hao Chen, Jie-Jing Shao, Yi Wen, Yuteng Shen, Weihong Luo, Xiku Du, Lan-Zhe Guo, and Yu-Feng Li. Hindsight credit assignment for long-horizon LLM agents. arXiv preprint arXiv:2603.08754, 2026.

[36] Jiongxiao Wang, Qiaojing Yan, Yawei Wang, Yijun Tian, Soumya Smruti Mishra, et al. Reinforcement learning for self-improving agent with skill library. arXiv preprint arXiv:2512.17102, 2025.

[37] Qingyue Wang, Yanhe Fu, Yanan Cao, Shuai Wang, Zhiliang Tian, and Liang Ding. Recursively summarizing enables long-term dialogue memory in large language models. Neurocomputing, 639, 2025.

[38] Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, and Edward Grefenstette. A subgoal-driven framework for improving long-horizon LLM agents. arXiv preprint arXiv:2603.19685, 2026.

[39] Yu Wang, Ryuichi Takanobu, Zhiqi Liang, Yuzhen Mao, Yuanzhe Hu, et al. Mem-α: Learning memory construction via reinforcement learning. arXiv preprint arXiv:2509.25911, 2025.

[40] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, volume 35, pages 24824–24837, 2022.

[41] Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, and Dong Yu. Longmemeval: Benchmarking chat assistants on long-term interactive memory. In International Conference on Learning Representations, 2025.

[42] Xixi Wu, Kuan Li, Yida Zhao, Liwen Zhang, Litu Ou, Huifeng Yin, Zhongwang Zhang, Xinmiao Yu, Dingchu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Minhao Cheng, Shuai Wang, Hong Cheng, and Jingren Zhou. Resum: Unlocking long-horizon search intelligence via context summarization. arXiv preprint arXiv:2509.13313, 2025.

[43] Yanchen Wu, Tenghui Lin, Yingli Zhou, Fangyuan Zhang, Qintian Guo, Xun Zhou, Sibo Wang, Xilin Liu, Yuchi Ma, and Yixiang Fang. Memory in the LLM era: Modular architectures and strategies in a unified framework. arXiv preprint arXiv:2604.01707, 2026.

[44] Peng Xia, Jianwen Chen, Hanyang Wang, Jiaqi Liu, Kaide Zeng, Yu Wang, Siwei Han, Yiyang Zhou, Xujiang Zhao, Haifeng Chen, et al. SkillRL: Evolving agents via recursive skill-augmented reinforcement learning. arXiv preprint arXiv:2602.08234, 2026.

[45] Buqiang Xu, Yijun Chen, Jizhan Fang, Ruobin Zhong, Yunzhi Yao, Yuqi Zhu, Lun Du, and Shumin Deng. Structmem: Structured memory for long-horizon behavior in LLMs. arXiv preprint arXiv:2604.21748, 2026.

[46] Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-mem: Agentic memory for LLM agents. In Advances in Neural Information Processing Systems, 2025.

[47] Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Jinhe Bi, Kristian Kersting, Jeff Z. Pan, Hinrich Schütze, Volker Tresp, and Yunpu Ma. Memory-r1: Enhancing large language model agents to manage and utilize memories via reinforcement learning. In Annual Meeting of the Association for Computational Linguistics (ACL), 2026.

[48] Chengyuan Yang, Zequn Sun, Wei Wei, and Wei Hu. Beyond static summarization: Proactive memory extraction for LLM agents. arXiv preprint arXiv:2601.04463, 2026.

[49] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. International Conference on Learning Representations, 2023.

[50] Haoran Ye, Xuning He, Vincent Arak, Haonan Dong, and Guojie Song. Meta context engineering via agentic skill evolution. arXiv preprint arXiv:2601.21557, 2026.

[51] Rui Ye, Zhongwang Zhang, Kuan Li, Huifeng Yin, Zhengwei Tao, Yida Zhao, Liangcai Su, Liwen Zhang, Zile Qiao, Xinyu Wang, Xixi Wu, Xinmiao Yu, Yong Jiang, Dingchu Zhang, Hong Cheng, and Jingren Zhou. Agentfold: Long-horizon web agents with proactive context management. arXiv preprint arXiv:2510.24699, 2025. doi: 10.48550/arXiv.2510.24699.

[52] Chanwoong Yoon, Taewhoo Lee, Hyeon Hwang, Minbyul Jeong, and Jaewoo Kang. Compact: Compressing retrieved documents actively for question answering. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.

[53] Hongli Yu, Tinghong Chen, Jiangtao Feng, Jiangjie Chen, Weinan Dai, Qiying Yu, Ya-Qin Zhang, Wei-Ying Ma, Jingjing Liu, Mingxuan Wang, and Hao Zhou. Memagent: Reshaping long-context LLM with multi-conv RL-based memory agent. In The Fourteenth International Conference on Learning Representations, 2026.

[54] Simon Yu, Gang Li, Weiyan Shi, and Peng Qi. Polyskill: Learning generalizable skills through polymorphic abstraction for continual learning. In The Fourteenth International Conference on Learning Representations, 2026.

[55] Qianhao Yuan, Jie Lou, Zichao Li, Jiawei Chen, Yaojie Lu, Hongyu Lin, Le Sun, Debing Zhang, and Xianpei Han. Memsearcher: Training LLMs to reason, search and manage memory via end-to-end reinforcement learning. arXiv preprint arXiv:2511.02805, 2025.

[56] Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Pan Lu, Zhi Huang, Carlos Guestrin, and James Zou. Optimizing generative ai by backpropagating language model feedback. Nature, 639(8055):609–616, 2025.

[57] Ruihong Zeng, Jinyuan Fang, Siwei Liu, and Zaiqiao Meng. On the structural memory of LLM agents. arXiv preprint arXiv:2412.15266, 2024.

[58] Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, and Kai Yu. Large language models are semi-parametric reinforcement learning agents. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

[59] Haozhen Zhang, Quanyu Long, Jianzhu Bao, Tao Feng, Weizhi Zhang, Haodong Yue, and Wenya Wang. Memskill: Learning and evolving memory skills for self-evolving agents. arXiv preprint arXiv:2602.02474, 2026.

[60] Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, et al. MemRL: Self-evolving agents via runtime reinforcement learning on episodic memory. arXiv preprint arXiv:2601.03192, 2026.

[61] Yuxiang Zhang, Jiangming Shu, Ye Ma, Xueyuan Lin, Shangxi Wu, and Jitao Sang. Memory as action: Autonomous context curation for long-horizon agentic tasks. arXiv preprint arXiv:2510.12635, 2025.

[62] Zeyu Zhang, Quanyu Dai, Rui Li, Xiaohe Bo, Xu Chen, and Zhenhua Dong. Learn to memorize: Optimizing LLM-based agents with adaptive memory framework. arXiv preprint arXiv:2508.16629, 2025.

[63] Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 19724–19731, 2024.

[64] Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, and Jun Wang. Memento: Fine-tuning LLM agents without fine-tuning LLMs. arXiv preprint arXiv:2508.16153, 2025.

[65] Zijian Zhou, Ao Qu, Zhaoxuan Wu, Sunghwan Kim, Alok Prakash, Daniela Rus, Jinhua Zhao, Bryan Kian Hsiang Low, and Paul Pu Liang. Mem1: Learning to synergize memory and reasoning for efficient long-horizon agents. arXiv preprint arXiv:2506.15841, 2025.

[66] Luyao Zhuang, Shengyuan Chen, Yilin Xiao, Huachi Zhou, Yujing Zhang, Hao Chen, Qinggang Zhang, and Xiao Huang. Linearrag: Linear graph retrieval augmented generation on large-scale corpora. In The Fourteenth International Conference on Learning Representations, 2026.

A Dataset and Experiment Setup

Table 4 summarizes the benchmarks and dataset splits use...

take (object) from (receptacle)
move (object) to (receptacle)
toggle (object) (receptacle)
clean (object) with (receptacle)
heat (object) with (receptacle)
cool (object) with (receptacle)
think: (your thought)

where (object) refers to manipulable objects and (receptacle) refers to receptacles or locations in the environment.

If the environment output: Nothing happens, that means the previous action is invalid and you should try more options.

You can only hold one object at a time. Before taking a new object, make sure you have placed down any object you are currently holding.

You should not assume or anticipate the feedback.

Even if you have planned multiple steps ahead, you should only execute one action at a time, which aligns with subgoal.

Do not proceed with any further exploration or actions until you receive the feedback from the environment after your action.

better planning

Do not keep thinking. Your response should use one of the following formats: <your next action> think: <your thoughts>. Here are two examples. <example1> <example2> Here is the task <task>.

D.2: LoCoMo Answer Prompt

Based on the following conversation, answer the question with a short, precise answer. Use the date/time information derived or concluded ...