Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents
Pith reviewed 2026-05-10 00:05 UTC · model grok-4.3
The pith
Lifelong agents learn an explicit policy for retrieving past experience only when it improves the next decision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ProactAgent organizes past interactions into factual memory, episodic memory, and behavioral skills, then trains a retrieval policy through Proactive Reinforcement Learning-based Retrieval (ProactRL). ProactRL compares two continuations that start from the identical state: one branch receives retrieved content and the other does not. The difference in eventual task outcome or efficiency supplies the reward that updates the retrieval decision. Combined with Experience-Enhanced Online Evolution that updates both the main policy and the memory store, the framework yields success rates of 73.50 percent on SciWorld and 71.28 percent on AlfWorld while cutting retrieval calls.
What carries the argument
ProactRL, the reinforcement-learning policy that decides both when and what to retrieve by comparing paired branches from the same prefix and using the outcome difference as step-level supervision.
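A minimal sketch of the paired-branch mechanism may make this concrete. The Python below replays the same prefix twice under matched randomness, once with retrieved content and once without, and scores the retrieval decision by the outcome difference; `env`, `policy`, and the efficiency weighting are assumed interfaces for illustration, not the paper's actual API.

```python
import copy
import random

def paired_branch_reward(env, policy, prefix_actions, retrieved_context,
                         seed=0, temperature=0.7):
    """Sketch of a paired-branch process reward (assumed interfaces)."""
    def rollout(use_retrieval):
        random.seed(seed)                 # match stochasticity across branches
        branch_env = copy.deepcopy(env)   # fresh copy: no cached-state carry-over
        state = branch_env.replay(prefix_actions)
        context = retrieved_context if use_retrieval else None
        done, success, steps = False, False, 0
        while not done and steps < branch_env.max_steps:
            action = policy.act(state, context=context, temperature=temperature)
            state, done, success = branch_env.step(action)
            steps += 1
        return float(success), steps

    score_with, steps_with = rollout(use_retrieval=True)
    score_without, steps_without = rollout(use_retrieval=False)

    # Positive when retrieval helped; a small efficiency bonus breaks ties
    # in favor of the branch that finished in fewer steps.
    outcome_diff = score_with - score_without
    tie_bonus = 0.01 * (steps_without - steps_with) if outcome_diff == 0 else 0.0
    return outcome_diff + tie_bonus
```

A reward of zero tells the policy that retrieval added nothing at this step, which is exactly the signal it needs to learn to skip retrieval there.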
If this is right
- Agents reach higher success rates on SciWorld and AlfWorld while issuing far fewer retrieval requests than passive baselines.
- The same framework produces results competitive with proprietary models on the StuLife benchmark.
- Memory and policy continue to improve together because retrieval decisions feed back into both the experience base and the main behavior.
- Retrieval overhead drops because the policy learns to skip retrieval on steps where past experience adds no value.
Where Pith is reading between the lines
- The paired-branch technique could be applied to decide other costly internal actions, such as calling external tools or planning subgoals.
- If the experience base grows very large, the same reward signal might be used to prune low-value entries rather than only to select among them.
- Environments with noisy or conflicting memories would require an additional consistency check before the retrieval reward is computed.
Load-bearing premise
Comparing continuations from identical prefixes with and without retrieval gives an unbiased signal about whether retrieval is helpful at that exact step.
What would settle it
Run the paired-branch comparison on a held-out set of steps; if the branch that receives retrieval shows no consistent gain in final success or efficiency over the branch that skips retrieval, the supervision signal for the policy is invalid.
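A concrete version of this falsification test is sketched below: gather paired final outcomes on held-out steps and check whether the with-retrieval branch shows a consistent positive gain. The pairing format, names, and the t-statistic threshold are illustrative assumptions, not the paper's protocol.

```python
import math
import statistics

def retrieval_gain_check(paired_outcomes, t_threshold=2.0):
    """paired_outcomes: list of (with_retrieval, without_retrieval) final
    scores from identical held-out prefixes (assumed non-empty)."""
    diffs = [w - wo for w, wo in paired_outcomes]
    mean_gain = statistics.mean(diffs)
    sd = statistics.stdev(diffs) if len(diffs) > 1 else 0.0
    # Paired t statistic: mean difference over its standard error.
    t_stat = mean_gain / (sd / math.sqrt(len(diffs))) if sd > 0 else float("inf")
    return mean_gain > 0 and abs(t_stat) > t_threshold
```

If this check fails on held-out steps, the paired-branch rewards that trained the policy were noise, and the premise above collapses.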
Figures
Original abstract
Online lifelong learning enables agents to accumulate experience across interactions and continually improve on long-horizon tasks. However, existing methods typically treat retrieval from past experience as a passive operation, triggering it only at task initialization or after completing a step. Consequently, agents often fail to identify knowledge gaps during interaction and proactively retrieve the most useful experience for the current decision. To address this limitation, we present ProactAgent, an experience-driven lifelong learning framework for proactive retrieval over a structured experience base. We first introduce Experience-Enhanced Online Evolution (ExpOnEvo), which enables continual improvement through both policy updates and memory refinement. The experience base organizes historical interactions into typed repositories, including factual memory, episodic memory, and behavioral skills, so that retrieval can provide both relevant evidence and actionable guidance. On top of this, we propose Proactive Reinforcement Learning-based Retrieval (ProactRL), which models retrieval as an explicit policy action and learns when and what to retrieve via paired-branch process rewards. By comparing continuations from identical interaction prefixes with and without retrieval, ProactRL provides step-level supervision for retrieval decisions, encouraging retrieval only when it leads to better task outcomes or higher efficiency. Experiments on SciWorld, AlfWorld, and StuLife show that ProactAgent consistently improves lifelong agent performance, achieving success rates of 73.50% on SciWorld and 71.28% on AlfWorld while substantially reducing retrieval overhead, and attains performance competitive with proprietary models on StuLife.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce ProactAgent, a framework for experience-driven lifelong agents that performs proactive retrieval from a structured base (factual memory, episodic memory, behavioral skills) rather than passive triggering. It proposes Experience-Enhanced Online Evolution (ExpOnEvo) for joint policy and memory refinement, and Proactive RL-based Retrieval (ProactRL) that treats retrieval as a policy action trained via paired-branch process rewards: continuations from identical interaction prefixes are compared with and without retrieval to supply step-level supervision that encourages retrieval only when it improves outcomes or efficiency. Experiments on SciWorld, AlfWorld, and StuLife report success rates of 73.50% and 71.28% on the first two environments, reduced retrieval overhead, and performance competitive with proprietary models on the third.
Significance. If the results hold after addressing the supervision-signal concerns, the work would offer a concrete mechanism for reducing unnecessary retrieval while improving long-horizon performance, which is a practical advance for memory-augmented agents. The multi-environment evaluation and explicit comparison to proprietary models provide useful empirical grounding; the structured experience base and online evolution component also supply reusable design patterns.
major comments (2)
- [ProactRL / §3] ProactRL description (abstract and §3): the paired-branch comparison that supplies process rewards assumes the without-retrieval continuation is an unbiased counterfactual. The manuscript does not detail prefix selection criteria (e.g., uncertainty thresholds), whether the two branches use identical temperature/stochasticity, or how cached states are avoided. This risks selection bias or reward hacking and directly affects the central claim that the policy learns to 'ask only when needed.'
- [Experiments] Experimental section (results on SciWorld/AlfWorld): success-rate gains are reported without accompanying statistical tests, variance across seeds, or ablation isolating the contribution of ProactRL versus ExpOnEvo alone. Given that the training signal depends on downstream outcomes, these omissions make it difficult to assess whether the reported 73.50% and 71.28% figures are robust or partly attributable to post-hoc tuning.
minor comments (2)
- [Abstract] The abstract states 'substantially reducing retrieval overhead' but does not quantify the reduction (e.g., average retrievals per episode or percentage decrease); adding a concrete metric would strengthen the efficiency claim.
- [Experience base] Notation for the three memory types (factual, episodic, behavioral skills) is introduced without a compact table or diagram showing their retrieval interfaces; a small summary table would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below with clarifications on our design and commitments to strengthen the manuscript.
Point-by-point responses
-
Referee: [ProactRL / §3] ProactRL description (abstract and §3): the paired-branch comparison that supplies process rewards assumes the without-retrieval continuation is an unbiased counterfactual. The manuscript does not detail prefix selection criteria (e.g., uncertainty thresholds), whether the two branches use identical temperature/stochasticity, or how cached states are avoided. This risks selection bias or reward hacking and directly affects the central claim that the policy learns to 'ask only when needed.'
Authors: We appreciate the referee's careful reading of the ProactRL mechanism. The paired-branch process is designed to provide direct step-level supervision by comparing outcomes from identical prefixes. To address potential bias, prefix selection is performed based on the agent's internal uncertainty estimate at each step, both branches are run with matching stochasticity settings (same temperature and seed), and the without-retrieval branch is executed in a reset environment state to prevent any carry-over from caching. These measures aim to make the counterfactual as unbiased as possible. We will revise §3 to explicitly document these implementation choices, including the exact criteria and procedures used, to eliminate ambiguity around selection bias and reward hacking. revision: yes
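The uncertainty-gated prefix selection the authors describe could be as simple as the sketch below, which uses the entropy of the policy's action distribution as the internal uncertainty estimate; both that choice and the threshold are assumptions, since the manuscript does not yet document the criterion.

```python
import math

def should_trigger_retrieval(action_probs, entropy_threshold=1.0):
    """Gate a retrieval-trigger step on policy uncertainty (illustrative)."""
    entropy = -sum(p * math.log(p) for p in action_probs if p > 0)
    return entropy > entropy_threshold
```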
-
Referee: [Experiments] Experimental section (results on SciWorld/AlfWorld): success-rate gains are reported without accompanying statistical tests, variance across seeds, or ablation isolating the contribution of ProactRL versus ExpOnEvo alone. Given that the training signal depends on downstream outcomes, these omissions make it difficult to assess whether the reported 73.50% and 71.28% figures are robust or partly attributable to post-hoc tuning.
Authors: We acknowledge that the current experimental presentation lacks statistical tests, seed variance, and clear ablations, which limits the assessment of robustness. In the revised version, we will include standard deviations from multiple random seeds and conduct appropriate statistical significance tests (e.g., t-tests) for the reported success rates. Additionally, we will expand the experimental section with dedicated ablations that isolate the effect of ProactRL from ExpOnEvo by comparing the full ProactAgent against a baseline using only ExpOnEvo with passive retrieval. These ablations will demonstrate the specific contribution of the proactive retrieval policy. While the reward signal is derived from downstream task outcomes, the paired-branch comparison provides granular, step-wise supervision that reduces reliance on post-hoc adjustments. revision: yes
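The committed robustness reporting could follow a sketch like this one: per-seed success rates for the full agent versus an ExpOnEvo-only ablation, summarized with means, standard deviations, and a Welch t-test. The function and data names are placeholders for the planned revision, not results from the paper.

```python
from statistics import mean, stdev
from scipy import stats  # assumed available for the significance test

def compare_methods(full_agent_rates, ablated_rates):
    """Per-seed success rates in, summary statistics and Welch t-test out."""
    t_stat, p_value = stats.ttest_ind(full_agent_rates, ablated_rates,
                                      equal_var=False)
    return {
        "full_mean": mean(full_agent_rates), "full_std": stdev(full_agent_rates),
        "ablated_mean": mean(ablated_rates), "ablated_std": stdev(ablated_rates),
        "t": t_stat, "p": p_value,
    }
```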
Circularity Check
No circularity: derivation relies on external task outcomes and benchmark experiments
full rationale
The paper's core claims rest on introducing ExpOnEvo for memory refinement and ProactRL for learning a retrieval policy via paired-branch comparisons that assign rewards from downstream task success rates and efficiency on SciWorld, AlfWorld, and StuLife. These are not self-definitional, as the supervision signal derives from independent environment outcomes rather than re-using fitted parameters or prior self-citations as the sole justification. No equations or sections reduce the reported success rates (73.50% on SciWorld, 71.28% on AlfWorld) to inputs by construction; the method is falsifiable against external benchmarks and does not invoke uniqueness theorems or ansatzes from overlapping prior work. The derivation chain is therefore self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Retrieval decisions can be supervised by comparing task outcomes from identical prefixes with and without retrieval
- domain assumption: Organizing memory into factual, episodic, and skill repositories enables both evidence and actionable guidance
invented entities (2)
- ProactRL: no independent evidence
- ExpOnEvo: no independent evidence
Reference graph
Works this paper leans on
-
[1]
ALFWorld: Aligning text and embodied environments for interactive learning
Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Cote, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. ALFWorld: Aligning text and embodied environments for interactive learning. In International Conference on Learning Representations, 2021
2021
-
[2]
ScienceWorld: Is your agent smarter than a 5th grader?
Ruoyao Wang, Peter Jansen, Marc-Alexandre Cote, and Prithviraj Ammanabrolu. ScienceWorld: Is your agent smarter than a 5th grader? In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
2022
-
[3]
Voyager: An open-ended embodied agent with large language models
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models. In Advances in Neural Information Processing Systems, 2023
2023
-
[4]
Building self-evolving agents via experience-driven lifelong learning: A framework and benchmark
Yuxuan Cai, Yipeng Hao, Jie Zhou, Hang Yan, Zhikai Lei, Rui Zhen, Zhenhua Han, Yutao Yang, Junsong Li, Qianjun Pan, Tianyu Huai, Qin Chen, Xin Li, Kai Chen, Bo Zhang, Xipeng Qiu, and Liang He. Building self-evolving agents via experience-driven lifelong learning: A framework and benchmark. arXiv preprint arXiv:2508.19005, 2025
2025
-
[5]
Chain-of-thought prompting elicits reasoning in large language models
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, 2022
2022
-
[6]
Tree of thoughts: Deliberate problem solving with large language models
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. In Advances in Neural Information Processing Systems, 2023
2023
-
[7]
Toolformer: Language models can teach themselves to use tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems, 2023
2023
-
[8]
ReAct: Synergizing reasoning and acting in language models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations, 2023
2023
-
[9]
Self-refine: Iterative refinement with self-feedback
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. Self-refine: Iterative refinement with self-feedback. In Advances in Neural Information Processing Systems, 2023
2023
-
[10]
Reflexion: Language agents with verbal reinforcement learning
Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems, 2023
2023
-
[11]
Generative agents: Interactive simulacra of human behavior
Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023
2023
-
[12]
ExpeL: LLM agents are experiential learners
Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, and Gao Huang. ExpeL: LLM agents are experiential learners. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17):19666–19674, 2024
2024
-
[13]
MemoryBank: Enhancing large language models with long-term memory
Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. MemoryBank: Enhancing large language models with long-term memory. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17):19724–19731, 2024
2024
-
[14]
MemEvolve: Meta-evolution of agent memory systems
Guibin Zhang, Haotian Ren, Chong Zhan, Zhenhong Zhou, Junhao Wang, He Zhu, Wangchunshu Zhou, and Shuicheng Yan. MemEvolve: Meta-evolution of agent memory systems. arXiv preprint arXiv:2512.18746, 2025
2025
-
[15]
MultiHop-RAG: Benchmarking retrieval-augmented generation for multi-hop queries
Yixuan Tang and Yi Yang. MultiHop-RAG: Benchmarking retrieval-augmented generation for multi-hop queries. arXiv preprint arXiv:2401.15391, 2024
2024
-
[16]
ReflectiveRAG: Rethinking adaptivity in retrieval-augmented generation
Akshay Verma, Swapnil Gupta, Siddharth Pillai, Prateek Sircar, and Deepak Gupta. ReflectiveRAG: Rethinking adaptivity in retrieval-augmented generation. 2026
2026
-
[17]
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
Haozhen Zhang, Quanyu Long, Jianzhu Bao, Tao Feng, Weizhi Zhang, Haodong Yue, and Wenya Wang. MemSkill: Learning and evolving memory skills for self-evolving agents. arXiv preprint arXiv:2602.02474, 2026
2026
-
[18]
Neural Turing machines
Alex Graves, Greg Wayne, and Ivo Danihelka. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014
2014
-
[19]
Memory networks
Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. In International Conference on Learning Representations, 2015
2015
-
[20]
Hybrid computing using a neural network with dynamic external memory
Alex Graves, Greg Wayne, Malcolm Reynolds, et al. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471–476, 2016
2016
-
[21]
Retrieval-augmented generation for knowledge-intensive NLP tasks
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kuttler, Mike Lewis, Wen-tau Yih, Tim Rocktaschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, 2020
2020
-
[22]
Improving language models by retrieving from trillions of tokens
Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, et al. Improving language models by retrieving from trillions of tokens. In International Conference on Machine Learning, 2022
2022
-
[23]
MemGPT: Towards LLMs as Operating Systems
Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. MemGPT: Towards LLMs as operating systems. arXiv preprint arXiv:2310.08560, 2023
2023
-
[24]
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready AI agents with scalable long-term memory. arXiv preprint arXiv:2504.19413, 2025
2025
-
[25]
Active retrieval augmented generation
Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
2023
-
[26]
Self-RAG: Learning to retrieve, generate, and critique through self-reflection
Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In International Conference on Learning Representations, 2024
2024
-
[27]
Adaptive-RAG: Learning to adapt retrieval-augmented large language models through question complexity
Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong C. Park. Adaptive-RAG: Learning to adapt retrieval-augmented large language models through question complexity. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics, 2024
2024
-
[28]
Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions
Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
2023
-
[29]
AgentGym: Evaluating and training large language model-based agents across diverse environments
Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Xin Guo, Dingwen Yang, Chenyang Liao, Wei He, et al. AgentGym: Evaluating and training large language model-based agents across diverse environments. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 27914–27..., 2025
2025
-
[30]
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo, Kaiyan Zhang, Shang Qu, Li Sheng, Xuekai Zhu, Biqing Qi, Youbang Sun, Ganqu Cui, Ning Ding, and Bowen Zhou. TTRL: Test-time reinforcement learning. arXiv preprint arXiv:2504.16084, 2025
2025
-
[31]
AgentGym-RL: Training LLM agents for long-horizon decision making through multi-turn reinforcement learning
Zhiheng Xi, Jixuan Huang, Chenyang Liao, Baodai Huang, Honglin Guo, Jiaqi Liu, Rui Zheng, Junjie Ye, Jiazheng Zhang, Wenxiang Chen, et al. AgentGym-RL: Training LLM agents for long-horizon decision making through multi-turn reinforcement learning. arXiv preprint arXiv:2509.08755, 2025
2025
-
[32]
DeepSeekMath: Pushing the limits of mathematical reasoning in open language models
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024
2024
-
[33]
The extractor produces at most two entries of each type per trajectory, focusing on environment facts and trajectory-specific plans or constraints
Factual and episodic memories are extracted from individual trajectories via summarization. The extractor produces at most two entries of each type per trajectory, focusing on environment facts and trajectory-specific plans or constraints
-
[34]
Each distiller returns one to three JSON-formatted entries that encode reusable strategies (from successes) or corrective rules (from failures)
Success and failure skills are distilled from outcome-specific trajectory subsets. Each distiller returns one to three JSON-formatted entries that encode reusable strategies (from successes) or corrective rules (from failures)
-
[35]
Paired A/B branches produced by ProactRL are prioritized because they share the same task prefix and therefore expose the most localized contrastive signal
Comparative skills are distilled from matched trajectory pairs. Paired A/B branches produced by ProactRL are prioritized because they share the same task prefix and therefore expose the most localized contrastive signal. When such pairs are unavailable, the extractor falls back to outcome-ranked trajectory pairs from the same task group. The complete promp...
-
[36]
This stage is critical for initializing the policy with sufficient tool-calling competence before reinforcement learning begins (as confirmed by the ablation in Section 4.3)
Cold start. The base policy is trained via supervised learning on successful trajectories to learn the interaction format, valid action syntax, and retrieval-tag conventions. This stage is critical for initializing the policy with sufficient tool-calling competence before reinforcement learning begins (as confirmed by the ablation in Section 4.3)
-
[37]
A portion of these rollouts is configured as no-retrieval trajectories through the retrieval_enabled switch, whose probability is annealed across training phases
Rollout sampling. Multiple rollouts are sampled for each training prompt under the current policy. A portion of these rollouts is configured as no-retrieval trajectories through the retrieval_enabled switch, whose probability is annealed across training phases
-
[38]
Paired-branch construction. When paired branching is active, the system identifies retrieval-trigger steps in retrieval-enabled rollouts, replays the corresponding prefixes, and creates matched no-retrieval branches (Section D.2)
-
[39]
Reward computation. The environment outcome is combined with the paired-branch process reward and the efficiency bonus to produce the ProactRL trajectory-level reward (Section 3.3)
-
[40]
Policy update. The policy is updated using GRPO-style group normalization with PPO-style clipped surrogate optimization
-
[41]
Experience base update. The experience base D is updated by extracting factual, episodic, success, failure, and comparative entries from the new trajectories (Appendix C.3). This organization ensures that policy learning and memory growth remain tightly interleaved throughout training, realizing the co-evolution loop described in Section 3.2. A condensed sketch of this loop appears after the excerpt list below
-
[42]
* *Examples:* Info/Preferences, Domain Knowledge, Tool/System Facts
**Factual Memory (Objective Truths):** * *Definition:* Verifiable facts learned during execution. * *Examples:* Info/Preferences, Domain Knowledge, Tool/System Facts
-
[43]
If I take step A, error B occurs,
**Episodic Memory (Experience, Reflection & Temporal Events):** * *Definition:* Insights derived from the flow of events, strategies, errors, OR **specific real-world time constraints/schedules**. * *Logic Examples:* "If I take step A, error B occurs," "Method X is faster than Y." * *Temporal Examples:* "User has a class at 8 AM on Mondays," "The deadlin...
-
[44]
**Selection:** Extract a MAXIMUM of **2 Factual** and **2 Episodic** memories
-
[45]
when_to_use
**"when_to_use" Strategy:** * For *Factual*: Focus on the **Context Trigger** (e.g., "When using ‘pandas‘..."). * For *Episodic (Logic)*: Focus on the **Situational Trigger** (e.g., "When the dataset is empty..."). * For *Episodic (Time)*: Focus on the **Temporal Trigger** (e.g., "When it is Monday morning," "At 8:00 AM")
-
[46]
factual_memories
**Minimum Output:** Do not output an empty result. You must find the most valuable takeaway, even if minor. # Output Format Output a single valid JSON object strictly following this schema. {{ "factual_memories": [ {{ "when_to_use": "<Precise context trigger>", "memory": "<The objective fact>" }} ], "episodic_memories": [ {{ "when_to_use": "<Precise situa...
-
[47]
when_to_use
**FIELD: "when_to_use" (The Trigger Scope)** - **Definition:** Precisely define the context where this best practice applies. You MUST consider three dimensions: a. **Task Requirement:** What is the user specifically asking for? (e.g., "When the task requires verifying code execution results...") b. **Specific Scenario:** What is the current state of the ...
-
[48]
experience
**FIELD: "experience" (The Solution)** - **Definition:** A strict, actionable standard operating procedure. - **Constraint:** **PURELY FORWARD-LOOKING.** Do NOT explain why the approach was superior. Do NOT include phrases like "The agent succeeded because..." or "It is better to...". - **Structure:** Directly provide the step-by-step instruction on how t...
-
[49]
when_to_use
**FIELD: "when_to_use" (The Trigger Scope)** - **Definition:** Precisely define the context where this memory applies. You MUST consider three dimensions: a. **Task Requirement:** What is the user specifically asking for? (e.g., "When the task requires distinct counting of similar objects...") b. **Specific Scenario:** What is the current state of the age...
-
[50]
experience
**FIELD: "experience" (The Solution)** - **Definition:** A strict, actionable instruction on how to handle this exact situation correctly. - **Constraint:** **PURELY FORWARD-LOOKING.** Do NOT explain why the previous attempt failed. Do NOT include diagnosis or "The agent failed because..." statements. - **Content:** Directly provide the optimized logic or...
discussion (0)