CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-11 01:13 UTC · model grok-4.3
The pith
LLM agents can learn from experience during deployment by building and querying an explicit episodic memory without changing their parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that formalizing deployment-time learning as a distinct third stage after training and fine-tuning, and equipping LLM agents with an explicit, evolving episodic memory whose case selection is cast as a contextual bandit problem, yields no-regret guarantees over long interactions, lets agents accumulate, select, and refine task-relevant cases, and raises macro-averaged success rates by 20.9 percent over zero-shot prompting while outperforming gradient-based and memory-based baselines on sixteen diverse tasks.
What carries the argument
An explicit evolving episodic memory whose case selection is formulated as a contextual bandit problem to balance exploration and exploitation while accumulating actionable knowledge.
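The review does not name the specific bandit algorithm (the referee later asks for exactly that), but the mechanism can be sketched with a standard LinUCB-style selector in which each stored case is an arm and the observed 0/1 task success is the reward. All names and the feature map below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class LinUCBCaseSelector:
    """Disjoint-arm LinUCB over stored cases.

    Illustrative sketch only: the review does not name the exact bandit
    algorithm or feature map, so LinUCB with a shared linear model is
    assumed. Each candidate case is an arm; `features` holds one
    query-case feature vector per arm; the reward is the observed 0/1
    task success after the LLM answers with the selected case.
    """

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha            # exploration strength
        self.A = np.eye(dim)          # d x d ridge design matrix
        self.b = np.zeros(dim)        # reward-weighted feature sums

    def select(self, features):
        """Return the index of the case with the highest UCB score."""
        theta = np.linalg.solve(self.A, self.b)   # ridge estimate
        A_inv = np.linalg.inv(self.A)
        # Confidence width: sqrt(x^T A^{-1} x) for each candidate row.
        width = np.sqrt(np.einsum('ij,jk,ik->i', features, A_inv, features))
        return int(np.argmax(features @ theta + self.alpha * width))

    def update(self, x, reward):
        """Fold in the observed 0/1 success for the chosen case."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

The exploration term shrinks as a case's feature direction accumulates observations, which is what drives the exploration-exploitation balance the pith describes.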
If this is right
- Agents accumulate, select, and refine task-relevant cases from past interactions without parameter changes.
- No-regret guarantees hold for long-term deployment interactions.
- Macro-averaged success rate rises 20.9 percent over zero-shot prompting across sixteen tasks.
- The approach outperforms both gradient-based and other memory-based baselines on medical, legal, coding, search, tool-use, and embodied tasks.
- Deployment is reframed as a continual adaptive learning process rather than a fixed endpoint.
Where Pith is reading between the lines
- Memory systems of this kind could support personalized agents that retain user-specific interaction patterns across sessions.
- The same case-selection logic might extend to teams of agents that share and query a joint memory store.
- Longer real-world deployments would test whether the claimed no-regret property produces measurable gains beyond the reported sixteen-task suite.
Load-bearing premise
That casting experience reuse as a contextual bandit problem will actually deliver no-regret guarantees and convert accumulated cases into effective knowledge without any updates to the underlying model parameters.
What would settle it
A sequence of repeated interactions in which the agent's success rate stays flat at the zero-shot level, or in which average per-round regret fails to converge toward zero over time.
read the original abstract
Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid separation between training and deployment, after which learning effectively ceases. This limitation contrasts with natural intelligence, which continually adapts through interaction with its environment. In this paper, we formalise deployment-time learning (DTL) as the third stage in the LLM lifecycle that enables LLM agents to improve from experience during deployment without modifying model parameters. We present CASCADE (CASe-based Continual Adaptation during DEployment), a general and principled framework that equips LLM agents with an explicit, evolving episodic memory. CASCADE formulates experience reuse as a contextual bandit problem, enabling principled exploration-exploitation trade-offs and establishing no-regret guarantees over long-term interactions. This design allows agents to accumulate, select, and refine task-relevant cases, transforming past experience into actionable knowledge. Across 16 diverse tasks spanning medical diagnosis, legal analysis, code generation, web search, tool use, and embodied interaction, CASCADE improves macro-averaged success rate by 20.9% over zero-shot prompting while consistently outperforming gradient-based and memory-based baselines. By reframing deployment as an adaptive learning process, this work establishes a foundation for continually improving AI systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CASCADE, a framework for deployment-time learning (DTL) in LLMs that equips agents with an evolving episodic memory. It formulates experience reuse as a contextual bandit problem to enable exploration-exploitation trade-offs and no-regret guarantees without parameter updates. The approach accumulates, selects, and refines task-relevant cases, with empirical evaluation across 16 tasks (medical diagnosis, legal analysis, code generation, web search, tool use, embodied interaction) showing a 20.9% macro-averaged success rate improvement over zero-shot prompting and consistent outperformance of gradient-based and memory-based baselines.
Significance. If the central claims hold, this work is significant for reframing LLM deployment as an adaptive process rather than a static endpoint. The parameter-free design via case-based memory and the scale of the 16-task evaluation are strengths that could influence practical agent systems. The attempt to import contextual bandit theory for principled long-term improvement is a clear contribution, though its applicability here requires careful validation.
major comments (2)
- [theoretical analysis section on contextual bandit formulation] Contextual bandit formulation (theoretical analysis section deriving no-regret guarantees): The claim that formulating case selection as a contextual bandit yields no-regret guarantees for the overall agent is not automatically supported. Standard bounds (e.g., for LinUCB) assume direct, observable rewards from the chosen arm, but here the reward is the stochastic success of the LLM-generated response after inserting the retrieved case; the bandit never observes the internal LLM computation. This indirect mapping means case-selection regret does not necessarily translate to performance guarantees for the agent, and a precise reduction or modified analysis is needed to support the assertion.
- [experimental evaluation and results] Experimental section (results on 16 tasks and baseline comparisons): The reported 20.9% macro-averaged gain and consistent outperformance are promising, but the manuscript must clarify controls for post-hoc task selection and whether the bandit algorithm's exploration is evaluated in a truly online, non-stationary deployment setting rather than offline replay. Without these, the empirical support for long-term knowledge accumulation remains incomplete.
minor comments (2)
- [abstract and introduction] The abstract and introduction should explicitly name the specific contextual bandit algorithm (e.g., LinUCB, Thompson sampling) and the exact reward definition used in the formulation.
- [framework description] Notation for the episodic memory and case retrieval process could be made more precise, including how cases are represented and updated over time.
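To make the second minor comment concrete, one possible representation of a case and the memory's update path is sketched below; every field and method name here is hypothetical, since the review only states that cases are accumulated, selected, and refined.

```python
from dataclasses import dataclass

@dataclass
class Case:
    """One episodic-memory entry. Field names are hypothetical; the
    review only says cases are accumulated, selected, and refined."""
    query: str
    response: str
    embedding: tuple          # frozen embedding of the query
    pulls: int = 0            # times this case was selected
    successes: int = 0        # observed 0/1 task successes

    def success_rate(self):
        return self.successes / self.pulls if self.pulls else 0.0

class EpisodicMemory:
    """Append-only case store with per-case outcome statistics."""
    def __init__(self):
        self.cases = []

    def add(self, case):
        self.cases.append(case)

    def record(self, idx, reward):
        """Refine the selected case's statistics after an interaction."""
        c = self.cases[idx]
        c.pulls += 1
        c.successes += int(reward)
```

Making the manuscript's notation at least this explicit (what a case contains, and which fields change over time) would address the comment.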
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. The comments on the theoretical grounding of the contextual bandit formulation and the need for clearer experimental controls are valuable and will help strengthen the manuscript. We address each point below, outlining the revisions we will make.
read point-by-point responses
-
Referee: Contextual bandit formulation (theoretical analysis section deriving no-regret guarantees): The claim that formulating case selection as a contextual bandit yields no-regret guarantees for the overall agent is not automatically supported. Standard bounds (e.g., for LinUCB) assume direct, observable rewards from the chosen arm, but here the reward is the stochastic success of the LLM-generated response after inserting the retrieved case; the bandit never observes the internal LLM computation. This indirect mapping means case-selection regret does not necessarily translate to performance guarantees for the agent, and a precise reduction or modified analysis is needed to support the assertion.
Authors: We appreciate this precise observation on the reward structure. In CASCADE, the contextual bandit treats case selection as the action, with the observed reward being the binary task success (0/1) after the LLM produces its response using the selected case. This reward is directly observable post-execution and follows the standard stochastic reward model in contextual bandits, where the distribution depends on context and arm but need not reveal internal mechanisms. The no-regret bound therefore applies to the case-selection policy relative to the optimal policy in hindsight, ensuring sublinear regret in cumulative reward (i.e., task successes) over long-term interactions. While LLM stochasticity means the bound does not yield a deterministic performance guarantee for every response, it does guarantee that the selection policy improves, which in turn drives the observed agent-level gains. We will revise the theoretical analysis section to include an explicit reduction: we map the problem to a standard contextual bandit instance by defining the reward as the observed success indicator, state the assumptions under which LinUCB-style bounds hold, and clarify that the guarantees concern regret of the bandit (not a direct bound on LLM internals). A new subsection will formalize this mapping.
Revision: yes
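The reduction the authors describe (the arm is the retrieved case, and the only feedback is the binary success observed after the black-box LLM responds) can be simulated in a few lines. UCB1 stands in here for whatever algorithm the paper actually uses, and the success probabilities are invented for the simulation.

```python
import math
import random

def run_reduction(n_rounds=2000, p_success=(0.3, 0.7), seed=0):
    """Sketch of the rebuttal's reduction: the arm is the retrieved case,
    and the only feedback is the binary task success observed after the
    black-box LLM responds. Success probabilities are invented, and UCB1
    stands in for the paper's (unnamed) bandit algorithm. Returns per-arm
    pull counts and total observed reward."""
    rng = random.Random(seed)
    k = len(p_success)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    for t in range(1, n_rounds + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        # The bandit sees only this 0/1 outcome, never the LLM internals.
        reward = 1.0 if rng.random() < p_success[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total
```

Under this mapping the selector concentrates on the better case over time, which is the sense in which case-selection regret, not LLM behaviour itself, is bounded.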
-
Referee: Experimental section (results on 16 tasks and baseline comparisons): The reported 20.9% macro-averaged gain and consistent outperformance are promising, but the manuscript must clarify controls for post-hoc task selection and whether the bandit algorithm's exploration is evaluated in a truly online, non-stationary deployment setting rather than offline replay. Without these, the empirical support for long-term knowledge accumulation remains incomplete.
Authors: We agree that explicit controls are required to substantiate the deployment-time claims. The current evaluation processes the 16 tasks sequentially in a single continuous stream, with the episodic memory and bandit updating after each interaction; task order is randomized across runs to induce non-stationarity, and exploration occurs online via the bandit algorithm at each step. No offline replay or post-hoc filtering of tasks is performed—all 16 tasks are included as predefined. To make this transparent, we will add a new subsection in the experimental evaluation that (i) details the online sequential protocol, (ii) confirms absence of post-hoc task selection, (iii) describes how non-stationarity is simulated through randomized ordering and evolving memory, and (iv) includes cumulative success-rate plots over the interaction sequence to visualize long-term accumulation. These additions will directly address the concern about empirical support for continual adaptation.
Revision: yes
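The online protocol the response describes (a single randomized stream over all tasks, with per-interaction updates and cumulative success curves) can be sketched as follows; the function and field names are hypothetical.

```python
import random

def interleave_tasks(task_streams, seed=0):
    """Merge per-task item lists into one randomized interaction stream,
    as in the online sequential protocol the response describes. Names
    are hypothetical; memory/bandit updates would happen per interaction."""
    rng = random.Random(seed)
    stream = [(task, item)
              for task, items in task_streams.items()
              for item in items]
    rng.shuffle(stream)  # randomized ordering induces non-stationarity
    return stream

def cumulative_success_rate(outcomes):
    """Running mean of 0/1 outcomes, for the proposed cumulative plots."""
    rates, total = [], 0
    for i, r in enumerate(outcomes, 1):
        total += r
        rates.append(total / i)
    return rates
```

Plotting `cumulative_success_rate` over the shuffled stream is exactly the kind of evidence the referee asks for: a rising curve would indicate long-term accumulation, a flat one would not.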
Circularity Check
No significant circularity in derivation chain
full rationale
The paper applies standard contextual bandit theory to experience reuse for no-regret guarantees, relying on external literature rather than self-referential definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The central claims rest on empirical results across 16 tasks and the formalization of DTL, which does not reduce to its inputs by construction. No equations or steps in the provided text exhibit the enumerated circular patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the contextual bandit formulation yields no-regret guarantees over long-term LLM agent interactions.
invented entities (1)
- Evolving episodic memory for LLMs (no independent evidence)
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
CASCADE formulates experience reuse as a contextual bandit problem, enabling principled exploration-exploitation trade-offs and establishing no-regret guarantees over long-term interactions.
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective (unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
$R_T = \mathbb{E}\left[\sum_{t=1}^{T}\left(R(q_t, a_t^\star) - \bar{R}(q_t, c_t^\star) + \bar{R}(q_t, c_t^\star) - \bar{R}(q_t, c_t)\right)\right]$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
AlphaEvolve: A coding agent for scientific and algorithmic discovery
Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025. URL https://arxiv.org/abs/2506.13131
work page · Pith review · arXiv · 2025
-
[2]
Kolb-based experiential learning for generalist agents with human-level kaggle data science performance
Haitham Bou-Ammar, Antoine Grosnit, Alexandre Maraval, Refinath SN, Zichao Zhao, James Doran, Giuseppe Paolo, Albert Thomas, Jonas Gonzalez, Abhineet Kumar, et al. Kolb-based experiential learning for generalist agents with human-level kaggle data science performance,
-
[3]
URL https://doi.org/10.21203/rs.3.rs-7472642/v1
-
[4]
Deepseek-r1 incentivizes reasoning in llms through reinforcement learning
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1 incentivizes reasoning in llms through reinforcement learning. Nature, 645(8081):633–638, 2025
work page 2025
-
[5]
Experience-dependent structural synaptic plasticity in the mammalian brain
Anthony Holtmaat and Karel Svoboda. Experience-dependent structural synaptic plasticity in the mammalian brain. Nature Reviews Neuroscience, 10(9):647–658, 2009
work page 2009
-
[6]
Predictive processing: a canonical cortical computation
Georg B Keller and Thomas D Mrsic-Flogel. Predictive processing: a canonical cortical computation. Neuron, 100(2):424–435, 2018
work page 2018
-
[7]
A survey on large language model based autonomous agents
Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6):186345, 2024
work page 2024
-
[8]
Reinforcement learning: An introduction, volume 1
Richard S Sutton, Andrew G Barto, et al. Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998
work page 1998
-
[15]
Reflexion: language agents with verbal reinforcement learning
Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems, volume 36, pages 8634–8652, 2023.
work page 2023
-
[18]
GEPA: Reflective prompt evolution can outperform reinforcement learning
Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alex Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, and Omar Khattab. GEPA: Reflective prompt evolution can outperform reinforcement learning. In The Fourteenth Internationa...
work page 2026
-
[21]
Lora: Low-rank adaptation of large language models
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9
work page 2022
-
[28]
Alfworld: Aligning text and embodied environments for interactive learning
Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Cote, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. Alfworld: Aligning text and embodied environments for interactive learning. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=0IOX0YcCdTn
work page 2021
-
[30]
React: Synergizing reasoning and acting in language models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=WE_vluYUL-X
work page 2023
-
[31]
Search-r1: Training LLMs to reason and leverage search engines with reinforcement learning
Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan O Arik, Dong Wang, Hamed Zamani, and Jiawei Han. Search-r1: Training LLMs to reason and leverage search engines with reinforcement learning. In Second Conference on Language Modeling, 2025
work page 2025
-
[33]
Dense passage retrieval for open-domain question answering
Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, 2020
work page 2020
-
[34]
Text Embeddings by Weakly-Supervised Contrastive Pre-training
Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. Text embeddings by weakly-supervised contrastive pre-training. arXiv preprint arXiv:2212.03533, 2022
work page · Pith review · arXiv · 2022
-
[36]
Hl7 fhir: An agile and restful approach to healthcare information exchange
Duane Bender and Kamran Sartipi. Hl7 fhir: An agile and restful approach to healthcare information exchange. In Proceedings of the 26th IEEE international symposium on computer-based medical systems, pages 326–331. IEEE, 2013
work page 2013
-
[38]
Ehr-r1: A reasoning-enhanced foundational language model for electronic health record analysis
Yusheng Liao, Chaoyi Wu, Junwei Liu, Shuyang Jiang, Pengcheng Qiu, Haowen Wang, Yun Yue, Shuai Zhen, Jian Wang, Qianrui Fan, et al. Ehr-r1: A reasoning-enhanced foundational language model for electronic health record analysis. arXiv preprint arXiv:2510.25628, 2025
-
[43]
Retrieval, reuse, revision and retention in case-based reasoning
Ramon Lopez De Mantaras, David Mcsherry, Derek Bridge, David Leake, Barry Smyth, Susan Craw, Boi Faltings, Mary Lou Maher, Michael T Cox, Kenneth Forbus, et al. Retrieval, reuse, revision and retention in case-based reasoning. Knowledge Engineering Review, 20(3):215–240, 2005
work page 2005
-
[47]
Attention is all you need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017
work page 2017
-
[48]
Bert: Pre-training of deep bidirectional transformers for language understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019
work page 2019
-
[49]
Neural contextual bandits with ucb-based exploration
Dongruo Zhou, Lihong Li, and Quanquan Gu. Neural contextual bandits with ucb-based exploration. In International conference on machine learning, pages 11492–11502. PMLR, 2020.
work page 2020
-
[53]
Welcome to the era of experience
David Silver and Richard S. Sutton. Welcome to the era of experience, 2025. URL https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf
work page 2025
-
[54]
The landscape of agentic reinforcement learning for llms: A survey, 2025
Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, Yifan Zhou, Yang Chen, Chen Zhang, Yutao Fan, Zihu Wang, Songtao Huang, Francisco Piedrahita-Velez, Yue Liao, Hongru Wang, Mengyue Yang, Heng Ji, Jun Wang, Shuicheng Yan, Philip Torr, and Lei Bai. The landscape of agentic reinfor...
work page 2025
-
[55]
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017
work page · Pith review · arXiv · 2017
-
[56]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024
work page · Pith review · arXiv · 2024
-
[57]
Group-in-group policy optimization for llm agent training
Lang Feng, Zhenghai Xue, Tingcong Liu, and Bo An. Group-in-group policy optimization for llm agent training. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
work page 2025
-
[58]
Agent learning via early experience
Kai Zhang, Xiangchao Chen, Bo Liu, Tianci Xue, Zeyi Liao, Zhihan Liu, Xiyao Wang, Yuting Ning, Zhaorun Chen, Xiaohan Fu, et al. Agent learning via early experience. arXiv preprint arXiv:2510.08558, 2025
-
[59]
Gem: A gym for agentic llms
Zichen Liu, Anya Sims, Keyu Duan, Changyu Chen, Simon Yu, Xiangxin Zhou, Haotian Xu, Shaopan Xiong, Bo Liu, Chenmien Tan, et al. Gem: A gym for agentic llms. arXiv preprint arXiv:2510.01051, 2025
-
[60]
A survey of context engineering for large language models, 2025
Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, Chenlin Zhou, Jiayi Mao, Tianze Xia, Jiafeng Guo, and Shenghua Liu. A survey of context engineering for large language models, 2025
work page 2025
-
[61]
Optimizing generative ai by backpropagating language model feedback
Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Pan Lu, Zhi Huang, Carlos Guestrin, and James Zou. Optimizing generative ai by backpropagating language model feedback. Nature, 639(8055):609–616, 2025
work page 2025
-
[62]
Feedback descent: Open-ended text optimization via pairwise comparison
Yoonho Lee, Joseph Boen, and Chelsea Finn. Feedback descent: Open-ended text optimization via pairwise comparison. arXiv preprint arXiv:2511.07919, 2025
-
[63]
DSPy: Compiling declarative language model calls into state-of-the-art pipelines
Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan A, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. DSPy: Compiling declarative language model calls into state-of-the-art pipelines. In The Twelfth International Conference on Learning Represen...
work page 2024
-
[64]
GEPA: Reflective prompt evolution can outperform reinforcement learning
Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alex Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, and Omar Khattab. GEPA: Reflective prompt evolution can outperform reinforcement learning. In The Fourteenth Internationa...
work page 2026
-
[65]
Llms are in-context bandit reinforcement learners
Giovanni Monea, Antoine Bosselut, Kianté Brantley, and Yoav Artzi. Llms are in-context bandit reinforcement learners. In Second Conference on Language Modeling, 2025. URL https://openreview.net/forum?id=c0RsezY2D1
work page 2025
-
[66]
Agent kb: Leveraging cross-domain experience for agentic problem solving
Xiangru Tang, Tianrui Qin, Tianhao Peng, Ziyang Zhou, Daniel Shao, Tingting Du, Xinming Wei, Peng Xia, Fang Wu, He Zhu, et al. Agent kb: Leveraging cross-domain experience for agentic problem solving. arXiv preprint arXiv:2507.06229, 2025
-
[67]
Memento: Fine-tuning LLM agents without fine-tuning LLMs
Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, and Jun Wang. Memento: Fine-tuning llm agents without fine-tuning llms. arXiv preprint arXiv:2508.16153, 2025. URL https://arxiv.org/abs/2508.16153
-
[68]
Flex: Continuous agent evolution via forward learning from experience
Zhicheng Cai, Xinyuan Guo, Yu Pei, JiangTao Feng, Jiangjie Chen, Ya-Qin Zhang, Wei-Ying Ma, Mingxuan Wang, and Hao Zhou. Flex: Continuous agent evolution via forward learning from experience. arXiv preprint arXiv:2511.06449, 2025
-
[69]
Dynamic cheatsheet: Test-time learning with adaptive memory
Mirac Suzgun, Mert Yuksekgonul, Federico Bianchi, Dan Jurafsky, and James Zou. Dynamic cheatsheet: Test-time learning with adaptive memory. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7080–7106, 2026
work page 2026
-
[70]
Agentic context engineering: Learning comprehensive contexts for self-improving language models
Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, et al. Agentic context engineering: Learning comprehensive contexts for self-improving language models. In The Fourteenth International Conference on Learning Representations, 2026
work page 2026
-
[71]
An introduction to case-based reasoning
Janet L Kolodner. An introduction to case-based reasoning. Artificial intelligence review, 6(1): 3–34, 1992
work page 1992
-
[72]
Case-based reasoning: A review
Ian Watson and Farhi Marir. Case-based reasoning: A review. The knowledge engineering review, 9(4):327–354, 1994
work page 1994
-
[73]
Case-based reasoning: Foundational issues, methodological variations, and system approaches
Agnar Aamodt and Enric Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI communications, 7(1):39–59, 1994
work page 1994
-
[74]
Case-based reasoning meets large language models: A research manifesto for open challenges
Kerstin Bach, Ralph Bergmann, Florian Brand, Marta Caro-Martínez, Viktor Eisenstadt, Michael W. Floyd, Lasal Jayawardena, David Leake, Mirko Lenz, Lukas Malburg, David H. Ménager, Mirjam Minor, Brian Schack, Ian Watson, Kaitlynne Wilkerson, and Nirmalie Wiratunga. Case-based reasoning meets large language models: A research manifesto for open challenges ...
work page 2025
-
[75]
Language models are few-shot learners
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020
work page 2020
-
[76]
A survey on in-context learning
Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, et al. A survey on in-context learning. In Proceedings of the 2024 conference on empirical methods in natural language processing, pages 1107–1128, 2024
work page 2024
-
[77]
Improving language models by retrieving from trillions of tokens
Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. Improving language models by retrieving from trillions of tokens. In International conference on machine learning, pages 2206–2240. PMLR, 2022.
work page 2022
-
[78]
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yixin Dai, Jiawei Sun, Haofen Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2(1), 2023
work page · Pith review · arXiv · 2023
-
[79]
Ds-agent: Automated data science by empowering large language models with case-based reasoning
Siyuan Guo, Cheng Deng, Ying Wen, Hechang Chen, Yi Chang, and Jun Wang. Ds-agent: Automated data science by empowering large language models with case-based reasoning. In International Conference on Machine Learning, pages 16813–16848. PMLR, 2024
work page 2024
-
[80]
Optimizing case-based reasoning system for functional test script generation with large language models
Siyuan Guo, Huiwu Liu, Xiaolong Chen, Yuming Xie, Liang Zhang, Tao Han, Hechang Chen, Yi Chang, and Jun Wang. Optimizing case-based reasoning system for functional test script generation with large language models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 4487–4498, 2025
work page 2025
-
[81]
Case-based reasoning enhances the predictive power of llms in drug-drug interaction
Guangyi Liu, Yongqi Zhang, Xunyuan Liu, and Quanming Yao. Case-based reasoning enhances the predictive power of llms in drug-drug interaction. arXiv preprint arXiv:2505.23034, 2025
-
[82]
Memento-ii: Learning by stateful reflective memory
Jun Wang. Memento-ii: Learning by stateful reflective memory. arXiv preprint arXiv:2512.22716, 2025
-
[83]
Memento-skills: Let agents design agents
Huichi Zhou, Siyuan Guo, Anjie Liu, Zhongwei Yu, Ziqin Gong, Bowen Zhao, Zhixun Chen, Menglong Zhang, Yihang Chen, Jinsong Li, et al. Memento-skills: Let agents design agents. arXiv preprint arXiv:2603.18743, 2026
-
[84]
Gaia: a benchmark for general ai assistants
Grégoire Mialon, Clémentine Fourrier, Thomas Wolf, Yann LeCun, and Thomas Scialom. Gaia: a benchmark for general ai assistants. In The Twelfth International Conference on Learning Representations, 2023
work page 2023
-
[85]
A contextual-bandit approach to personalized news article recommendation
Lihong Li, Wei Chu, John Langford, and Robert E Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web, pages 661–670, 2010
work page 2010
-
[86]
Scalable neural contextual bandit for recommender systems
Zheqing Zhu and Benjamin Van Roy. Scalable neural contextual bandit for recommender systems. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 3636–3646, 2023
work page 2023
-
[87] Yikun Ban, Yunzhe Qi, and Jingrui He. Neural contextual bandits for personalized recommendation. In Companion Proceedings of the ACM Web Conference 2024, pages 1246–1249, 2024.
[88] Xiaoqiang Lin, Zhaoxuan Wu, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, and Bryan Kian Hsiang Low. Use your instinct: Instruction optimization for llms using neural bandits coupled with transformers. In International Conference on Machine Learning, pages 30317–30345. PMLR, 2024.
[89] Dongruo Zhou, Lihong Li, and Quanquan Gu. Neural contextual bandits with ucb-based exploration. In International Conference on Machine Learning, pages 11492–11502. PMLR, 2020.
[90] Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, and Bryan Kian Hsiang Low. Prompt optimization with ease? Efficient ordering-aware automated selection of exemplars. Advances in Neural Information Processing Systems, 37:122706–122740, 2024.
[91] Pranoy Panda, Raghav Magazine, Chaitanya Devaguptapu, Sho Takemori, and Vishal Sharma. Adaptive llm routing under budget constraints. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 23934–23949, 2025.
[92] Manhin Poon, XiangXiang Dai, Xutong Liu, Fang Kong, John Lui, and Jinhang Zuo. Online multi-llm selection via contextual bandits under unstructured context evolution. arXiv preprint arXiv:2506.17670, 2025.
[93] Arsene Fansi Tchango, Rishab Goel, Zhi Wen, Julien Martel, and Joumana Ghosn. Ddxplus: A new dataset for automatic medical diagnosis. Advances in Neural Information Processing Systems, 35:31306–31318, 2022.
[94] Cheng-Kuang Wu, Zhi Rui Tam, Chieh-Yen Lin, Yun-Nung Vivian Chen, and Hung-yi Lee. Streambench: Towards benchmarking continuous improvement of language agents. Advances in Neural Information Processing Systems, 37:107039–107063, 2024.
[95] Alistair EW Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, et al. Mimic-iv, a freely accessible electronic health record dataset. Scientific Data, 10(1):1, 2023.
[96] Qidong Liu, Xian Wu, Xiangyu Zhao, Yuanshao Zhu, Zijian Zhang, Feng Tian, and Yefeng Zheng. Large language model distilling medication recommendation model. arXiv preprint arXiv:2402.02803, 2024.
[97] Farieda Gaber, Maqsood Shaik, Fabio Allega, Agnes Julia Bilecz, Felix Busch, Kelsey Goon, Vedran Franke, and Altuna Akalin. Evaluating large language model workflows in clinical decision support for triage and referral and diagnosis. npj Digital Medicine, 8(1):263, 2025.
[98] Nicki Gilboy, Paula Tanabe, Debbie Travers, Alexander M Rosenau, et al. Emergency severity index (esi): a triage tool for emergency department care, version 4. Implementation Handbook, 2012:12–0014, 2012.
[99] Xiao Wei, Qi Xu, Hang Yu, Qian Liu, and Erik Cambria. Through the mud: A multi-defendant charge prediction benchmark with linked crime elements. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2864–2878, 2024.
[100] Wanhong Huang, Yi Feng, Chuanyi Li, Honghan Wu, Jidong Ge, and Vincent Ng. Cmdl: A large-scale chinese multi-defendant legal judgment prediction dataset. In Findings of the Association for Computational Linguistics: ACL 2024, pages 5895–5906. Association for Computational Linguistics, 2024.
[101] Iñigo Casanueva, Tadas Temčinas, Daniela Gerz, Matthew Henderson, and Ivan Vulić. Efficient intent detection with dual sentence encoders. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI, pages 38–45, 2020.
[102] Ankur Sinha, Satishwar Kedas, Rishu Kumar, and Pekka Malo. Sentfin 1.0: Entity-aware sentiment analysis for financial news. Journal of the Association for Information Science and Technology, 73(9):1314–1335, 2022.
[103] Lipeng Ma, Yixuan Li, Weidong Yang, Mingjie Zhou, Xinyi Liu, Ben Fei, Shuhao Li, Xiaoyan Sun, Sihang Jiang, and Yanghua Xiao. Logreasoner: Empowering llms with expert-like coarse-to-fine reasoning for log analysis tasks. arXiv preprint arXiv:2509.20798, 2025.
[104] Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, et al. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921, 2018.
[105] Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, et al. Can llm already serve as a database interface? A big bench for large-scale database grounded text-to-sqls. Advances in Neural Information Processing Systems, 36:42330–42357, 2023.
[106] Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. Alfworld: Aligning text and embodied environments for interactive learning. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=0IOX0YcCdTn.
[107] Ruoyao Wang, Peter Jansen, Marc-Alexandre Côté, and Prithviraj Ammanabrolu. Scienceworld: Is your agent smarter than a 5th grader? In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11279–11298, 2022.