Memory in the Age of AI Agents
Pith reviewed 2026-05-11 18:10 UTC · model grok-4.3
The pith
The survey unifies agent memory research under three lenses (forms, functions, and dynamics) and introduces a factual-experiential-working taxonomy of memory functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This survey delineates the scope of agent memory and examines it through the unified lenses of forms (token-level, parametric, and latent realizations), functions (factual, experiential, and working memory), and dynamics (how memory is formed, evolved, and retrieved). It argues that traditional long/short-term distinctions are insufficient for contemporary agent systems and compiles benchmarks, frameworks, and forward-looking topics such as memory automation, reinforcement-learning integration, multimodal memory, multi-agent memory, and trustworthiness to support memory as a first-class design primitive.
What carries the argument
The three lenses of forms, functions, and dynamics, with the function-based taxonomy that distinguishes factual, experiential, and working memory.
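One way to make the three lenses concrete is to treat them as independent axes along which a memory component can be tagged. The sketch below is only an illustration of that reading: the category names come from the survey, while `MemoryProfile`, `describe`, and the `scratchpad` example are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Form(Enum):
    """How memory is realized (names from the survey)."""
    TOKEN_LEVEL = "token-level"
    PARAMETRIC = "parametric"
    LATENT = "latent"


class Function(Enum):
    """What memory is used for (names from the survey)."""
    FACTUAL = "factual"
    EXPERIENTIAL = "experiential"
    WORKING = "working"


class Dynamic(Enum):
    """Lifecycle stages of memory (names from the survey)."""
    FORMATION = "formation"
    EVOLUTION = "evolution"
    RETRIEVAL = "retrieval"


@dataclass(frozen=True)
class MemoryProfile:
    """Hypothetical record tagging one memory component along the three lenses."""
    name: str
    form: Form
    function: Function
    dynamics: frozenset  # the Dynamic stages this component implements


def describe(profile: MemoryProfile) -> str:
    """Render a profile as a one-line summary with stages in alphabetical order."""
    stages = ", ".join(sorted(d.value for d in profile.dynamics))
    return (
        f"{profile.name}: {profile.form.value} form, "
        f"{profile.function.value} function, dynamics: {stages}"
    )


# Made-up component, used only to show how a tag would look.
example = MemoryProfile(
    name="scratchpad",
    form=Form.TOKEN_LEVEL,
    function=Function.WORKING,
    dynamics=frozenset({Dynamic.FORMATION, Dynamic.RETRIEVAL}),
)
print(describe(example))
```

Giving each component exactly one `Form` and one `Function` bakes in the non-overlap assumption; a mapping that needs several values per axis would have to relax those fields to sets.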
Load-bearing premise
The distinctions among forms, functions, and dynamics form a complete, non-overlapping classification that meaningfully reduces fragmentation in the existing literature.
What would settle it
A later systematic mapping of published agent systems that shows most implementations still fall outside the factual-experiential-working categories or require substantial overlap would falsify the taxonomy's claimed unifying power.
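The test described above can be sketched as a simple tally over annotated systems. This is a minimal illustration under stated assumptions: the input format, the `coverage_report` helper, the 0.5 cutoff for "most", and the choice to count any system with more than one label as overlapping are all hypothetical, and the sample data is invented.

```python
# The three function labels named in the survey's taxonomy.
FUNCTIONS = frozenset({"factual", "experiential", "working"})


def coverage_report(labels, threshold=0.5):
    """Summarize how well the three-way split fits a set of annotated systems.

    labels: dict mapping a system name to the set of function labels an
            annotator assigned to it (hypothetical input format).
    threshold: assumed cutoff for "most"; the source does not specify one.
    """
    total = len(labels)
    if total == 0:
        raise ValueError("no systems to evaluate")
    # Systems that match none of the three functions.
    uncovered = sum(1 for fs in labels.values() if not (set(fs) & FUNCTIONS))
    # Systems that need more than one function label.
    overlapping = sum(1 for fs in labels.values() if len(set(fs) & FUNCTIONS) > 1)
    misfit = (uncovered + overlapping) / total
    return {
        "uncovered": uncovered / total,
        "overlapping": overlapping / total,
        "clean": 1 - misfit,
        "taxonomy_challenged": misfit > threshold,
    }


# Invented annotations for four placeholder systems.
sample = {
    "system_a": {"factual"},
    "system_b": {"working", "experiential"},
    "system_c": set(),
    "system_d": {"working"},
}
report = coverage_report(sample)
print(report)
```

A real mapping study would also need an annotation protocol and inter-annotator agreement, which this tally does not address.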
Original abstract
Memory has emerged, and will continue to remain, a core capability of foundation model-based agents. As research on agent memory rapidly expands and attracts unprecedented attention, the field has also become increasingly fragmented. Existing works that fall under the umbrella of agent memory often differ substantially in their motivations, implementations, and evaluation protocols, while the proliferation of loosely defined memory terminologies has further obscured conceptual clarity. Traditional taxonomies such as long/short-term memory have proven insufficient to capture the diversity of contemporary agent memory systems. This work aims to provide an up-to-date landscape of current agent memory research. We begin by clearly delineating the scope of agent memory and distinguishing it from related concepts such as LLM memory, retrieval augmented generation (RAG), and context engineering. We then examine agent memory through the unified lenses of forms, functions, and dynamics. From the perspective of forms, we identify three dominant realizations of agent memory, namely token-level, parametric, and latent memory. From the perspective of functions, we propose a finer-grained taxonomy that distinguishes factual, experiential, and working memory. From the perspective of dynamics, we analyze how memory is formed, evolved, and retrieved over time. To support practical development, we compile a comprehensive summary of memory benchmarks and open-source frameworks. Beyond consolidation, we articulate a forward-looking perspective on emerging research frontiers, including memory automation, reinforcement learning integration, multimodal memory, multi-agent memory, and trustworthiness issues. We hope this survey serves not only as a reference for existing work, but also as a conceptual foundation for rethinking memory as a first-class primitive in the design of future agentic intelligence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper is a survey on memory systems for foundation model-based AI agents. It argues that the field is fragmented with proliferating terminologies and that traditional long/short-term memory distinctions are insufficient. The authors delineate the scope of agent memory from related concepts such as LLM memory, RAG, and context engineering; propose taxonomies organized by forms (token-level, parametric, latent), functions (factual, experiential, working), and dynamics (formation, evolution, retrieval); compile benchmarks and open-source frameworks; and outline future frontiers including memory automation, RL integration, multimodal memory, multi-agent memory, and trustworthiness issues.
Significance. If the taxonomy is adopted, the survey could meaningfully consolidate a rapidly expanding area by supplying a unified organizational lens that better captures contemporary agent memory systems than prior distinctions. The explicit compilation of benchmarks and frameworks provides immediate practical utility for researchers and developers, while the forward-looking section on emerging frontiers offers a useful roadmap. These elements position the work as a potential reference point for treating memory as a first-class design primitive in agentic systems.
minor comments (2)
- [Abstract] The abstract states that the survey compiles 'a comprehensive summary of memory benchmarks and open-source frameworks' but does not indicate selection criteria or coverage scope; adding a short methods paragraph or table in the main text would improve reproducibility and transparency of the consolidation effort.
- [Scope delineation] The scope delineation from RAG and context engineering is conceptually useful; a concise comparative table (e.g., in the introduction) listing key differences in motivation, implementation, and evaluation would enhance clarity without altering the central argument.
Simulated Author's Rebuttal
We thank the referee for the positive and constructive review, which highlights the potential of the proposed taxonomy, benchmark compilation, and future directions to consolidate the agent memory literature. We appreciate the recommendation for minor revision.
Circularity Check
No significant circularity; survey taxonomy is externally grounded
full rationale
This is a survey paper whose central contribution is an organizational taxonomy of agent memory drawn from analysis of external literature. It delineates scope against related concepts (LLM memory, RAG, context engineering), identifies forms (token-level/parametric/latent), proposes functions (factual/experiential/working), and examines dynamics without any equations, fitted parameters, predictions, or derivations. No load-bearing step reduces to self-definition, self-citation chains, or renaming of known results by construction; the distinctions are explicitly motivated as a response to fragmentation in prior work. The paper's claims rest on external benchmarks and existing frameworks, which it summarizes rather than deriving new results from its own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Traditional taxonomies such as long/short-term memory are insufficient to capture the diversity of contemporary agent memory systems.
- domain assumption: Agent memory is distinct from LLM memory, RAG, and context engineering.
Forward citations
Cited by 38 Pith papers
-
Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values
Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.
-
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying ...
-
ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles
ScioMind combines anchoring-based belief updates, hierarchical memory, and dynamic profiles in LLM multi-agent systems to produce more stable, diverse, and psychologically aligned opinion trajectories than prior fixed...
-
Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory
Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
-
When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory
A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.
-
SRTJ: Self-Evolving Rule-Driven Training-Free LLM Jailbreaking
SRTJ is a training-free jailbreak method that evolves hierarchical attack rules using iterative verifier feedback and ASP-based constraint-aware composition to achieve stable high success rates on HarmBench across mul...
-
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
-
HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents
HeLa-Mem is a graph-based memory architecture for LLM agents that applies Hebbian learning to episodic associations and distills hubs into semantic knowledge, yielding better results on long-context benchmarks with fe...
-
When to Forget: A Memory Governance Primitive
Memory Worth converges almost surely to the conditional probability of task success given memory retrieval and correlates at rho=0.89 with ground-truth utility in controlled experiments.
-
Automated Reformulation of Robust Optimization via Memory-Augmented Large Language Models
AutoREM augments LLMs with a structured memory of failed reformulation trajectories to improve accuracy and efficiency on robust optimization tasks without parameter updates or expert knowledge.
-
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
-
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents
MemPrivacy uses edge detection of sensitive spans and type-aware placeholders to enable cloud-side memory management for LLM agents without exposing private data, achieving under 1.6% utility loss.
-
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents
MemPrivacy replaces privacy-sensitive spans with structured placeholders on edge devices to enable effective cloud memory management while limiting utility loss to 1.6% and outperforming general models on privacy extraction.
-
Tree-based Credit Assignment for Multi-Agent Memory System
TreeMem assigns credit to agents in multi-agent memory systems by expanding outputs into a tree and using Monte Carlo averaging of final rewards to optimize each agent's policy.
-
What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis
Circuit analysis reveals that routing circuits for agent memory emerge at 0.6B parameters while content circuits emerge at 4B, with a shared grounding hub and an unsupervised diagnostic achieving 76.2% accuracy for lo...
-
What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis
In LLM agents, memory routing circuits emerge at 0.6B scale while content circuits appear only at 4B, and write/read operations recruit a pre-existing late-layer context hub instead of creating a new one, enabling a 7...
-
MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents
A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.
-
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.
-
Contextual Agentic Memory is a Memo, Not True Memory
Agentic memory is lookup-based retrieval, not weight-based consolidation, creating a generalization ceiling on novel tasks and structural vulnerability to memory poisoning.
-
EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
EviMem improves accuracy on temporal and multi-hop questions in long-term conversational memory by iteratively diagnosing and filling evidence gaps, achieving 81.6% and 85.2% judge accuracy on LoCoMo at 4.5x lower lat...
-
Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents
Memanto delivers 89.8% and 87.1% accuracy on LongMemEval and LoCoMo benchmarks using typed semantic memory and information-theoretic retrieval, outperforming hybrid graph and vector systems with a single query and zer...
-
Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents
The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level compression as the missing diagonal.
-
GAM: Hierarchical Graph-based Agentic Memory for LLM Agents
GAM decouples event-level memory encoding from topic-level consolidation in LLM agents using hierarchical graphs to reduce interference and improve long-term coherence and retrieval.
-
Quantifying Trust: Financial Risk Management for Trustworthy AI Agents
The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.
-
MemFactory: Unified Inference & Training Framework for Agent Memory
MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.
-
Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses
The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.
-
MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading
MemReread improves agent long-context reasoning by triggering rereading on insufficient final memory to recover discarded indirect facts, outperforming baselines at linear complexity.
-
A Semantic Autonomy Framework for VLM-Integrated Indoor Mobile Robots: Hybrid Deterministic Reasoning and Cross-Robot Adaptive Memory
The Semantic Autonomy Stack combines a seven-step parametric resolver handling 88% of instructions in under 0.1 ms with VLM escalation and a five-category cross-robot memory system, achieving 100% accuracy and 103,000...
-
Towards Self-Improving Error Diagnosis in Multi-Agent Systems
ErrorProbe introduces a self-improving pipeline for attributing semantic failures in LLM multi-agent systems to specific agents and steps via anomaly detection, backward tracing, and tool-grounded validation with veri...
-
Towards Scalable Lightweight GUI Agents via Multi-role Orchestration
LAMO uses role-oriented data synthesis and two-stage training (perplexity-weighted supervised fine-tuning plus reinforcement learning) to create scalable lightweight GUI agents that support both single-model and multi...
-
On the Creativity of AI Agents
LLM agents produce outputs that meet basic functional criteria for creativity but lack the process-level, social, and personal elements required for ontological creativity.
-
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
Claude Code centers on a model-tool while-loop surrounded by permission systems, context compaction, extensibility hooks, subagent delegation, and session storage; the same design questions yield different answers in ...
-
Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure
OIDA adds typed knowledge objects, decay-based importance scores, contradiction edges, and an inverse-decay QUESTION primitive for ignorance to raise epistemic fidelity beyond retrieval.
-
MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought
MemCoT redefines long-context reasoning as iterative stateful search with zoom-in/zoom-out memory perception and dual short-term memories, claiming SOTA results on LoCoMo and LongMemEval-S benchmarks.
-
MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents
MemMachine stores entire conversational episodes and applies contextualized retrieval plus adaptive query routing to achieve 0.9169 accuracy on LoCoMo and 93 percent on LongMemEvalS while using 80 percent fewer tokens...
-
Reliable AI Needs to Externalize Implicit Knowledge: A Human-AI Collaboration Perspective
Reliable AI needs structured Knowledge Objects to externalize and enable human validation of implicit knowledge that current methods cannot verify.
-
Memory as Metabolism: A Design for Companion Knowledge Systems
This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...
-
Back to Basics: Let Conversational Agents Remember with Just Retrieval and Generation
A minimalist retrieval-and-generation framework using turn isolation and query-driven pruning outperforms complex memory systems by directly addressing signal sparsity and dual-level redundancy in dialogues.