arxiv: 2512.13564 · v2 · submitted 2025-12-15 · 💻 cs.CL · cs.AI

Memory in the Age of AI Agents

Boci Peng, Boyang Liu, Dawei Cheng, Fangyi Zhu, Guibin Zhang, Hao Sun, Honglin Guo, Jiahang Lin, Jiahao Huo, Jiaxin Guo, Jiejun Tan, Jiongnan Liu, Ji-Rong Wen, Junhao Wang, Kun Wang, Philip Torr, Qiankun Li, Qi Zhang, Senjie Jin, Shichun Liu, Shihan Dou, Shirui Pan, Shuicheng Yan, Tao Gui, Wangchunshu Zhou, Xiaobin Hu, Xinlei Yu, Xuanbo Fan, Xuanjing Huang, Yanbin Yin, Yanwei Yue, Yan Zhang, Yixin Liu, Yue Liao, Yu-Gang Jiang, Yutao Zhu, Yu Wang, Yuwei Niu, Yuyang Hu, Zewen Hu, Zeyu Zhang, Zhenfei Yin, Zhenhong Zhou, Zhenrong Cheng, Zhicheng Dou, Zhiheng Xi, Zhongxiang Sun

Pith reviewed 2026-05-11 18:10 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords AI agents · agent memory · foundation models · memory taxonomy · factual memory · experiential memory · working memory · memory dynamics

The pith

Agent memory research can be unified under three lenses (forms, functions, and dynamics), with a new factual-experiential-working taxonomy of memory functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

AI agents built on foundation models depend on memory systems that have proliferated rapidly yet remain conceptually scattered across motivations, implementations, and terms. This survey first separates agent memory from nearby ideas such as LLM memory and retrieval-augmented generation. It then applies three consistent lenses—forms, functions, and dynamics—to organize the literature. From the function lens it introduces a finer taxonomy that replaces broad long-term versus short-term labels with factual, experiential, and working memory categories. The result supplies a consolidated reference, a list of benchmarks and open frameworks, and a map of open research directions.

Core claim

This survey delineates the scope of agent memory and examines it through the unified lenses of forms (token-level, parametric, and latent realizations), functions (factual, experiential, and working memory), and dynamics (how memory is formed, evolved, and retrieved). It argues that traditional long/short-term distinctions are insufficient for contemporary agent systems and compiles benchmarks, frameworks, and forward-looking topics such as memory automation, reinforcement-learning integration, multimodal memory, multi-agent memory, and trustworthiness to support memory as a first-class design primitive.
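The minimal Python sketch below makes the three lenses concrete. It assumes a toy in-process store in which each item is tagged with a form and a function. The three dynamics become a write method, a task-end cleanup, and a keyword-overlap retrieval. The names `MemoryStore`, `form_memory`, `end_task`, and `retrieve`, and the specific policies they implement, are hypothetical illustrations. They are not the survey's definitions or any framework's API. Only token-level storage is actually realized here; parametric and latent forms are kept as labels.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Form(Enum):
    """Forms lens: where the memory lives."""
    TOKEN_LEVEL = "token-level"
    PARAMETRIC = "parametric"
    LATENT = "latent"

class Function(Enum):
    """Functions lens: what role the memory plays for the agent."""
    FACTUAL = "factual"
    EXPERIENTIAL = "experiential"
    WORKING = "working"

@dataclass
class MemoryItem:
    content: str
    function: Function
    form: Form = Form.TOKEN_LEVEL  # this toy store only realizes token-level text
    uses: int = 0                  # how often the item has been retrieved

def _overlap(query: str, text: str) -> int:
    # Hypothetical relevance score: number of shared lowercase words.
    return len(set(query.lower().split()) & set(text.lower().split()))

@dataclass
class MemoryStore:
    items: List[MemoryItem] = field(default_factory=list)

    def form_memory(self, content: str, function: Function) -> MemoryItem:
        """Dynamics, formation: write a new item."""
        item = MemoryItem(content, function)
        self.items.append(item)
        return item

    def end_task(self) -> None:
        """Dynamics, evolution: here simply discard working memory when a task ends."""
        self.items = [m for m in self.items if m.function is not Function.WORKING]

    def retrieve(self, query: str, function: Optional[Function] = None,
                 k: int = 3) -> List[MemoryItem]:
        """Dynamics, retrieval: rank by word overlap, optionally filtered by function."""
        pool = [m for m in self.items if function is None or m.function is function]
        ranked = sorted(pool, key=lambda m: _overlap(query, m.content), reverse=True)
        hits = [m for m in ranked[:k] if _overlap(query, m.content) > 0]
        for m in hits:
            m.uses += 1
        return hits

store = MemoryStore()
store.form_memory("user prefers metric units", Function.FACTUAL)
store.form_memory("retrying the search tool with a shorter query worked", Function.EXPERIENTIAL)
store.form_memory("current subgoal: convert 5 miles to km", Function.WORKING)
print([m.content for m in store.retrieve("metric units")])  # ['user prefers metric units']
store.end_task()
print(len(store.items))  # 2
```

In this sketch, items are tagged by function rather than by duration. That is what lets `end_task` drop only working memory while both factual and experiential items persist. A plain long/short-term split would lump those last two together.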

What carries the argument

The three lenses of forms, functions, and dynamics, with the function-based taxonomy that distinguishes factual, experiential, and working memory.

Load-bearing premise

The distinctions among forms, functions, and dynamics form a complete, non-overlapping classification that meaningfully reduces fragmentation in the existing literature.

What would settle it

A later systematic mapping of published agent systems that shows most implementations still fall outside the factual-experiential-working categories or require substantial overlap would falsify the taxonomy's claimed unifying power.
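Once annotators have labeled the memory components of published systems, a mapping like this could be scored mechanically. The Python sketch below assumes one reading of the criterion. A component "fits" only if exactly one of the three functional categories applies. A component with no category, or with several, counts as a misfit. The taxonomy is treated as falsified when misfits form a strict majority. The `Component` record, the label strings, and the 0.5 threshold are hypothetical choices, not part of the paper or the review.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

TAXONOMY = frozenset({"factual", "experiential", "working"})

@dataclass(frozen=True)
class Component:
    system: str             # published agent system the component belongs to
    labels: FrozenSet[str]  # categories annotators judged to apply

def audit(components: List[Component], threshold: float = 0.5) -> Dict[str, object]:
    if not components:
        raise ValueError("need at least one annotated component")
    n = len(components)
    fits = [c.labels & TAXONOMY for c in components]
    outside = sum(1 for f in fits if len(f) == 0)   # no category applies
    ambiguous = sum(1 for f in fits if len(f) > 1)  # categories overlap
    misfit_frac = (outside + ambiguous) / n
    return {
        "outside_frac": outside / n,
        "ambiguous_frac": ambiguous / n,
        "misfit_frac": misfit_frac,
        "falsified": misfit_frac > threshold,       # "most" read as a strict majority
    }

sample = [
    Component("system A", frozenset({"factual"})),
    Component("system B", frozenset({"experiential", "working"})),
    Component("system C", frozenset({"other"})),
    Component("system D", frozenset({"working"})),
]
print(audit(sample))  # misfit_frac 0.5, falsified False
```

Counting outside and ambiguous components separately keeps the two failure modes visible. A taxonomy that is merely too coarse (many overlaps) then looks different from one that misses whole kinds of memory (many components outside).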

Original abstract

Memory has emerged, and will continue to remain, a core capability of foundation model-based agents. As research on agent memory rapidly expands and attracts unprecedented attention, the field has also become increasingly fragmented. Existing works that fall under the umbrella of agent memory often differ substantially in their motivations, implementations, and evaluation protocols, while the proliferation of loosely defined memory terminologies has further obscured conceptual clarity. Traditional taxonomies such as long/short-term memory have proven insufficient to capture the diversity of contemporary agent memory systems. This work aims to provide an up-to-date landscape of current agent memory research. We begin by clearly delineating the scope of agent memory and distinguishing it from related concepts such as LLM memory, retrieval augmented generation (RAG), and context engineering. We then examine agent memory through the unified lenses of forms, functions, and dynamics. From the perspective of forms, we identify three dominant realizations of agent memory, namely token-level, parametric, and latent memory. From the perspective of functions, we propose a finer-grained taxonomy that distinguishes factual, experiential, and working memory. From the perspective of dynamics, we analyze how memory is formed, evolved, and retrieved over time. To support practical development, we compile a comprehensive summary of memory benchmarks and open-source frameworks. Beyond consolidation, we articulate a forward-looking perspective on emerging research frontiers, including memory automation, reinforcement learning integration, multimodal memory, multi-agent memory, and trustworthiness issues. We hope this survey serves not only as a reference for existing work, but also as a conceptual foundation for rethinking memory as a first-class primitive in the design of future agentic intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper is a survey on memory systems for foundation model-based AI agents. It argues that the field is fragmented with proliferating terminologies and that traditional long/short-term memory distinctions are insufficient. The authors delineate the scope of agent memory from related concepts such as LLM memory, RAG, and context engineering; propose taxonomies organized by forms (token-level, parametric, latent), functions (factual, experiential, working), and dynamics (formation, evolution, retrieval); compile benchmarks and open-source frameworks; and outline future frontiers including memory automation, RL integration, multimodal memory, multi-agent memory, and trustworthiness issues.

Significance. If the taxonomy is adopted, the survey could meaningfully consolidate a rapidly expanding area by supplying a unified organizational lens that better captures contemporary agent memory systems than prior distinctions. The explicit compilation of benchmarks and frameworks provides immediate practical utility for researchers and developers, while the forward-looking section on emerging frontiers offers a useful roadmap. These elements position the work as a potential reference point for treating memory as a first-class design primitive in agentic systems.

minor comments (2)

[Abstract] The abstract states that the survey compiles 'a comprehensive summary of memory benchmarks and open-source frameworks' but does not indicate selection criteria or coverage scope; adding a short methods paragraph or table in the main text would improve reproducibility and transparency of the consolidation effort.
[Scope delineation] The scope delineation from RAG and context engineering is conceptually useful; a concise comparative table (e.g., in the introduction) listing key differences in motivation, implementation, and evaluation would enhance clarity without altering the central argument.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive review, which highlights the potential of the proposed taxonomy, benchmark compilation, and future directions to consolidate the agent memory literature. We appreciate the recommendation for minor revision.

Circularity Check

0 steps flagged

No significant circularity; survey taxonomy is externally grounded

full rationale

This is a survey paper whose central contribution is an organizational taxonomy of agent memory drawn from analysis of external literature. It delineates scope against related concepts (LLM memory, RAG, context engineering), identifies forms (token-level/parametric/latent), proposes functions (factual/experiential/working), and examines dynamics without any equations, fitted parameters, predictions, or derivations. No load-bearing step reduces to self-definition, self-citation chains, or renaming of known results by construction; the distinctions are explicitly motivated as a response to fragmentation in prior work. The paper is self-contained against external benchmarks and compiles summaries of existing frameworks rather than deriving new results from its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

This is a survey that consolidates and reorganizes existing research rather than deriving new results from first principles or introducing new entities. It relies on domain assumptions, chiefly that the current literature is fragmented enough to require a new taxonomy.

axioms (2)

domain assumption: Traditional taxonomies such as long/short-term memory are insufficient to capture the diversity of contemporary agent memory systems.
Explicitly stated in the abstract as motivation for the new framework.
domain assumption: Agent memory is distinct from LLM memory, RAG, and context engineering.
Stated as the starting point for delineating scope.

pith-pipeline@v0.9.0 · 5769 in / 1437 out tokens · 115660 ms · 2026-05-11T18:10:58.826346+00:00 · methodology

Forward citations

Cited by 38 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values
cs.AI 2026-05 unverdicted novelty 8.0

Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
cs.CR 2026-05 unverdicted novelty 8.0

Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying ...
ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles
cs.AI 2026-05 unverdicted novelty 7.0

ScioMind combines anchoring-based belief updates, hierarchical memory, and dynamic profiles in LLM multi-agent systems to produce more stable, diverse, and psychologically aligned opinion trajectories than prior fixed...
Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory
cs.AI 2026-05 unverdicted novelty 7.0

Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory
cs.AI 2026-05 unverdicted novelty 7.0

A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.
SRTJ: Self-Evolving Rule-Driven Training-Free LLM Jailbreaking
cs.CR 2026-05 unverdicted novelty 7.0

SRTJ is a training-free jailbreak method that evolves hierarchical attack rules using iterative verifier feedback and ASP-based constraint-aware composition to achieve stable high success rates on HarmBench across mul...
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
cs.CL 2026-05 unverdicted novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents
cs.CL 2026-04 unverdicted novelty 7.0

HeLa-Mem is a graph-based memory architecture for LLM agents that applies Hebbian learning to episodic associations and distills hubs into semantic knowledge, yielding better results on long-context benchmarks with fe...
When to Forget: A Memory Governance Primitive
cs.AI 2026-04 unverdicted novelty 7.0

Memory Worth converges almost surely to the conditional probability of task success given memory retrieval and correlates with ground-truth utility at ρ = 0.89 in controlled experiments.
Automated Reformulation of Robust Optimization via Memory-Augmented Large Language Models
cs.AI 2026-05 unverdicted novelty 6.0

AutoREM augments LLMs with a structured memory of failed reformulation trajectories to improve accuracy and efficiency on robust optimization tasks without parameter updates or expert knowledge.
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution
cs.AI 2026-05 unverdicted novelty 6.0

HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents
cs.CR 2026-05 unverdicted novelty 6.0

MemPrivacy uses edge detection of sensitive spans and type-aware placeholders to enable cloud-side memory management for LLM agents without exposing private data, achieving under 1.6% utility loss.
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents
cs.CR 2026-05 unverdicted novelty 6.0

MemPrivacy replaces privacy-sensitive spans with structured placeholders on edge devices to enable effective cloud memory management while limiting utility loss to 1.6% and outperforming general models on privacy extraction.
Tree-based Credit Assignment for Multi-Agent Memory System
cs.MA 2026-05 unverdicted novelty 6.0

TreeMem assigns credit to agents in multi-agent memory systems by expanding outputs into a tree and using Monte Carlo averaging of final rewards to optimize each agent's policy.
What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis
cs.AI 2026-05 unverdicted novelty 6.0

Circuit analysis reveals that routing circuits for agent memory emerge at 0.6B parameters while content circuits emerge at 4B, with a shared grounding hub and an unsupervised diagnostic achieving 76.2% accuracy for lo...
What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis
cs.AI 2026-05 unverdicted novelty 6.0

In LLM agents, memory routing circuits emerge at 0.6B scale while content circuits appear only at 4B, and write/read operations recruit a pre-existing late-layer context hub instead of creating a new one, enabling a 7...
MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents
cs.CL 2026-05 unverdicted novelty 6.0

A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
cs.AI 2026-04 unverdicted novelty 6.0

Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.
Contextual Agentic Memory is a Memo, Not True Memory
cs.AI 2026-04 unverdicted novelty 6.0

Agentic memory is lookup-based retrieval, not weight-based consolidation, creating a generalization ceiling on novel tasks and structural vulnerability to memory poisoning.
EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
cs.CV 2026-04 unverdicted novelty 6.0

EviMem improves accuracy on temporal and multi-hop questions in long-term conversational memory by iteratively diagnosing and filling evidence gaps, achieving 81.6% and 85.2% judge accuracy on LoCoMo at 4.5x lower lat...
Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents
cs.AI 2026-04 unverdicted novelty 6.0

Memanto delivers 89.8% and 87.1% accuracy on LongMemEval and LoCoMo benchmarks using typed semantic memory and information-theoretic retrieval, outperforming hybrid graph and vector systems with a single query and zer...
Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents
cs.AI 2026-04 conditional novelty 6.0

The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level compression as the missing diagonal.
GAM: Hierarchical Graph-based Agentic Memory for LLM Agents
cs.AI 2026-04 unverdicted novelty 6.0

GAM decouples event-level memory encoding from topic-level consolidation in LLM agents using hierarchical graphs to reduce interference and improve long-term coherence and retrieval.
Quantifying Trust: Financial Risk Management for Trustworthy AI Agents
cs.AI 2026-04 unverdicted novelty 6.0

The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.
MemFactory: Unified Inference & Training Framework for Agent Memory
cs.CL 2026-03 unverdicted novelty 6.0

MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.
Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses
cs.CR 2026-03 unverdicted novelty 6.0

The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.
MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading
cs.CL 2026-05 unverdicted novelty 5.0

MemReread improves agent long-context reasoning by triggering rereading on insufficient final memory to recover discarded indirect facts, outperforming baselines at linear complexity.
A Semantic Autonomy Framework for VLM-Integrated Indoor Mobile Robots: Hybrid Deterministic Reasoning and Cross-Robot Adaptive Memory
cs.RO 2026-05 unverdicted novelty 5.0

The Semantic Autonomy Stack combines a seven-step parametric resolver handling 88% of instructions in under 0.1 ms with VLM escalation and a five-category cross-robot memory system, achieving 100% accuracy and 103,000...
Towards Self-Improving Error Diagnosis in Multi-Agent Systems
cs.MA 2026-04 unverdicted novelty 5.0

ErrorProbe introduces a self-improving pipeline for attributing semantic failures in LLM multi-agent systems to specific agents and steps via anomaly detection, backward tracing, and tool-grounded validation with veri...
Towards Scalable Lightweight GUI Agents via Multi-role Orchestration
cs.AI 2026-04 unverdicted novelty 5.0

LAMO uses role-oriented data synthesis and two-stage training (perplexity-weighted supervised fine-tuning plus reinforcement learning) to create scalable lightweight GUI agents that support both single-model and multi...
On the Creativity of AI Agents
cs.CY 2026-04 unverdicted novelty 5.0

LLM agents produce outputs that meet basic functional criteria for creativity but lack the process-level, social, and personal elements required for ontological creativity.
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
cs.SE 2026-04 unverdicted novelty 5.0

Claude Code centers on a model-tool while-loop surrounded by permission systems, context compaction, extensibility hooks, subagent delegation, and session storage; the same design questions yield different answers in ...
Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure
cs.AI 2026-04 unverdicted novelty 5.0

OIDA adds typed knowledge objects, decay-based importance scores, contradiction edges, and an inverse-decay QUESTION primitive for ignorance to raise epistemic fidelity beyond retrieval.
MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought
cs.MA 2026-04 unverdicted novelty 5.0

MemCoT redefines long-context reasoning as iterative stateful search with zoom-in/zoom-out memory perception and dual short-term memories, claiming SOTA results on LoCoMo and LongMemEval-S benchmarks.
MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents
cs.AI 2026-04 unverdicted novelty 5.0

MemMachine stores entire conversational episodes and applies contextualized retrieval plus adaptive query routing to achieve 0.9169 accuracy on LoCoMo and 93 percent on LongMemEvalS while using 80 percent fewer tokens...
Reliable AI Needs to Externalize Implicit Knowledge: A Human-AI Collaboration Perspective
cs.AI 2026-05 unverdicted novelty 4.0

Reliable AI needs structured Knowledge Objects to externalize and enable human validation of implicit knowledge that current methods cannot verify.
Memory as Metabolism: A Design for Companion Knowledge Systems
cs.AI 2026-04 unverdicted novelty 4.0

This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...
Back to Basics: Let Conversational Agents Remember with Just Retrieval and Generation
cs.CL 2026-04 unverdicted novelty 4.0

A minimalist retrieval-and-generation framework using turn isolation and query-driven pruning outperforms complex memory systems by directly addressing signal sparsity and dual-level redundancy in dialogues.

Reference graph

Works this paper leans on

300 extracted references · 300 canonical work pages · cited by 36 Pith papers · 10 internal anchors

[1]

Detoxifying Large Language Models via Knowledge Editing , booktitle =

Mengru Wang and Ningyu Zhang and Ziwen Xu and Zekun Xi and Shumin Deng and Yunzhi Yao and Qishen Zhang and Linyi Yang and Jindong Wang and Huajun Chen , editor =. Detoxifying Large Language Models via Knowledge Editing , booktitle =. 2024 , url =. doi:10.18653/V1/2024.ACL-LONG.171 , timestamp =

work page doi:10.18653/v1/2024.acl-long.171 2024
[2]

Neighboring Perturbations of Knowledge Editing on Large Language Models , booktitle =

Jun. Neighboring Perturbations of Knowledge Editing on Large Language Models , booktitle =. 2024 , url =

work page 2024
[3]

Editing Personality For Large Language Models , booktitle =

Shengyu Mao and Xiaohan Wang and Mengru Wang and Yong Jiang and Pengjun Xie and Fei Huang and Ningyu Zhang , editor =. Editing Personality For Large Language Models , booktitle =. 2024 , url =. doi:10.1007/978-981-97-9434-8\_19 , timestamp =

work page doi:10.1007/978-981-97-9434-8 2024
[4]

Manning , title =

Eric Mitchell and Charles Lin and Antoine Bosselut and Chelsea Finn and Christopher D. Manning , title =. The Tenth International Conference on Learning Representations,. 2022 , url =

work page 2022
[5]

The Twelfth International Conference on Learning Representations,

Guangxuan Xiao and Yuandong Tian and Beidi Chen and Song Han and Mike Lewis , title =. The Twelfth International Conference on Learning Representations,. 2024 , url =

work page 2024
[6]

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

Preston Rasmussen and Pavlo Paliychuk and Travis Beauvais and Jack Ryan and Daniel Chalef , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2501.13956 , eprinttype =. 2501.13956 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.13956 2025
[7]

Lightning attention-2: A free lunch for handling unlimited sequence lengths in large language models.arXiv preprint arXiv:2401.04658, 2024

Zhen Qin and Weigao Sun and Dong Li and Xuyang Shen and Weixuan Sun and Yiran Zhong , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2401.04658 , eprinttype =. 2401.04658 , timestamp =

work page doi:10.48550/arxiv.2401.04658 2024
[8]

Forty-first International Conference on Machine Learning,

Zhen Qin and Weigao Sun and Dong Li and Xuyang Shen and Weixuan Sun and Yiran Zhong , title =. Forty-first International Conference on Machine Learning,. 2024 , url =

work page 2024
[9]

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision , booktitle =

Jay Shah and Ganesh Bikshandi and Ying Zhang and Vijay Thakkar and Pradeep Ramani and Tri Dao , editor =. FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision , booktitle =. 2024 , url =

work page 2024
[10]

The Twelfth International Conference on Learning Representations,

Tri Dao , title =. The Twelfth International Conference on Learning Representations,. 2024 , url =

work page 2024
[11]

Rethinking Attention with Performers , booktitle =

Krzysztof Marcin Choromanski and Valerii Likhosherstov and David Dohan and Xingyou Song and Andreea Gane and Tam. Rethinking Attention with Performers , booktitle =. 2021 , url =

work page 2021
[12]

Big Bird: Transformers for Longer Sequences , booktitle =

Manzil Zaheer and Guru Guruganesh and Kumar Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Onta. Big Bird: Transformers for Longer Sequences , booktitle =. 2020 , url =

work page 2020
[13]

Semantic generative augmentations for few-shot counting, in: IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2024, Waikoloa, HI, USA, January 3-8, 2024, IEEE

Qiuhui Chen and Qiang Fu and Hao Bai and Yi Hong , title =. 2024 , url =. doi:10.1109/WACV57701.2024.00354 , timestamp =

work page doi:10.1109/wacv57701.2024.00354 2024
[14]

A Machine with Short-Term, Episodic, and Semantic Memory Systems , booktitle =

Taewoon Kim and Michael Cochez and Vincent Fran. A Machine with Short-Term, Episodic, and Semantic Memory Systems , booktitle =. 2023 , url =. doi:10.1609/AAAI.V37I1.25075 , timestamp =

work page doi:10.1609/aaai.v37i1.25075 2023
[15]

McAuley , title =

Yu Wang and Xinshuang Liu and Xiusi Chen and Sean O'Brien and Junda Wu and Julian J. McAuley , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =

work page 2025
[16]

Agents: An open-source framework for autonomous lan- guage agents

Wangchunshu Zhou and Yuchen Eleanor Jiang and Long Li and Jialong Wu and Tiannan Wang and Shi Qiu and Jintian Zhang and Jing Chen and Ruipu Wu and Shuai Wang and Shiding Zhu and Jiyu Chen and Wentao Zhang and Ningyu Zhang and Huajun Chen and Peng Cui and Mrinmaya Sachan , title =. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2309.07870 , eprinttype =...

work page doi:10.48550/arxiv.2309.07870 2023
[17]

OA gents: An Empirical Study of Building Effective Agents

Zhu, He and Qin, Tianrui and Zhu, King and Huang, Heyuan and Guan, Yeyi and Xia, Jinxiang and Li, Hanhao and Yao, Yi and Wang, Ningning and Liu, Pai and Peng, Tianhao and Gui, Xin and Xiaowan, Li and Liu, Yuhui and Tang, Xiangru and Yang, Jian and Zhang, Ge and Gao, Xitong and Jiang, Yuchen Eleanor and Zhang, Changwang and Wang, Jun and Liu, Jiaheng and Z...

work page 2025
[18]

2025 , eprint=

Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution , author=. 2025 , eprint=

work page 2025
[19]

2025 , eprint=

Towards Personalized Deep Research: Benchmarks and Evaluations , author=. 2025 , eprint=

work page 2025
[20]

2025 , eprint=

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL , author=. 2025 , eprint=

work page 2025
[21]

2023 , eprint=

RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text , author=. 2023 , eprint=

work page 2023
[22]

2024 , eprint=

AI PERSONA: Towards Life-long Personalization of LLMs , author=. 2024 , eprint=

work page 2024
[23]

arXiv preprint arXiv:2406.18532 , year=

Wangchunshu Zhou and Yixin Ou and Shengwei Ding and Long Li and Jialong Wu and Tiannan Wang and Jiamin Chen and Shuai Wang and Xiaohua Xu and Ningyu Zhang and Huajun Chen and Yuchen Eleanor Jiang , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2406.18532 , eprinttype =. 2406.18532 , timestamp =

work page doi:10.48550/arxiv.2406.18532 2024
[24]

2025 , eprint=

EvoVLA: Self-Evolving Vision-Language-Action Model , author=. 2025 , eprint=

work page 2025
[25]

2025 , eprint=

O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents , author=. 2025 , eprint=

work page 2025
[26]

Mlp memory: A retriever-pretrained memory for large language models, 2026.https://arxiv.org/abs/2508.01832

Rubin Wei and Jiaqi Cao and Jiarui Wang and Jushi Kai and Qipeng Guo and Bowen Zhou and Zhouhan Lin , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2508.01832 , eprinttype =. 2508.01832 , timestamp =

work page doi:10.48550/arxiv.2508.01832 2025
[27]

2025 , eprint=

Pretraining with hierarchical memories: separating long-tail and common knowledge , author=. 2025 , eprint=

work page 2025
[28]

Character-LLM:

Yunfan Shao and Linyang Li and Junqi Dai and Xipeng Qiu , editor =. Character-LLM:. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing,. 2023 , url =. doi:10.18653/V1/2023.EMNLP-MAIN.814 , timestamp =

work page doi:10.18653/v1/2023.emnlp-main.814 2023
[29]

CharacterGLM: Customizing social characters with large language models

Jinfeng Zhou and Zhuang Chen and Dazhen Wan and Bosi Wen and Yi Song and Jifan Yu and Yongkang Huang and Pei Ke and Guanqun Bi and Libiao Peng and Jiaming Yang and Xiyao Xiao and Sahand Sabour and Xiaohan Zhang and Wenjing Hou and Yijia Zhang and Yuxiao Dong and Hongning Wang and Jie Tang and Minlie Huang , editor =. CharacterGLM: Customizing Social Chara...

work page doi:10.18653/v1/2024.emnlp-industry.107 2024
[30]

2025 , eprint=

Agent Learning via Early Experience , author=. 2025 , eprint=

work page 2025
[31]

Scaling agents via continual pre-training.arXiv preprint arXiv:2509.13310, 2025

Liangcai Su and Zhen Zhang and Guangyu Li and Zhuo Chen and Chenxi Wang and Maojia Song and Xinyu Wang and Kuan Li and Jialong Wu and Xuanzhong Chen and Zile Qiao and Zhongwang Zhang and Huifeng Yin and Shihao Cai and Runnan Fang and Zhengwei Tao and Wenbiao Yin and Chenxiong Qian and Yong Jiang and Pengjun Xie and Fei Huang and Jingren Zhou , title =. Co...

work page doi:10.48550/arxiv.2509.13310 2025
[32]

Online Adaptation of Language Models with a Memory of Amortized Contexts , booktitle =

Jihoon Tack and Jaehyung Kim and Eric Mitchell and Jinwoo Shin and Yee Whye Teh and Jonathan Richard Schwarz , editor =. Online Adaptation of Language Models with a Memory of Amortized Contexts , booktitle =. 2024 , url =

work page 2024
[33]

Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems.arXiv preprint arXiv:2504.01990, 2025

Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems , author=. arXiv preprint arXiv:2504.01990 , year=

work page arXiv
[34]

2025 , eprint=

From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery , author=. 2025 , eprint=

work page 2025
[35]

Tool learning with large language models: a survey , volume=

Qu, Changle and Dai, Sunhao and Wei, Xiaochi and Cai, Hengyi and Wang, Shuaiqiang and Yin, Dawei and Xu, Jun and Wen, Ji-rong , year=. Tool learning with large language models: a survey , volume=. Frontiers of Computer Science , publisher=. doi:10.1007/s11704-024-40678-2 , number=

work page doi:10.1007/s11704-024-40678-2
[36]

2025 , eprint=

Reinforcement Learning for Reasoning in Large Language Models with One Training Example , author=. 2025 , eprint=

work page 2025
[37]

2025 , eprint=

A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems , author=. 2025 , eprint=

work page 2025
[38] Agent AI: Surveying the Horizons of Multimodal Interaction. 2024.
[39] Agents in Software Engineering: Survey, Landscape, and Vision. 2024.
[40] Deep Research: A Survey of Autonomous Research Agents. 2025.
[41] A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications. 2025.
[42] Large Language Model Agent: A Survey on Methodology, Applications and Challenges. 2025.
[43] Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, and Gao Huang.
[44] Large Language Models: A Survey. 2025.
[45] A Survey on Large Language Models with some Insights on their Capabilities and Limitations. 2025.
[46] Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, and Dong Yu.
[47] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300.
[48] AutoGen: Enabling next-gen LLM applications via multi-agent conversations. First Conference on Language Modeling.
[49] Yingxu Wang, Siwei Liu, Jinyuan Fang, and Zaiqiao Meng.
[50] Shuofei Qiao, Runnan Fang, Ningyu Zhang, Yuqi Zhu, Xiang Chen, Shumin Deng, Yong Jiang, Pengjun Xie, Fei Huang, and Huajun Chen. Advances in Neural Information Processing Systems.
[51] Graph-Augmented Large Language Model Agents: Current Progress and Future Prospects. arXiv preprint arXiv:2507.21407.
[52] Miao Yu, Shilong Wang, Guibin Zhang, Junyuan Mao, Chenlong Yin, Qijiong Liu, Qingsong Wen, Kun Wang, and Yang Wang.
[53] Yanwei Yue, Guibin Zhang, Boyang Liu, Guancheng Wan, Kun Wang, Dawei Cheng, and Yiyan Qi. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[54] Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, and Shuicheng Yan.
[55] Siwei Liu, Jinyuan Fang, Han Zhou, Yingxu Wang, and Zaiqiao Meng.
[56] Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges. arXiv preprint arXiv:2506.10408.
[57] Junwei Liao, Muning Wen, Jun Wang, and Weinan Zhang.
[58] Chanwoo Park, Seungju Han, Xingzhi Guo, Asuman E. Ozdaglar, Kaiqing Zhang, and Joo. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[59] Wanjia Zhao, Mert Yuksekgonul, Shirley Wu, and James Zou.
[60] Anjana Sarkar and Soumyendu Sarkar. Survey of
[61] Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement. CoRR.
[62] Yingxuan Yang, Huacan Chai, Shuai Shao, Yuanyi Song, Siyuan Qi, Renting Rui, and Weinan Zhang.
[63] Yingxuan Yang, Huacan Chai, Yuanyi Song, Siyuan Qi, Muning Wen, Ning Li, Junwei Liao, Haoyi Hu, Jianghao Lin, Gaowei Chang, et al. A survey of
[64] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao.
[65] Multi-agent Architecture Search via Agentic Supernet. 2025.
[66] Guibin Zhang, Yanwei Yue, Zhixun Li, Sukwon Yun, Guancheng Wan, Kun Wang, Dawei Cheng, Jeffrey Xu Yu, and Tianlong Chen. Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems.
[67] Marft: Multi-agent reinforcement fine-tuning. arXiv preprint arXiv:2504.16129, 2025.
[68] Weize Chen, Jiarui Yuan, Chen Qian, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Findings of the Association for Computational Linguistics.
[69] PromptWizard: Task-aware prompt optimization framework. arXiv preprint arXiv:2405.18369, 2024.
[70] Vflow: Discovering optimal agentic workflows for Verilog generation. arXiv preprint arXiv:2504.03723.
[71] Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, Rafael Rafailov, Ivan Laptev, Philip H. S. Torr, Fabio Pizzati, Ronald Clark, and Christian Schroeder de Witt.
[72] Subramaniam, Y. Multiagent finetuning: Self improvement with diverse reasoning chains. arXiv preprint arXiv:2501.05707.
[73] Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents. arXiv preprint arXiv:2505.22954, 2025.
[74] Huxley-Godel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine. 2025.
[75] Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li, and Zhilin Yang. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022.
[76] Archiki Prasad, Peter Hase, Xiang Zhou, and Mohit Bansal. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics.
[77] Rui Pan, Shuo Xing, Shizhe Diao, Wenhe Sun, Xiang Liu, Kashun Shum, Jipeng Zhang, Renjie Pi, and Tong Zhang. Findings of the Association for Computational Linguistics.
[78] Cho. Automatic Engineering of Long Prompts.
[79] Yao Lu, Jiayi Wang, Raphael Tang, Sebastian Riedel, and Pontus Stenetorp. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2024.
[80] Yongchao Chen, Jacob Arkin, Yilun Hao, Yang Zhang, Nicholas Roy, and Chuchu Fan. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024.