Recognition: 2 theorem links
A Survey on Large Language Model based Autonomous Agents
Pith reviewed 2026-05-15 03:59 UTC · model grok-4.3
The pith
A unified framework organizes the construction of most LLM-based autonomous agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes a unified framework for LLM-based autonomous agents that integrates the core modules appearing across most existing architectures, then applies this lens to catalog construction approaches, applications in social, natural, and engineering fields, and evaluation strategies, while surfacing open challenges.
What carries the argument
The unified framework for LLM-based autonomous agents, which organizes components such as memory, planning, tool use, and feedback loops to describe prior designs.
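The module vocabulary above (memory, planning, tool use, feedback loops) can be made concrete with a minimal sketch. This is purely illustrative: the class and function names below are invented for this example and are not the survey's formal definitions; the plan/tool functions stand in for LLM and API calls.

```python
# Illustrative sketch only: Memory, plan, use_tool, and agent_step are
# hypothetical names, not the survey's formal framework components.

class Memory:
    """Stores past observations so later steps can condition on them."""
    def __init__(self):
        self.events = []

    def write(self, event):
        self.events.append(event)

    def read(self, k=3):
        return self.events[-k:]  # most recent k events

def plan(task, context):
    """Stand-in for an LLM call that proposes the next action."""
    return f"act on '{task}' given {len(context)} remembered events"

def use_tool(action):
    """Stand-in for invoking an external tool or API."""
    return f"result of [{action}]"

def agent_step(task, memory):
    # plan -> act -> observe -> remember: one pass through the feedback loop
    action = plan(task, memory.read())
    observation = use_tool(action)
    memory.write(observation)
    return observation

memory = Memory()
for _ in range(3):
    agent_step("summarize paper", memory)
print(len(memory.events))  # prints 3: three feedback iterations recorded
```

The point of the sketch is that each module is swappable, which is what lets one framework describe many prior designs.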
If this is right
- Most prior LLM-agent work can be categorized under the same construction, application, and evaluation headings.
- Agents built this way can tackle tasks that require human-like reasoning in social simulation and scientific domains.
- Evaluation combines automated metrics with human assessment of task success and reasoning steps.
- Future progress depends on solving reliability, safety, and long-horizon planning issues.
Where Pith is reading between the lines
- The framework can serve as a checklist for designing new agents by highlighting which modules are often missing.
- Maintaining the linked repository could support community-wide tracking as new papers appear rapidly.
- Hybrid systems that combine the framework with non-LLM planning methods may address some current limitations.
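The checklist reading in the first bullet can be sketched mechanically: map each surveyed design to the framework modules it implements and report what is missing. The module names and the example designs below are invented for illustration, not taken from the survey.

```python
# Hypothetical example: module names and the design-to-module mapping are
# invented to illustrate the "framework as checklist" idea.

FRAMEWORK_MODULES = {"memory", "planning", "tool_use", "feedback"}

surveyed_designs = {
    "agent_a": {"memory", "planning"},
    "agent_b": {"planning", "tool_use", "feedback"},
}

def missing_modules(design):
    """Return the framework modules a design does not implement."""
    return sorted(FRAMEWORK_MODULES - design)

for name, modules in surveyed_designs.items():
    print(name, "missing:", missing_modules(modules))
```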
Load-bearing premise
The proposed framework is general enough to encompass the majority of existing LLM-agent architectures without major omissions or forced groupings.
What would settle it
If a substantial set of published LLM-agent papers could not be mapped onto the framework's main stages without significant distortion, that would show the framework is not sufficiently general.
read the original abstract
Autonomous agents have long been a prominent research focus in both academic and industry communities. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes, and thus makes the agents hard to achieve human-like decisions. Recently, through the acquisition of vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an upsurge in studies investigating LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of LLM-based autonomous agents from a holistic perspective. More specifically, we first discuss the construction of LLM-based autonomous agents, for which we propose a unified framework that encompasses a majority of the previous work. Then, we present a comprehensive overview of the diverse applications of LLM-based autonomous agents in the fields of social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on the previous studies, we also present several challenges and future directions in this field. To keep track of this field and continuously update our survey, we maintain a repository of relevant references at https://github.com/Paitesanshi/LLM-Agent-Survey.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper surveys the emerging field of LLM-based autonomous agents. It proposes a unified framework for agent construction that is claimed to encompass a majority of prior work, provides a comprehensive overview of applications across social science, natural science, and engineering domains, reviews common evaluation strategies, and discusses challenges and future directions while maintaining an online repository for ongoing updates.
Significance. If the unified framework successfully organizes the majority of existing LLM-agent architectures without major omissions, the survey will serve as a valuable reference point for researchers, helping to structure a rapidly expanding literature and identify cross-domain applications and evaluation practices.
minor comments (2)
- [§3] The claim that the framework 'encompasses a majority of the previous work' (abstract and §3) would be strengthened by an explicit count or table showing how many surveyed papers map to each component of the framework versus any notable omissions.
- [Applications sections] In the applications overview, some domain-specific examples could include more direct citations to the original LLM-agent papers rather than secondary references to improve traceability.
Simulated Author's Rebuttal
We thank the referee for their positive review and recommendation to accept the manuscript. The assessment correctly identifies the core contributions of our unified framework for LLM-based agent construction, the cross-domain application overview, and the maintained repository for updates.
Circularity Check
No significant circularity in literature survey synthesis
full rationale
This is a pure survey paper whose central contribution is a descriptive unified framework for organizing prior LLM-agent literature. No equations, fitted parameters, predictions, or derivations appear in the provided text. The framework is explicitly positioned as an encompassing lens drawn from external cited works rather than derived internally or justified via self-citation chains. All content rests on external references; the survey structure (construction, applications, evaluation) introduces no self-definitional loops, fitted-input predictions, or load-bearing self-citations that reduce the claims to the paper's own inputs by construction. This matches the expected non-circular outcome for honest literature synthesis.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith.Foundation.DAlembert.Inevitability.bilinear_family_forced (echoes)
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
"the cost of a composite genuinely depends on how its components fit together"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 21 Pith papers
- Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain
  Malicious LLM API routers actively perform payload injection and secret exfiltration, with 9 of 428 tested routers showing malicious behavior and further poisoning risks from leaked credentials.
- Agent-First Tool API: A Semantic Interface Paradigm for Enterprise AI Agent Systems
  The Agent-First Tool API paradigm raises AI agent task success from 64% to 88% and cuts human interventions by 72.7% through semantic phases, structured contracts, and risk governance in a production enterprise system.
- OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces
  OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.
- EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems
  EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and ...
- Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems
  ASPO combines multi-agent LLM proposals with deterministic enforcement in a MAPE-K loop to select conflict-free, resource-feasible security patterns for IoT, delivering 100% safety invariants and 21-23% tail latency/e...
- An AI Agent Execution Environment to Safeguard User Data
  GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack...
- Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning
  Visual Inception poisons images to hijack long-term memory in agentic recommenders and steer planning, while CognitiveGuard reduces success to about 10% via perceptual sanitization and reasoning verification.
- In-situ process monitoring for defect detection in wire-arc additive manufacturing: an agentic AI approach
  A multi-agent AI framework using processing and acoustic agents achieves 91.6% accuracy and 0.821 F1 score for in-situ porosity defect detection in wire-arc additive manufacturing.
- SoK: Agentic Skills -- Beyond Tool Use in LLM Agents
  The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.
- Is Grep All You Need? How Agent Harnesses Reshape Agentic Search
  Grep retrieval generally outperforms vector retrieval in agentic search tasks, with performance varying strongly by agent harness and tool-calling style.
- Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis
  Agent Mentor analyzes semantic trajectories in agent logs to identify undesired behaviors and derives corrective prompt instructions, yielding measurable accuracy gains on benchmark tasks across three agent setups.
- Toward Explanatory Equilibrium: Verifiable Reasoning as a Coordination Mechanism under Asymmetric Information
  Structured reasoning artifacts enable coordination in LLM multi-agent systems by preventing approval and welfare collapse under asymmetric information while keeping bad-approval rates low across audit regimes.
- EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments
  EconAI adds memory weighting and economic sentiment indexing to LLM agents so they adapt short-term actions to long-term goals inside a single macro/micro simulation loop.
- SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications
  SciFi is a safe, lightweight agentic AI framework that automates structured scientific tasks with minimal human intervention via isolated environments and layered self-assessing agents.
- Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents
  Aethon enables near-constant-time instantiation of stateful AI agents via reference-based replication over compositional views, layered memory, and copy-on-write semantics.
- OpenKedge: Governing Agentic Mutation with Execution-Bound Safety and Evidence Chains
  OpenKedge redefines AI agent state mutations as a governed process using intent proposals, policy-evaluated execution contracts, and cryptographic evidence chains to enable safe, auditable agentic behavior.
- Understanding the planning of LLM agents: A survey
  A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.
- The Rise and Potential of Large Language Model Based Agents: A Survey
  The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
- A Survey on the Memory Mechanism of Large Language Model based Agents
  A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.
- Large Language Models: A Survey
  The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.
Reference graph
Works this paper leans on
- [1] Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529–533
- [2] Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015
- [3] Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017
- [4] Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on Machine Learning. 2018, 1861–1870
- [5] Brown T, Mann B, Ryder N, Subbiah M, Kaplan J D, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 2020, 33: 1877–1901
- [6] Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I, et al. Language models are unsupervised multitask learners. OpenAI blog, 2019, 1(8): 9
- [7] Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman F L, Almeida D, Altenschmidt J, Altman S, Anadkat S, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023
- [8] Anthropic. Model card and evaluations for Claude models. https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf?ref=maginative.com, 2023
- [9] Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M A, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023
- [10] Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, Bashlykov N, Batra S, Bhargava P, Bhosale S, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023
- [11] Chen X, Li S, Li H, Jiang S, Qi Y, Song L. Generative adversarial user model for reinforcement learning based recommendation system. In: International Conference on Machine Learning. 2019, 1052–1061
- [12] Shinn N, Cassano F, Gopinath A, Narasimhan K, Yao S. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 2024, 36
- [13] Shen Y, Song K, Tan X, Li D, Lu W, Zhuang Y. HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. Advances in Neural Information Processing Systems, 2024, 36
- [14] Qin Y, Liang S, Ye Y, Zhu K, Yan L, Lu Y, Lin Y, Cong X, Tang X, Qian B, et al. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. arXiv preprint arXiv:2307.16789, 2023
- [15] Schick T, Dwivedi-Yu J, Dessì R, Raileanu R, Lomeli M, Hambro E, Zettlemoyer L, Cancedda N, Scialom T. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 2024, 36
- [16] Zhu X, Chen Y, Tian H, Tao C, Su W, Yang C, Huang G, Li B, Lu L, Wang X, et al. Ghost in the Minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory. arXiv preprint arXiv:2305.17144, 2023
- [17] Sclar M, Kumar S, West P, Suhr A, Choi Y, Tsvetkov Y. Minding language models' (lack of) theory of mind: A plug-and-play multi-character belief tracker. arXiv preprint arXiv:2306.00924, 2023
- [18] Qian C, Cong X, Yang C, Chen W, Su Y, Xu J, Liu Z, Sun M. ChatDev: Communicative agents for software development. arXiv preprint arXiv:2307.07924, 2023
- [19] C. et al. AgentVerse. https://github.com/OpenBMB/AgentVerse, 2023
- [20] Park J S, O'Brien J, Cai C J, Morris M R, Liang P, Bernstein M S. Generative agents: Interactive simulacra of human behavior. In: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 2023, 1–22
- [21] Wang L, Zhang J, Chen X, Lin Y, Song R, Zhao W X, Wen J R. RecAgent: A novel simulation paradigm for recommender systems. arXiv preprint arXiv:2306.02552, 2023
- [22] Zhang H, Du W, Shan J, Zhou Q, Du Y, Tenenbaum J B, Shu T, Gan C. Building cooperative embodied agents modularly with large language models. arXiv preprint arXiv:2307.02485, 2023
- [23] Hong S, Zheng X, Chen J, Cheng Y, Wang J, Zhang C, Wang Z, Yau S K S, Lin Z, Zhou L, et al. MetaGPT: Meta programming for a multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 2023
- [24] Dong Y, Jiang X, Jin Z, Li G. Self-collaboration code generation via ChatGPT. arXiv preprint arXiv:2304.07590, 2023
- [25] Safdari M, Serapio-García G, Crepy C, Fitz S, Romero P, Sun L, Abdulhai M, Faust A, Matarić M. Personality traits in large language models. arXiv preprint arXiv:2307.00184, 2023
- [26] Johnson J A. Measuring thirty facets of the five factor model with a 120-item public domain inventory: Development of the IPIP-NEO-120. Journal of Research in Personality, 2014, 51: 78–89
- [27] John O P, Donahue E M, Kentle R L. Big Five Inventory. Journal of Personality and Social Psychology, 1991
- [28] Deshpande A, Murahari V, Rajpurohit T, Kalyan A, Narasimhan K. Toxicity in ChatGPT: Analyzing persona-assigned language models. arXiv preprint arXiv:2304.05335, 2023
- [29] Argyle L P, Busby E C, Fulda N, Gubler J R, Rytting C, Wingate D. Out of one, many: Using language models to simulate human samples. Political Analysis, 2023, 31(3): 337–351
- [30] Fischer K A. Reflective linguistic programming (RLP): A stepping stone in socially-aware AGI (SocialAGI). arXiv preprint arXiv:2305.12647, 2023
- [31] Rana K, Haviland J, Garg S, Abou-Chakra J, Reid I, Suenderhauf N. SayPlan: Grounding large language models using 3D scene graphs for scalable robot task planning. In: 7th Annual Conference on Robot Learning. 2023
- [32] Zhu A, Martin L, Head A, Callison-Burch C. CALYPSO: LLMs as dungeon master's assistants. In: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. 2023, 380–390
- [33] Wang Z, Cai S, Chen G, Liu A, Ma X, Liang Y. Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. arXiv preprint arXiv:2302.01560, 2023
- [34] Lin J, Zhao H, Zhang A, Wu Y, Ping H, Chen Q. AgentSims: An open-source sandbox for large language model evaluation. arXiv preprint arXiv:2308.04026, 2023
- [35] Liang X, Wang B, Huang H, Wu S, Wu P, Lu L, Ma Z, Li Z. Unleashing infinite-length input capacity for large-scale language models with self-controlled memory system. arXiv preprint arXiv:2304.13343, 2023
- [36] Ng Y, Miyashita D, Hoshi Y, Morioka Y, Torii O, Kodama T, Deguchi J. SimplyRetrieve: A private and lightweight retrieval-centric generative AI tool. arXiv preprint arXiv:2308.03983, 2023
- [37] Huang Z, Gutierrez S, Kamana H, MacNeil S. Memory sandbox: Transparent and interactive memory management for conversational agents. In: Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 2023, 1–3
- [38] Wang G, Xie Y, Jiang Y, Mandlekar A, Xiao C, Zhu Y, Fan L, Anandkumar A. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023
- [39] Zhong W, Guo L, Gao Q, Wang Y. MemoryBank: Enhancing large language models with long-term memory. arXiv preprint arXiv:2305.10250, 2023
- [40] Hu C, Fu J, Du C, Luo S, Zhao J, Zhao H. ChatDB: Augmenting LLMs with databases as their symbolic memory. arXiv preprint arXiv:2306.03901, 2023
- [41] Modarressi A, Imani A, Fayyaz M, Schütze H. RET-LLM: Towards a general read-write memory for large language models. arXiv preprint arXiv:2305.14322, 2023
- [42] Schuurmans D. Memory augmented large language models are computationally universal. arXiv preprint arXiv:2301.04589, 2023
- [43] Zhao A, Huang D, Xu Q, Lin M, Liu Y J, Huang G. ExpeL: LLM agents are experiential learners. arXiv preprint arXiv:2308.10144, 2023
- [44] Huang W, Abbeel P, Pathak D, Mordatch I. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In: International Conference on Machine Learning. 2022, 9118–9147
- [45] Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, Le Q V, Zhou D, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 2022, 35: 24824–24837
- [46] Kojima T, Gu S S, Reid M, Matsuo Y, Iwasawa Y. Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 2022, 35: 22199–22213
- [47] Raman S S, Cohen V, Rosen E, Idrees I, Paulius D, Tellex S. Planning with large language models via corrective re-prompting. In: NeurIPS 2022 Foundation Models for Decision Making Workshop. 2022
- [48] Xu B, Peng Z, Lei B, Mukherjee S, Liu Y, Xu D. ReWOO: Decoupling reasoning from observations for efficient augmented language models. arXiv preprint arXiv:2305.18323, 2023
- [49] Lin B Y, Fu Y, Yang K, Brahman F, Huang S, Bhagavatula C, Ammanabrolu P, Choi Y, Ren X. SwiftSage: A generative agent with fast and slow thinking for complex interactive tasks. Advances in Neural Information Processing Systems, 2024, 36
- [50] Evans J S B, Stanovich K E. Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 2013, 8(3): 223–241
- [51] Wang X, Wei J, Schuurmans D, Le Q, Chi E, Narang S, Chowdhery A, Zhou D. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022
- [52] Yao S, Yu D, Zhao J, Shafran I, Griffiths T, Cao Y, Narasimhan K. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 2024, 36
- [53] Wang Y, Jiang Z, Chen Z, Yang F, Zhou Y, Cho E, Fan X, Huang X, Lu Y, Yang Y. RecMind: Large language model powered agent for recommendation. arXiv preprint arXiv:2308.14296, 2023
- [54] Besta M, Blach N, Kubicek A, Gerstenberger R, Gianinazzi L, Gajda J, Lehmann T, Podstawski M, Niewiadomski H, Nyczyk P, et al. Graph of thoughts: Solving elaborate problems with large language models. arXiv preprint arXiv:2308.09687, 2023
- [55] Sel B, Al-Tawaha A, Khattar V, Wang L, Jia R, Jin M. Algorithm of thoughts: Enhancing exploration of ideas in large language models. arXiv preprint arXiv:2308.10379, 2023
- [56] Gramopadhye M, Szafir D. Generating executable action plans with environmentally-aware language models. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2023, 3568–3575
- [57] Hao S, Gu Y, Ma H, Hong J J, Wang Z, Wang D Z, Hu Z. Reasoning with language model is planning with world model. arXiv preprint arXiv:2305.14992, 2023
- [58] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, Stone P. LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477, 2023
- [59] Dagan G, Keller F, Lascarides A. Dynamic planning with a LLM. arXiv preprint arXiv:2308.06391, 2023
- [60] Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K, Cao Y. ReAct: Synergizing reasoning and acting in language models. In: The Twelfth International Conference on Learning Representations. 2023
- [61] Song C H, Wu J, Washington C, Sadler B M, Chao W L, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023, 2998–3009
- [62] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, Zeng A, Tompson J, Mordatch I, Chebotar Y, et al. Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608, 2022
- [63] Madaan A, Tandon N, Gupta P, Hallinan S, Gao L, Wiegreffe S, Alon U, Dziri N, Prabhumoye S, Yang Y, et al. Self-Refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 2024, 36
- [64] Miao N, Teh Y W, Rainforth T. SelfCheck: Using LLMs to zero-shot check their own step-by-step reasoning. In: The Twelfth International Conference on Learning Representations. 2023
- [65] Chen P L, Chang C S. InterAct: Exploring the potentials of ChatGPT as a cooperative agent. arXiv preprint arXiv:2308.01552, 2023
- [66] Chen Z, Zhou K, Zhang B, Gong Z, Zhao W X, Wen J R. ChatCoT: Tool-augmented chain-of-thought reasoning on chat-based large language models. arXiv preprint arXiv:2305.14323, 2023
- [67] Nakano R, Hilton J, Balaji S, Wu J, Ouyang L, Kim C, Hesse C, Jain S, Kosaraju V, Saunders W, et al. WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332, 2021
- [68] Ruan J, Chen Y, Zhang B, Xu Z, Bao T, Du G, Shi S, Mao H, Zeng X, Zhao R. TPTU: Task planning and tool usage of large language model-based AI agents. arXiv preprint arXiv:2308.03427, 2023
- [69] Patil S G, Zhang T, Wang X, Gonzalez J E. Gorilla: Large language model connected with massive APIs. arXiv preprint arXiv:2305.15334, 2023
- [70] Li M, Song F, Yu B, Yu H, Li Z, Huang F, Li Y. API-Bank: A benchmark for tool-augmented LLMs. arXiv preprint arXiv:2304.08244, 2023
- [71] Song Y, Xiong W, Zhu D, Li C, Wang K, Tian Y, Li S. RestGPT: Connecting large language models with real-world applications via RESTful APIs. arXiv preprint arXiv:2306.06624, 2023
- [72] Liang Y, Wu C, Song T, Wu W, Xia Y, Liu Y, Ou Y, Lu S, Ji L, Mao S, et al. TaskMatrix.AI: Completing tasks by connecting foundation models with millions of APIs. Intelligent Computing, 2024, 3: 0063
- [73] Karpas E, Abend O, Belinkov Y, Lenz B, Lieber O, Ratner N, Shoham Y, Bata H, Levine Y, Leyton-Brown K, et al. MRKL systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. arXiv preprint arXiv:2205.00445, 2022
- [74] Ge Y, Hua W, Mei K, Tan J, Xu S, Li Z, Zhang Y, et al. OpenAGI: When LLM meets domain experts. Advances in Neural Information Processing Systems, 2024, 36
- [75] Surís D, Menon S, Vondrick C. ViperGPT: Visual inference via Python execution for reasoning. arXiv preprint arXiv:2303.08128, 2023
- [76] Bran A M, Cox S, Schilter O, Baldassari C, White A D, Schwaller P. ChemCrow: Augmenting large-language models with chemistry tools. arXiv preprint arXiv:2304.05376, 2023
- [77] Yang Z, Li L, Wang J, Lin K, Azarnasab E, Ahmed F, Liu Z, Liu C, Zeng M, Wang L. MM-ReAct: Prompting ChatGPT for multimodal reasoning and action. arXiv preprint arXiv:2303.11381, 2023
- [78] Gao C, Lan X, Lu Z, Mao J, Piao J, Wang H, Jin D, Li Y. S3: Social-network simulation system with large language model-empowered agents. arXiv preprint arXiv:2307.14984, 2023
- [79] Ahn M, Brohan A, Brown N, Chebotar Y, Cortes O, David B, Finn C, Fu C, Gopalakrishnan K, Hausman K, et al. Do as I can, not as I say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691, 2022
- [80] Park J S, Popowski L, Cai C, Morris M R, Liang P, Bernstein M S. Social simulacra: Creating populated prototypes for social computing systems. In: Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 2022, 1–18