Recognition: 2 theorem links
A Survey on Large Language Model based Autonomous Agents
Pith reviewed 2026-05-15 03:59 UTC · model grok-4.3
The pith
A unified framework organizes the construction of most LLM-based autonomous agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes a unified framework for LLM-based autonomous agents that integrates the core modules appearing across most existing architectures, then applies this lens to catalog construction approaches, applications in social, natural, and engineering fields, and evaluation strategies, while surfacing open challenges.
What carries the argument
The unified framework for LLM-based autonomous agents, which organizes components such as memory, planning, tool use, and feedback loops to describe prior designs.
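The module vocabulary above (memory, planning, tool use, feedback loops) can be made concrete with a minimal sketch. This is purely illustrative: the class and function names below are invented for this example and are not the survey's formal definitions; the plan/tool functions stand in for LLM and API calls.

```python
# Illustrative sketch only: Memory, plan, use_tool, and agent_step are
# hypothetical names, not the survey's formal framework components.

class Memory:
    """Stores past observations so later steps can condition on them."""
    def __init__(self):
        self.events = []

    def write(self, event):
        self.events.append(event)

    def read(self, k=3):
        return self.events[-k:]  # most recent k events

def plan(task, context):
    """Stand-in for an LLM call that proposes the next action."""
    return f"act on '{task}' given {len(context)} remembered events"

def use_tool(action):
    """Stand-in for invoking an external tool or API."""
    return f"result of [{action}]"

def agent_step(task, memory):
    # plan -> act -> observe -> remember: one pass through the feedback loop
    action = plan(task, memory.read())
    observation = use_tool(action)
    memory.write(observation)
    return observation

memory = Memory()
for _ in range(3):
    agent_step("summarize paper", memory)
print(len(memory.events))  # prints 3: three feedback iterations recorded
```

The point of the sketch is that each module is swappable, which is what lets one framework describe many prior designs.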
If this is right
- Most prior LLM-agent work can be categorized under the same construction, application, and evaluation headings.
- Agents built this way can tackle tasks that require human-like reasoning in social simulation and scientific domains.
- Evaluation combines automated metrics with human assessment of task success and reasoning steps.
- Future progress depends on solving reliability, safety, and long-horizon planning issues.
Where Pith is reading between the lines
- The framework can serve as a checklist for designing new agents by highlighting which modules are often missing.
- Maintaining the linked repository could support community-wide tracking as new papers appear rapidly.
- Hybrid systems that combine the framework with non-LLM planning methods may address some current limitations.
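The checklist reading in the first bullet can be sketched mechanically: map each surveyed design to the framework modules it implements and report what is missing. The module names and the example designs below are invented for illustration, not taken from the survey.

```python
# Hypothetical example: module names and the design-to-module mapping are
# invented to illustrate the "framework as checklist" idea.

FRAMEWORK_MODULES = {"memory", "planning", "tool_use", "feedback"}

surveyed_designs = {
    "agent_a": {"memory", "planning"},
    "agent_b": {"planning", "tool_use", "feedback"},
}

def missing_modules(design):
    """Return the framework modules a design does not implement."""
    return sorted(FRAMEWORK_MODULES - design)

for name, modules in surveyed_designs.items():
    print(name, "missing:", missing_modules(modules))
```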
Load-bearing premise
The proposed framework is general enough to encompass the majority of existing LLM-agent architectures without major omissions or forced groupings.
What would settle it
If a substantial set of published LLM-agent papers could not be mapped onto the framework's main stages without significant distortion, that would show the framework is not sufficiently general.
read the original abstract
Autonomous agents have long been a prominent research focus in both academic and industry communities. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes, and thus makes the agents hard to achieve human-like decisions. Recently, through the acquisition of vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an upsurge in studies investigating LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of LLM-based autonomous agents from a holistic perspective. More specifically, we first discuss the construction of LLM-based autonomous agents, for which we propose a unified framework that encompasses a majority of the previous work. Then, we present a comprehensive overview of the diverse applications of LLM-based autonomous agents in the fields of social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on the previous studies, we also present several challenges and future directions in this field. To keep track of this field and continuously update our survey, we maintain a repository of relevant references at https://github.com/Paitesanshi/LLM-Agent-Survey.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper surveys the emerging field of LLM-based autonomous agents. It proposes a unified framework for agent construction that is claimed to encompass a majority of prior work, provides a comprehensive overview of applications across social science, natural science, and engineering domains, reviews common evaluation strategies, and discusses challenges and future directions while maintaining an online repository for ongoing updates.
Significance. If the unified framework successfully organizes the majority of existing LLM-agent architectures without major omissions, the survey will serve as a valuable reference point for researchers, helping to structure a rapidly expanding literature and identify cross-domain applications and evaluation practices.
minor comments (2)
- [§3] The claim that the framework 'encompasses a majority of the previous work' (abstract and §3) would be strengthened by an explicit count or table showing how many surveyed papers map to each component of the framework versus any notable omissions.
- [Applications sections] In the applications overview, some domain-specific examples could include more direct citations to the original LLM-agent papers rather than secondary references to improve traceability.
Simulated Author's Rebuttal
We thank the referee for their positive review and recommendation to accept the manuscript. The assessment correctly identifies the core contributions of our unified framework for LLM-based agent construction, the cross-domain application overview, and the maintained repository for updates.
Circularity Check
No significant circularity in literature survey synthesis
full rationale
This is a pure survey paper whose central contribution is a descriptive unified framework for organizing prior LLM-agent literature. No equations, fitted parameters, predictions, or derivations appear in the provided text. The framework is explicitly positioned as an encompassing lens drawn from external cited works rather than derived internally or justified via self-citation chains. All content rests on external references; the survey structure (construction, applications, evaluation) introduces no self-definitional loops, fitted-input predictions, or load-bearing self-citations that reduce the claims to the paper's own inputs by construction. This matches the expected non-circular outcome for honest literature synthesis.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith.Foundation.DAlembert.Inevitability.bilinear_family_forced (echoes)
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
"the cost of a composite genuinely depends on how its components fit together"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 21 Pith papers
- Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain
  Malicious LLM API routers actively perform payload injection and secret exfiltration, with 9 of 428 tested routers showing malicious behavior and further poisoning risks from leaked credentials.
- Agent-First Tool API: A Semantic Interface Paradigm for Enterprise AI Agent Systems
  The Agent-First Tool API paradigm raises AI agent task success from 64% to 88% and cuts human interventions by 72.7% through semantic phases, structured contracts, and risk governance in a production enterprise system.
- OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces
  OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.
- EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems
  EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and ...
- Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems
  ASPO combines multi-agent LLM proposals with deterministic enforcement in a MAPE-K loop to select conflict-free, resource-feasible security patterns for IoT, delivering 100% safety invariants and 21-23% tail latency/e...
- An AI Agent Execution Environment to Safeguard User Data
  GAAP guarantees confidentiality of private user data for AI agents by enforcing user-specified permissions deterministically through persistent information flow tracking, without trusting the agent or requiring attack...
- Visual Inception: Compromising Long-term Planning in Agentic Recommenders via Multimodal Memory Poisoning
  Visual Inception poisons images to hijack long-term memory in agentic recommenders and steer planning, while CognitiveGuard reduces success to about 10% via perceptual sanitization and reasoning verification.
- In-situ process monitoring for defect detection in wire-arc additive manufacturing: an agentic AI approach
  A multi-agent AI framework using processing and acoustic agents achieves 91.6% accuracy and 0.821 F1 score for in-situ porosity defect detection in wire-arc additive manufacturing.
- SoK: Agentic Skills -- Beyond Tool Use in LLM Agents
  The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.
- Is Grep All You Need? How Agent Harnesses Reshape Agentic Search
  Grep retrieval generally outperforms vector retrieval in agentic search tasks, with performance varying strongly by agent harness and tool-calling style.
- Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis
  Agent Mentor analyzes semantic trajectories in agent logs to identify undesired behaviors and derives corrective prompt instructions, yielding measurable accuracy gains on benchmark tasks across three agent setups.
- Toward Explanatory Equilibrium: Verifiable Reasoning as a Coordination Mechanism under Asymmetric Information
  Structured reasoning artifacts enable coordination in LLM multi-agent systems by preventing approval and welfare collapse under asymmetric information while keeping bad-approval rates low across audit regimes.
- EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments
  EconAI adds memory weighting and economic sentiment indexing to LLM agents so they adapt short-term actions to long-term goals inside a single macro/micro simulation loop.
- SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications
  SciFi is a safe, lightweight agentic AI framework that automates structured scientific tasks with minimal human intervention via isolated environments and layered self-assessing agents.
- Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents
  Aethon enables near-constant-time instantiation of stateful AI agents via reference-based replication over compositional views, layered memory, and copy-on-write semantics.
- OpenKedge: Governing Agentic Mutation with Execution-Bound Safety and Evidence Chains
  OpenKedge redefines AI agent state mutations as a governed process using intent proposals, policy-evaluated execution contracts, and cryptographic evidence chains to enable safe, auditable agentic behavior.
- Understanding the planning of LLM agents: A survey
  A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.
- The Rise and Potential of Large Language Model Based Agents: A Survey
  The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
- A Survey on the Memory Mechanism of Large Language Model based Agents
  A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.
- Large Language Models: A Survey
  The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.
Reference graph
Works this paper leans on
- [1] Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529–533
- [2] Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015
- [3] Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017
- [4] Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on Machine Learning. 2018, 1861–1870
- [5] Brown T, Mann B, Ryder N, Subbiah M, Kaplan J D, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 2020, 33: 1877–1901
- [6] Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I, et al. Language models are unsupervised multitask learners. OpenAI blog, 2019, 1(8): 9
- [7] Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman F L, Almeida D, Altenschmidt J, Altman S, Anadkat S, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023
- [8] Anthropic. Model card and evaluations for Claude models. https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf?ref=maginative.com, 2023
- [9] Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M A, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023
- [10] Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, Bashlykov N, Batra S, Bhargava P, Bhosale S, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023
- [11] Chen X, Li S, Li H, Jiang S, Qi Y, Song L. Generative adversarial user model for reinforcement learning based recommendation system. In: International Conference on Machine Learning. 2019, 1052–1061
- [12] Shinn N, Cassano F, Gopinath A, Narasimhan K, Yao S. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 2024, 36
- [13] Shen Y, Song K, Tan X, Li D, Lu W, Zhuang Y. HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. Advances in Neural Information Processing Systems, 2024, 36
- [14] Qin Y, Liang S, Ye Y, Zhu K, Yan L, Lu Y, Lin Y, Cong X, Tang X, Qian B, et al. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. arXiv preprint arXiv:2307.16789, 2023
- [15] Schick T, Dwivedi-Yu J, Dessì R, Raileanu R, Lomeli M, Hambro E, Zettlemoyer L, Cancedda N, Scialom T. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 2024, 36
- [16] Zhu X, Chen Y, Tian H, Tao C, Su W, Yang C, Huang G, Li B, Lu L, Wang X, et al. Ghost in the Minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory. arXiv preprint arXiv:2305.17144, 2023
- [17] Sclar M, Kumar S, West P, Suhr A, Choi Y, Tsvetkov Y. Minding language models' (lack of) theory of mind: A plug-and-play multi-character belief tracker. arXiv preprint arXiv:2306.00924, 2023
- [18] Qian C, Cong X, Yang C, Chen W, Su Y, Xu J, Liu Z, Sun M. ChatDev: Communicative agents for software development. arXiv preprint arXiv:2307.07924, 2023
- [19] C. et al. AgentVerse. https://github.com/OpenBMB/AgentVerse, 2023
- [20] Park J S, O'Brien J, Cai C J, Morris M R, Liang P, Bernstein M S. Generative agents: Interactive simulacra of human behavior. In: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 2023, 1–22
- [21] Wang L, Zhang J, Chen X, Lin Y, Song R, Zhao W X, Wen J R. RecAgent: A novel simulation paradigm for recommender systems. arXiv preprint arXiv:2306.02552, 2023
- [22] Zhang H, Du W, Shan J, Zhou Q, Du Y, Tenenbaum J B, Shu T, Gan C. Building cooperative embodied agents modularly with large language models. arXiv preprint arXiv:2307.02485, 2023
- [23] Hong S, Zheng X, Chen J, Cheng Y, Wang J, Zhang C, Wang Z, Yau S K S, Lin Z, Zhou L, et al. MetaGPT: Meta programming for a multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 2023
- [24] Dong Y, Jiang X, Jin Z, Li G. Self-collaboration code generation via ChatGPT. arXiv preprint arXiv:2304.07590, 2023
- [25] Safdari M, Serapio-García G, Crepy C, Fitz S, Romero P, Sun L, Abdulhai M, Faust A, Matarić M. Personality traits in large language models. arXiv preprint arXiv:2307.00184, 2023
- [26] Johnson J A. Measuring thirty facets of the five factor model with a 120-item public domain inventory: Development of the IPIP-NEO-120. Journal of Research in Personality, 2014, 51: 78–89
- [27] John O P, Donahue E M, Kentle R L. Big Five Inventory. Journal of Personality and Social Psychology, 1991
- [28] Deshpande A, Murahari V, Rajpurohit T, Kalyan A, Narasimhan K. Toxicity in ChatGPT: Analyzing persona-assigned language models. arXiv preprint arXiv:2304.05335, 2023
- [29] Argyle L P, Busby E C, Fulda N, Gubler J R, Rytting C, Wingate D. Out of one, many: Using language models to simulate human samples. Political Analysis, 2023, 31(3): 337–351
- [30] Fischer K A. Reflective linguistic programming (RLP): A stepping stone in socially-aware AGI (SocialAGI). arXiv preprint arXiv:2305.12647, 2023
- [31] Rana K, Haviland J, Garg S, Abou-Chakra J, Reid I, Suenderhauf N. SayPlan: Grounding large language models using 3D scene graphs for scalable robot task planning. In: 7th Annual Conference on Robot Learning. 2023
- [32] Zhu A, Martin L, Head A, Callison-Burch C. CALYPSO: LLMs as dungeon master's assistants. In: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. 2023, 380–390
- [33] Wang Z, Cai S, Chen G, Liu A, Ma X, Liang Y. Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. arXiv preprint arXiv:2302.01560, 2023
- [34] Lin J, Zhao H, Zhang A, Wu Y, Ping H, Chen Q. AgentSims: An open-source sandbox for large language model evaluation. arXiv preprint arXiv:2308.04026, 2023
- [35] Liang X, Wang B, Huang H, Wu S, Wu P, Lu L, Ma Z, Li Z. Unleashing infinite-length input capacity for large-scale language models with self-controlled memory system. arXiv preprint arXiv:2304.13343, 2023
- [36] Ng Y, Miyashita D, Hoshi Y, Morioka Y, Torii O, Kodama T, Deguchi J. SimplyRetrieve: A private and lightweight retrieval-centric generative AI tool. arXiv preprint arXiv:2308.03983, 2023
- [37] Huang Z, Gutierrez S, Kamana H, MacNeil S. Memory sandbox: Transparent and interactive memory management for conversational agents. In: Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 2023, 1–3
- [38] Wang G, Xie Y, Jiang Y, Mandlekar A, Xiao C, Zhu Y, Fan L, Anandkumar A. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023
- [39] Zhong W, Guo L, Gao Q, Wang Y. MemoryBank: Enhancing large language models with long-term memory. arXiv preprint arXiv:2305.10250, 2023
- [40] Hu C, Fu J, Du C, Luo S, Zhao J, Zhao H. ChatDB: Augmenting LLMs with databases as their symbolic memory. arXiv preprint arXiv:2306.03901, 2023
- [41] Modarressi A, Imani A, Fayyaz M, Schütze H. RET-LLM: Towards a general read-write memory for large language models. arXiv preprint arXiv:2305.14322, 2023
- [42] Schuurmans D. Memory augmented large language models are computationally universal. arXiv preprint arXiv:2301.04589, 2023
- [43] Zhao A, Huang D, Xu Q, Lin M, Liu Y J, Huang G. ExpeL: LLM agents are experiential learners. arXiv preprint arXiv:2308.10144, 2023
- [44] Huang W, Abbeel P, Pathak D, Mordatch I. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In: International Conference on Machine Learning. 2022, 9118–9147
- [45] Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, Le Q V, Zhou D, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 2022, 35: 24824–24837
- [46] Kojima T, Gu S S, Reid M, Matsuo Y, Iwasawa Y. Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 2022, 35: 22199–22213
- [47] Raman S S, Cohen V, Rosen E, Idrees I, Paulius D, Tellex S. Planning with large language models via corrective re-prompting. In: NeurIPS 2022 Foundation Models for Decision Making Workshop. 2022
- [48] Xu B, Peng Z, Lei B, Mukherjee S, Liu Y, Xu D. ReWOO: Decoupling reasoning from observations for efficient augmented language models. arXiv preprint arXiv:2305.18323, 2023
- [49] Lin B Y, Fu Y, Yang K, Brahman F, Huang S, Bhagavatula C, Ammanabrolu P, Choi Y, Ren X. SwiftSage: A generative agent with fast and slow thinking for complex interactive tasks. Advances in Neural Information Processing Systems, 2024, 36
- [50] Evans J S B, Stanovich K E. Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 2013, 8(3): 223–241
- [51] Wang X, Wei J, Schuurmans D, Le Q, Chi E, Narang S, Chowdhery A, Zhou D. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022
- [52] Yao S, Yu D, Zhao J, Shafran I, Griffiths T, Cao Y, Narasimhan K. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 2024, 36
- [53] Wang Y, Jiang Z, Chen Z, Yang F, Zhou Y, Cho E, Fan X, Huang X, Lu Y, Yang Y. RecMind: Large language model powered agent for recommendation. arXiv preprint arXiv:2308.14296, 2023
- [54] Besta M, Blach N, Kubicek A, Gerstenberger R, Gianinazzi L, Gajda J, Lehmann T, Podstawski M, Niewiadomski H, Nyczyk P, et al. Graph of thoughts: Solving elaborate problems with large language models. arXiv preprint arXiv:2308.09687, 2023
- [55] Sel B, Al-Tawaha A, Khattar V, Wang L, Jia R, Jin M. Algorithm of thoughts: Enhancing exploration of ideas in large language models. arXiv preprint arXiv:2308.10379, 2023
- [56] Gramopadhye M, Szafir D. Generating executable action plans with environmentally-aware language models. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2023, 3568–3575
- [57] Hao S, Gu Y, Ma H, Hong J J, Wang Z, Wang D Z, Hu Z. Reasoning with language model is planning with world model. arXiv preprint arXiv:2305.14992, 2023
- [58] Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, Stone P. LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477, 2023
- [59] Dagan G, Keller F, Lascarides A. Dynamic planning with a LLM. arXiv preprint arXiv:2308.06391, 2023
- [60] Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K, Cao Y. ReAct: Synergizing reasoning and acting in language models. In: The Twelfth International Conference on Learning Representations. 2023
- [61] Song C H, Wu J, Washington C, Sadler B M, Chao W L, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with large language models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023, 2998–3009
- [62] Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, Zeng A, Tompson J, Mordatch I, Chebotar Y, et al. Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608, 2022
- [63] Madaan A, Tandon N, Gupta P, Hallinan S, Gao L, Wiegreffe S, Alon U, Dziri N, Prabhumoye S, Yang Y, et al. Self-Refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 2024, 36
- [64] Miao N, Teh Y W, Rainforth T. SelfCheck: Using LLMs to zero-shot check their own step-by-step reasoning. In: The Twelfth International Conference on Learning Representations. 2023
- [65] Chen P L, Chang C S. InterAct: Exploring the potentials of ChatGPT as a cooperative agent. arXiv preprint arXiv:2308.01552, 2023
- [66] Chen Z, Zhou K, Zhang B, Gong Z, Zhao W X, Wen J R. ChatCoT: Tool-augmented chain-of-thought reasoning on chat-based large language models. arXiv preprint arXiv:2305.14323, 2023
- [67] Nakano R, Hilton J, Balaji S, Wu J, Ouyang L, Kim C, Hesse C, Jain S, Kosaraju V, Saunders W, et al. WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332, 2021
- [68] Ruan J, Chen Y, Zhang B, Xu Z, Bao T, Du G, Shi S, Mao H, Zeng X, Zhao R. TPTU: Task planning and tool usage of large language model-based AI agents. arXiv preprint arXiv:2308.03427, 2023
- [69] Patil S G, Zhang T, Wang X, Gonzalez J E. Gorilla: Large language model connected with massive APIs. arXiv preprint arXiv:2305.15334, 2023
- [70] Li M, Song F, Yu B, Yu H, Li Z, Huang F, Li Y. API-Bank: A benchmark for tool-augmented LLMs. arXiv preprint arXiv:2304.08244, 2023
- [71] Song Y, Xiong W, Zhu D, Li C, Wang K, Tian Y, Li S. RestGPT: Connecting large language models with real-world applications via RESTful APIs. arXiv preprint arXiv:2306.06624, 2023
- [72] Liang Y, Wu C, Song T, Wu W, Xia Y, Liu Y, Ou Y, Lu S, Ji L, Mao S, et al. TaskMatrix.AI: Completing tasks by connecting foundation models with millions of APIs. Intelligent Computing, 2024, 3: 0063
- [73] Karpas E, Abend O, Belinkov Y, Lenz B, Lieber O, Ratner N, Shoham Y, Bata H, Levine Y, Leyton-Brown K, et al. MRKL systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. arXiv preprint arXiv:2205.00445, 2022
- [74] Ge Y, Hua W, Mei K, Tan J, Xu S, Li Z, Zhang Y, et al. OpenAGI: When LLM meets domain experts. Advances in Neural Information Processing Systems, 2024, 36
- [75] Surís D, Menon S, Vondrick C. ViperGPT: Visual inference via Python execution for reasoning. arXiv preprint arXiv:2303.08128, 2023
- [76] Bran A M, Cox S, Schilter O, Baldassari C, White A D, Schwaller P. ChemCrow: Augmenting large-language models with chemistry tools. arXiv preprint arXiv:2304.05376, 2023
- [77] Yang Z, Li L, Wang J, Lin K, Azarnasab E, Ahmed F, Liu Z, Liu C, Zeng M, Wang L. MM-ReAct: Prompting ChatGPT for multimodal reasoning and action. arXiv preprint arXiv:2303.11381, 2023
- [78] Gao C, Lan X, Lu Z, Mao J, Piao J, Wang H, Jin D, Li Y. S3: Social-network simulation system with large language model-empowered agents. arXiv preprint arXiv:2307.14984, 2023
- [79] Ahn M, Brohan A, Brown N, Chebotar Y, Cortes O, David B, Finn C, Fu C, Gopalakrishnan K, Hausman K, et al. Do as I can, not as I say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691, 2022
- [80] Park J S, Popowski L, Cai C, Morris M R, Liang P, Bernstein M S. Social simulacra: Creating populated prototypes for social computing systems. In: Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 2022, 1–18