Recognition: 2 theorem links
Cognitive Architectures for Language Agents
Pith reviewed 2026-05-16 19:30 UTC · model grok-4.3
The pith
CoALA structures language agents with modular memory components, a structured action space, and a generalized decision-making process drawn from cognitive science.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CoALA describes a language agent with modular memory components, a structured action space to interact with internal memory and external environments, and a generalized decision-making process to choose actions. The authors use CoALA to retrospectively survey and organize a large body of recent work, and to prospectively identify actionable directions toward more capable agents.
What carries the argument
CoALA, the proposed architecture that equips language agents with modular memory components, a structured action space for internal and external interactions, and a generalized decision-making process.
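The modular layout this description refers to can be sketched as plain data. The module and action names below follow the paper's taxonomy (working, episodic, semantic, and procedural memory; grounding, retrieval, reasoning, and learning actions); the Python types and field contents are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass, field

# Illustrative layout of CoALA's memory modules and action space.
# Module names follow the paper's taxonomy; the types are assumptions.

@dataclass
class MemoryModules:
    working: dict = field(default_factory=dict)     # current cycle's variables
    episodic: list = field(default_factory=list)    # past experiences/trajectories
    semantic: dict = field(default_factory=dict)    # facts about the world
    procedural: dict = field(default_factory=dict)  # skills: prompts, code, weights

# The structured action space splits into external and internal actions.
EXTERNAL_ACTIONS = {"grounding"}                        # act on the environment
INTERNAL_ACTIONS = {"retrieval", "reasoning", "learning"}  # act on memory

mem = MemoryModules()
mem.semantic["capital_of_france"] = "Paris"  # a learning action writes here
```

The point of the split is that reads and writes to each long-term store are themselves actions the decision procedure can choose, rather than fixed steps in a prompt chain.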
Load-bearing premise
That principles from cognitive science and symbolic AI transfer directly to LLM-based agents without losing the flexibility and broad capabilities that make the models effective.
What would settle it
A controlled experiment in which agents built without modular memory or structured actions match or exceed the performance and generalization of CoALA-based agents on multi-step reasoning and tool-use benchmarks.
read the original abstract
Recent efforts have augmented large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning, leading to a new class of language agents. While these agents have achieved substantial empirical success, we lack a systematic framework to organize existing agents and plan future developments. In this paper, we draw on the rich history of cognitive science and symbolic artificial intelligence to propose Cognitive Architectures for Language Agents (CoALA). CoALA describes a language agent with modular memory components, a structured action space to interact with internal memory and external environments, and a generalized decision-making process to choose actions. We use CoALA to retrospectively survey and organize a large body of recent work, and prospectively identify actionable directions towards more capable agents. Taken together, CoALA contextualizes today's language agents within the broader history of AI and outlines a path towards language-based general intelligence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Cognitive Architectures for Language Agents (CoALA), a descriptive framework for LLM-based agents that incorporates modular memory components, a structured action space for interactions with internal memory and external environments, and a generalized decision-making process. Drawing from cognitive science and symbolic AI, CoALA is applied retrospectively to organize a survey of recent language agent work and prospectively to suggest directions for more capable agents, with the goal of contextualizing current systems within broader AI history.
Significance. If the framework holds as an organizational tool, it offers a useful bridge between empirical language agent successes and historical cognitive/symbolic principles, potentially aiding systematic planning of future developments without introducing fitted parameters or self-referential derivations. The conceptual mapping of existing systems strengthens its value as a retrospective lens, though its prospective utility remains to be tested empirically.
minor comments (2)
- In the survey section, the mapping of specific agents (e.g., those using external tools) to CoALA modules could include a table summarizing coverage to make the retrospective organization more explicit and verifiable.
- The description of the generalized decision-making process in §3 would benefit from a short pseudocode example to clarify how it differs from standard prompt chaining without relying solely on prose.
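A minimal runnable sketch of the kind of pseudocode the second comment asks for, following the decision cycle the paper describes (a planning stage that proposes, evaluates, and selects among candidate actions, then an execution stage). All class, method, and variable names here are illustrative stand-ins, not from any codebase associated with the paper.

```python
class ToyAgent:
    """Minimal sketch of CoALA's generalized decision cycle (names illustrative)."""

    def __init__(self):
        self.working_memory = []    # short-lived, per-cycle state
        self.long_term_memory = {}  # stands in for episodic/semantic stores

    # --- internal actions used during the planning stage ---
    def propose(self):
        """Reasoning action: enumerate candidate actions from working memory."""
        last = self.working_memory[-1]
        return [("retrieval", last), ("grounding", last.upper())]

    def evaluate(self, action):
        """Score a candidate; here, prefer retrieval when memory has a hit."""
        kind, arg = action
        return 1.0 if kind == "retrieval" and arg in self.long_term_memory else 0.5

    # --- one plan-then-execute cycle ---
    def decision_cycle(self, observation, environment):
        self.working_memory.append(observation)
        candidates = self.propose()
        action = max(candidates, key=self.evaluate)  # select
        kind, arg = action
        if kind == "grounding":                      # external action
            result = environment(arg)
        else:                                        # retrieval action
            result = self.long_term_memory.get(arg, "miss")
        self.working_memory.append(result)
        return kind, result

agent = ToyAgent()
agent.long_term_memory["ping"] = "pong"
kind, result = agent.decision_cycle("ping", environment=lambda a: f"env({a})")
# retrieval wins here because "ping" has a long-term memory hit
```

Unlike standard prompt chaining, the sequence of steps is not fixed in advance: each cycle re-selects among internal (memory) and external (environment) actions based on the current working-memory state.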
Simulated Author's Rebuttal
We thank the referee for their positive review and recommendation to accept the manuscript. Their summary accurately captures the goals, structure, and contributions of CoALA as a framework for organizing language agents.
Circularity Check
No significant circularity: CoALA is a conceptual framework proposal
full rationale
The paper proposes CoALA as a descriptive architecture drawing on external cognitive science and symbolic AI literature to organize language agents with modular memory, structured actions, and decision-making. No equations, fitted parameters, or predictions are defined; the central claims are the retrospective organization of prior work and prospective suggestions, with no self-referential reductions or load-bearing self-citations that collapse the framework into its own inputs. The framework is grounded in external benchmarks from cognitive science rather than in its own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Cognitive science and symbolic AI supply reusable principles for agent design
invented entities (1)
- CoALA architecture (no independent evidence)
Lean theorems connected to this paper
- LogicAsFunctionalEquation.laws_of_logic_imply_dalembert_hypotheses (tagged: unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We draw on the rich history of cognitive science and symbolic artificial intelligence to propose Cognitive Architectures for Language Agents (CoALA)."
- HierarchyEmergence.hierarchy_emergence_forces_phi (tagged: unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "CoALA contextualizes today's language agents within the broader history of AI and outlines a path towards language-based general intelligence."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 18 Pith papers
- Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems
  Prompt injection attacks can self-replicate across LLM agents in multi-agent systems, enabling data theft, misinformation, and system disruption while propagating silently.
- The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
  An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
- More Is Not Always Better: Cross-Component Interference in LLM Agent Scaffolding
  Full factorial testing of five LLM agent components reveals that the complete 'All-In' combination is consistently outperformed by smaller subsets due to cross-component interference, with optimal subsets being task- ...
- OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory
  OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.
- The Missing Knowledge Layer in Cognitive Architectures for AI Agents
  Cognitive architectures for AI agents require a distinct Knowledge layer with indefinite supersession persistence, separate from Memory decay, Wisdom evidence-gating, and Intelligence ephemerality.
- ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents
  ClawVM introduces a harness-managed virtual memory system for LLM agents that ensures deterministic residency and durability of state under token budgets by using typed pages and validated writeback.
- ROZA Graphs: Self-Improving Near-Deterministic RAG through Evidence-Centric Feedback
  ROZA graphs enable self-improving RAG by storing evidence-specific reasoning chains, yielding up to 10.6pp accuracy gains and 46% lower cost through graph traversal feedback.
- MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration
  MatClaw is a code-first LLM agent that autonomously executes end-to-end materials workflows by generating and running Python scripts on remote clusters, achieving reliable code generation via memory architecture and R...
- $\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
  τ-bench shows state-of-the-art agents like GPT-4o succeed on under 50% of tool-using, rule-following tasks and are inconsistent across repeated trials.
- How to Interpret Agent Behavior
  ACT*ONOMY is a Grounded-Theory-derived hierarchical taxonomy and open repository that enables systematic comparison and characterization of autonomous agent behavior across trajectories.
- Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents
  Memanto delivers 89.8% and 87.1% accuracy on LongMemEval and LoCoMo benchmarks using typed semantic memory and information-theoretic retrieval, outperforming hybrid graph and vector systems with a single query and zer...
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.
- A Roadmap to Pluralistic Alignment
  The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.
- Is Grep All You Need? How Agent Harnesses Reshape Agentic Search
  Grep retrieval generally outperforms vector retrieval in agentic search tasks, with performance varying strongly by agent harness and tool-calling style.
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges
  The paper surveys LLM-based multi-agent systems, covering simulated domains, agent profiling and communication, mechanisms for capacity growth, and common benchmarks.
- The Rise and Potential of Large Language Model Based Agents: A Survey
  The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
  The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.
- Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
  This survey discusses key components and challenges for Personal LLM Agents and reviews solutions for their capability, efficiency, and security.
discussion (0)