pith. machine review for the scientific record.

arxiv: 2309.02427 · v3 · submitted 2023-09-05 · 💻 cs.AI · cs.CL · cs.LG · cs.SC

Recognition: 3 theorem links

Cognitive Architectures for Language Agents

Authors on Pith no claims yet

Pith reviewed 2026-05-16 19:30 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.LG · cs.SC
keywords language agents · cognitive architectures · LLM agents · agent frameworks · memory components · decision making · artificial intelligence · symbolic AI

The pith

CoALA structures language agents with modular memory components, a structured action space, and a generalized decision-making process drawn from cognitive science.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes CoALA as a framework to organize language agents built on large language models. It specifies agents that keep separate memory stores for different kinds of information, choose from a defined set of actions that can update memory or affect the outside world, and follow a general process to pick the next action. The same structure is applied to review many recent agent designs and to name specific ways to make future agents more capable. A sympathetic reader would see this as a way to connect today's ad-hoc LLM systems to older ideas from cognitive science instead of reinventing components with each new prompt or tool. The long-term goal is a path toward language-based general intelligence through systematic architecture rather than isolated improvements.

Core claim

CoALA describes a language agent with modular memory components, a structured action space to interact with internal memory and external environments, and a generalized decision-making process to choose actions. We use CoALA to retrospectively survey and organize a large body of recent work, and prospectively identify actionable directions towards more capable agents.

What carries the argument

CoALA, the proposed architecture that equips language agents with modular memory components, a structured action space for internal and external interactions, and a generalized decision-making process.
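The three load-bearing pieces can be sketched concretely. The following is an illustrative reduction, not the paper's implementation; the memory names echo the framework's modular split, but every field type and the `decision_cycle` interface are our own assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Memory:
    """Modular memory stores (names follow CoALA's split; types are assumed)."""
    working: dict[str, Any] = field(default_factory=dict)   # current-episode state
    episodic: list[dict] = field(default_factory=list)      # past trajectories
    semantic: list[str] = field(default_factory=list)       # facts about the world
    procedural: dict[str, Callable] = field(default_factory=dict)  # skills / code

@dataclass
class Action:
    """A structured action: either an internal memory update or an external move."""
    name: str
    internal: bool                      # True = acts on memory, False = on the world
    run: Callable[[Memory], Any]

def decision_cycle(memory: Memory, actions: list[Action],
                   score: Callable[[Action, Memory], float]) -> Any:
    """One generalized decision step: evaluate candidates, select, execute."""
    best = max(actions, key=lambda a: score(a, memory))
    result = best.run(memory)
    memory.working["last_action"] = best.name
    return result
```

A full agent would repeat `decision_cycle` until a stopping condition, with an LLM supplying both the candidate actions and the scoring.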

Load-bearing premise

That principles from cognitive science and symbolic AI transfer directly to LLM-based agents without losing the flexibility and broad capabilities that make the models effective.

What would settle it

A controlled experiment in which agents built without modular memory or structured actions match or exceed the performance and generalization of CoALA-based agents on multi-step reasoning and tool-use benchmarks.
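One way to operationalize that test is a paired ablation harness: run a structured agent and an unstructured baseline over the same shuffled task set and report the mean success-rate gap. This is a minimal sketch under stated assumptions; the `agent(task, rng) -> bool` interface and the tasks are placeholders, not the paper's benchmarks.

```python
import random
import statistics

def success_rate(agent, tasks, rng) -> float:
    """Fraction of tasks solved; agent(task, rng) -> bool stands in for a rollout."""
    return sum(agent(t, rng) for t in tasks) / len(tasks)

def compare_agents(structured, ablated, tasks, trials=20, seed=0) -> float:
    """Mean success-rate gap (structured minus ablated) over repeated paired trials.
    A gap near zero or below would count against the necessity of the structure."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        order = tasks[:]
        rng.shuffle(order)               # same task ordering shown to both agents
        gaps.append(success_rate(structured, order, rng)
                    - success_rate(ablated, order, rng))
    return statistics.mean(gaps)
```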

read the original abstract

Recent efforts have augmented large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning, leading to a new class of language agents. While these agents have achieved substantial empirical success, we lack a systematic framework to organize existing agents and plan future developments. In this paper, we draw on the rich history of cognitive science and symbolic artificial intelligence to propose Cognitive Architectures for Language Agents (CoALA). CoALA describes a language agent with modular memory components, a structured action space to interact with internal memory and external environments, and a generalized decision-making process to choose actions. We use CoALA to retrospectively survey and organize a large body of recent work, and prospectively identify actionable directions towards more capable agents. Taken together, CoALA contextualizes today's language agents within the broader history of AI and outlines a path towards language-based general intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper proposes Cognitive Architectures for Language Agents (CoALA), a descriptive framework for LLM-based agents that incorporates modular memory components, a structured action space for interactions with internal memory and external environments, and a generalized decision-making process. Drawing from cognitive science and symbolic AI, CoALA is applied retrospectively to organize a survey of recent language agent work and prospectively to suggest directions for more capable agents, with the goal of contextualizing current systems within broader AI history.

Significance. If the framework holds as an organizational tool, it offers a useful bridge between empirical language agent successes and historical cognitive/symbolic principles, potentially aiding systematic planning of future developments without introducing fitted parameters or self-referential derivations. The conceptual mapping of existing systems strengthens its value as a retrospective lens, though its prospective utility remains to be tested empirically.

minor comments (2)
  1. In the survey section, the mapping of specific agents (e.g., those using external tools) to CoALA modules could include a table summarizing coverage to make the retrospective organization more explicit and verifiable.
  2. The description of the generalized decision-making process in §3 would benefit from a short pseudocode example to clarify how it differs from standard prompt chaining without relying solely on prose.
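For illustration only (our sketch, not text from the paper's §3): the contrast the referee asks for is that a prompt chain fixes the sequence of calls in advance, while a generalized decision cycle selects among proposed actions at run time. The `propose` and `evaluate` callables here are hypothetical stand-ins for LLM-backed planning and evaluation.

```python
def prompt_chain(llm, prompt: str, steps: list[str]) -> str:
    """Standard prompt chaining: a fixed, linear pipeline of calls."""
    out = prompt
    for step in steps:
        out = llm(f"{step}\n{out}")
    return out

def decision_loop(llm, memory: list[str], propose, evaluate, cycles: int = 3) -> str:
    """Generalized decision-making: each cycle proposes candidate actions,
    scores them, commits the best to memory, and repeats -- the action
    sequence is chosen at run time rather than fixed in advance."""
    choice = ""
    for _ in range(cycles):
        candidates = propose(llm, memory)                        # planning
        choice = max(candidates, key=lambda c: evaluate(llm, memory, c))
        memory.append(choice)                                    # execution
    return choice
```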

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review and recommendation to accept the manuscript. Their summary accurately captures the goals, structure, and contributions of CoALA as a framework for organizing language agents.

Circularity Check

0 steps flagged

No significant circularity: CoALA is a conceptual framework proposal.

full rationale

The paper proposes CoALA as a descriptive architecture drawing on external cognitive science and symbolic AI literature to organize language agents with modular memory, structured actions, and decision-making. No equations, fitted parameters, or predictions are defined; the central claims are retrospective organization of prior work and prospective suggestions, with no self-referential reductions or load-bearing self-citations that collapse the framework into its own inputs. The argument is anchored in external literature from cognitive science rather than in the paper's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that cognitive-science and symbolic-AI principles transfer usefully to LLM agents; no free parameters or new physical entities are introduced.

axioms (1)
  • domain assumption Cognitive science and symbolic AI supply reusable principles for agent design
    Invoked in the introduction and CoALA definition sections to justify the modular structure.
invented entities (1)
  • CoALA architecture no independent evidence
    purpose: To provide a unified description and planning tool for language agents
    A proposed organizing framework rather than a new physical or computational entity with independent falsifiable predictions.

pith-pipeline@v0.9.0 · 5468 in / 1231 out tokens · 36168 ms · 2026-05-16T19:30:13.643090+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • LogicAsFunctionalEquation laws_of_logic_imply_dalembert_hypotheses · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We draw on the rich history of cognitive science and symbolic artificial intelligence to propose Cognitive Architectures for Language Agents (CoALA).

  • HierarchyEmergence hierarchy_emergence_forces_phi · unclear

    Relation between the paper passage and the cited Recognition theorem.

    CoALA contextualizes today’s language agents within the broader history of AI and outlines a path towards language-based general intelligence.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

    cs.MA 2024-10 unverdicted novelty 8.0

    Prompt injection attacks can self-replicate across LLM agents in multi-agent systems, enabling data theft, misinformation, and system disruption while propagating silently.

  2. The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

    cs.CL 2026-05 unverdicted novelty 7.0

    An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

  3. More Is Not Always Better: Cross-Component Interference in LLM Agent Scaffolding

    cs.AI 2026-05 conditional novelty 7.0

    Full factorial testing of five LLM agent components reveals that the complete 'All-In' combination is consistently outperformed by smaller subsets due to cross-component interference, with optimal subsets being task- ...

  4. OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

    cs.CL 2026-04 unverdicted novelty 7.0

    OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.

  5. The Missing Knowledge Layer in Cognitive Architectures for AI Agents

    cs.AI 2026-04 conditional novelty 7.0

    Cognitive architectures for AI agents require a distinct Knowledge layer with indefinite supersession persistence, separate from Memory decay, Wisdom evidence-gating, and Intelligence ephemerality.

  6. ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents

    cs.AI 2026-04 unverdicted novelty 7.0

    ClawVM introduces a harness-managed virtual memory system for LLM agents that ensures deterministic residency and durability of state under token budgets by using typed pages and validated writeback.

  7. ROZA Graphs: Self-Improving Near-Deterministic RAG through Evidence-Centric Feedback

    cs.AI 2026-04 unverdicted novelty 7.0

    ROZA graphs enable self-improving RAG by storing evidence-specific reasoning chains, yielding up to 10.6pp accuracy gains and 46% lower cost through graph traversal feedback.

  8. MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration

    cond-mat.mtrl-sci 2026-04 conditional novelty 7.0

    MatClaw is a code-first LLM agent that autonomously executes end-to-end materials workflows by generating and running Python scripts on remote clusters, achieving reliable code generation via memory architecture and R...

  9. $\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

    cs.AI 2024-06 unverdicted novelty 7.0

    τ-bench shows state-of-the-art agents like GPT-4o succeed on under 50% of tool-using, rule-following tasks and are inconsistent across repeated trials.

  10. How to Interpret Agent Behavior

    cs.AI 2026-05 conditional novelty 6.0

    ACT*ONOMY is a Grounded-Theory-derived hierarchical taxonomy and open repository that enables systematic comparison and characterization of autonomous agent behavior across trajectories.

  11. Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents

    cs.AI 2026-04 unverdicted novelty 6.0

    Memanto delivers 89.8% and 87.1% accuracy on LongMemEval and LoCoMo benchmarks using typed semantic memory and information-theoretic retrieval, outperforming hybrid graph and vector systems with a single query and zer...

  12. OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

    cs.CL 2024-10 unverdicted novelty 6.0

    OS-Atlas, trained on the largest open-source cross-platform GUI grounding corpus of 13 million elements, outperforms prior open-source models on six benchmarks across mobile, desktop, and web platforms.

  13. A Roadmap to Pluralistic Alignment

    cs.AI 2024-02 unverdicted novelty 6.0

    The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.

  14. Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

    cs.CL 2026-05 unverdicted novelty 5.0

    Grep retrieval generally outperforms vector retrieval in agentic search tasks, with performance varying strongly by agent harness and tool-calling style.

  15. Large Language Model based Multi-Agents: A Survey of Progress and Challenges

    cs.CL 2024-01 unverdicted novelty 4.0

    The paper surveys LLM-based multi-agent systems, covering simulated domains, agent profiling and communication, mechanisms for capacity growth, and common benchmarks.

  16. The Rise and Potential of Large Language Model Based Agents: A Survey

    cs.AI 2023-09 accept novelty 4.0

    The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

  17. Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

    cs.AI 2025-01 unverdicted novelty 3.0

    The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.

  18. Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security

    cs.HC 2024-01 unverdicted novelty 3.0

    This survey discusses key components and challenges for Personal LLM Agents and reviews solutions for their capability, efficiency, and security.

Reference graph

Works this paper leans on

96 extracted references · 96 canonical work pages · cited by 18 Pith papers · 43 internal anchors

  1. [1]

    M. Ahn, A. Brohan, N. Brown, Y. Chebotar, O. Cortes, B. David, C. Finn, C. Fu, K. Gopalakrishnan, K. Hausman, et al. Do as I can, not as I say: Grounding language in robotic affordances.arXiv preprint arXiv:2204.01691,

  2. [2]

J. Andreas. Language models as agent models. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5769–5779,

  3. [3]

Published in Transactions on Machine Learning Research (02/2024). A. D. Baddeley and G. Hitch. Working memory. In Psychology of Learning and Motivation, volume 8, pages 47–89. Elsevier,

  4. [4]

    Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, et al. Constitutional AI: Harmlessness from AI feedback.arXiv preprint arXiv:2212.08073,

  5. [5]

    Model-Free Episodic Control

    C. Blundell, B. Uria, A. Pritzel, Y. Li, A. Ruderman, J. Z. Leibo, J. Rae, D. Wierstra, and D. Hassabis. Model-free episodic control.arXiv preprint arXiv:1606.04460,

  6. [6]

    RT-1: Robotics Transformer for Real-World Control at Scale

    A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, et al. RT-1: Robotics transformer for real-world control at scale.arXiv preprint arXiv:2212.06817,

  7. [7]

    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, X. Chen, K. Choromanski, T. Ding, D. Driess, A. Dubey, C. Finn, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control.arXiv preprint arXiv:2307.15818,

  8. [8]

Language Models are Few-Shot Learners

    T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners.Advances in Neural Information Processing Systems, 33:1877–1901,

  9. [9]

    C.-M. Chan, W. Chen, Y. Su, J. Yu, W. Xue, S. Zhang, J. Fu, and Z. Liu. Chateval: Towards better llm-based evaluators through multi-agent debate.arXiv preprint arXiv:2308.07201,

  10. [10]

B. Chen, F. Xia, B. Ichter, K. Rao, K. Gopalakrishnan, M. S. Ryoo, A. Stone, and D. Kappler. Open-vocabulary queryable scene representations for real world planning. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 11509–11522, 2023a. D. Chen and R. Mooney. L...

  11. [11]

    D. Chen, A. Fisch, J. Weston, and A. Bordes. Reading Wikipedia to answer open-domain questions.arXiv preprint arXiv:1704.00051,

  12. [12]

    M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, et al. Evaluating large language models trained on code.arXiv preprint arXiv:2107.03374,

  13. [13]

    X. Chen, M. Lin, N. Schärli, and D. Zhou. Teaching large language models to self-debug.arXiv preprint arXiv:2304.05128, 2023b. Y. Chen, L. Yuan, G. Cui, Z. Liu, and H. Ji. A close look into the calibration of pre-trained language models. arXiv preprint arXiv:2211.00151,

  14. [14]

    PaLM: Scaling Language Modeling with Pathways

    A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, et al. Palm: Scaling language modeling with pathways.arXiv preprint arXiv:2204.02311,

  15. [15]

M.-A. Côté, A. Kádár, X. Yuan, B. Kybartas, T. Barnes, E. Fine, J. Moore, M. Hausknecht, L. El Asri, M. Adada, et al. Textworld: A learning environment for text-based games. In Computer Games: 7th Workshop, CGW 2018, pages 41–75. Springer,

  16. [16]

Dynamic Planning with a LLM

    G. Dagan, F. Keller, and A. Lascarides. Dynamic Planning with a LLM.arXiv preprint arXiv:2308.06391,

  17. [17]

    X. Deng, Y. Gu, B. Zheng, S. Chen, S. Stevens, B. Wang, H. Sun, and Y. Su. Mind2Web: Towards a generalist agent for the web.arXiv preprint arXiv:2306.06070,

  18. [18]

Language Model Cascades

    D. Dohan, W. Xu, A. Lewkowycz, J. Austin, D. Bieber, R. G. Lopes, Y. Wu, H. Michalewski, R. A. Saurous, J. Sohl-Dickstein, et al. Language model cascades.arXiv preprint arXiv:2207.10342,

  19. [19]

    Y. Dong, X. Jiang, Z. Jin, and G. Li. Self-collaboration code generation via chatgpt. arXiv preprint arXiv:2304.07590,

  20. [20]

    PaLM-E: An Embodied Multimodal Language Model

    D. Driess, F. Xia, M. S. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu, et al. Palm-e: An embodied multimodal language model.arXiv preprint arXiv:2303.03378,

  21. [21]

Y. Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325,

  22. [22]

Go-Explore: A New Approach for Hard-Exploration Problems

    A. Ecoffet, J. Huizinga, J. Lehman, K. O. Stanley, and J. Clune. Go-explore: a new approach for hard- exploration problems. arXiv preprint arXiv:1901.10995,

  23. [23]

S. Feng, C. Y. Park, Y. Liu, and Y. Tsvetkov. From pretraining data to language models to downstream tasks: Tracking the trails of political biases leading to unfair NLP models. arXiv preprint arXiv:2305.08283,

  24. [24]

The Capacity for Moral Self-Correction in Large Language Models

    D. Ganguli, A. Askell, N. Schiefer, T. Liao, K. Lukoši¯ ut˙ e, A. Chen, A. Goldie, A. Mirhoseini, C. Olsson, D. Hernandez, et al. The capacity for moral self-correction in large language models.arXiv preprint arXiv:2302.07459,

  25. [25]

    C. Gao, X. Lan, Z. Lu, J. Mao, J. Piao, H. Wang, D. Jin, and Y. Li. S3: Social-network simulation system with large language model-empowered agents.arXiv preprint arXiv:2307.14984,

  26. [26]

    T. Gao, A. Fisch, and D. Chen. Making pre-trained language models better few-shot learners.arXiv preprint arXiv:2012.15723,

  27. [27]

    L. Guan, K. Valmeekam, S. Sreedharan, and S. Kambhampati. Leveraging pre-trained large language models to construct and utilize world models for model-based task planning.arXiv preprint arXiv:2305.14909,

  28. [28]

I. Gur, H. Furuta, A. Huang, M. Safdari, Y. Matsuo, D. Eck, and A. Faust. A real-world webagent with planning, long context understanding, and program synthesis. arXiv preprint arXiv:2307.12856,

  29. [29]

    S. Hao, Y. Gu, H. Ma, J. J. Hong, Z. Wang, D. Z. Wang, and Z. Hu. Reasoning with language model is planning with world model.arXiv preprint arXiv:2305.14992,

  30. [30]

SAPIEN: Affective Virtual Agents Powered by Large Language Models

    M. Hasan, C. Ozel, S. Potter, and E. Hoque. Sapien: Affective virtual agents powered by large language models. arXiv preprint arXiv:2308.03022,

  31. [31]

An Introduction to the Planning Domain Definition Language

P. Haslum, N. Lipovetzky, D. Magazzeni, C. Muise, R. Brachman, F. Rossi, and P. Stone. An introduction to the planning domain definition language, volume

  32. [32]

    S. Hong, X. Zheng, J. Chen, Y. Cheng, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou, C. Ran, et al. Metagpt: Meta programming for multi-agent collaborative framework.arXiv preprint arXiv:2308.00352, 2023a. W. Hong, W. Wang, Q. Lv, J. Xu, W. Yu, J. Ji, Y. Wang, Z. Wang, Y. Dong, M. Ding, et al. Cogagent: A visual language model for gui agents.arXiv prep...

  33. [33]

    Inner Monologue: Embodied Reasoning through Planning with Language Models

    W. Huang, P. Abbeel, D. Pathak, and I. Mordatch. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In International Conference on Machine Learning, pages 9118–9147, 2022b. W. Huang, F. Xia, T. Xiao, H. Chan, J. Liang, P. Florence, A. Zeng, J. Tompson, I. Mordatch, Y. Chebotar, et al. Inner monologue: Embodied reas...

  34. [34]

    AI safety via debate

    G. Irving, P. Christiano, and D. Amodei. AI safety via debate.arXiv preprint arXiv:1805.00899,

  35. [35]

    Unsupervised Dense Information Retrieval with Contrastive Learning

    G. Izacard, M. Caron, L. Hosseini, S. Riedel, P. Bojanowski, A. Joulin, and E. Grave. Unsupervised dense information retrieval with contrastive learning.arXiv preprint arXiv:2112.09118,

  36. [36]

CGMI: Configurable General Multi-Agent Interaction Framework

    S. Jinxin, Z. Jiabao, W. Yilei, W. Xingjiao, L. Jiawen, and H. Liang. Cgmi: Configurable general multi-agent interaction framework.arXiv preprint arXiv:2308.12503,

  37. [37]

Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive NLP

    O. Khattab, K. Santhanam, X. L. Li, D. Hall, P. Liang, C. Potts, and M. Zaharia. Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive NLP.arXiv preprint arXiv:2212.14024,

  38. [38]

Language Models Can Solve Computer Tasks

G. Kim, P. Baldi, and S. McAleer. Language models can solve computer tasks. arXiv preprint arXiv:2303.17491,

  39. [39]

J. R. Kirk, W. Robert, P. Lindes, and J. E. Laird. Improving Knowledge Extraction from LLMs for Robotic Task Learning through Agent Analysis. arXiv preprint arXiv:2306.06770,

  40. [40]

Bridging RL Theory and Practice with the Effective Horizon

    C. Laidlaw, S. Russell, and A. Dragan. Bridging rl theory and practice with the effective horizon.arXiv preprint arXiv:2304.09853,

  41. [41]

    J. E. Laird. Introduction to Soar.arXiv preprint arXiv:2205.03854,

  42. [42]

    Y. LeCun. A path towards autonomous machine intelligence version 0.9. 2, 2022-06-27.Open Review, 62,

  43. [43]

    B. Z. Li, W. Chen, P. Sharma, and J. Andreas. Lampp: Language models as probabilistic priors for perception and action. arXiv preprint arXiv:2302.02801, 2023a. C. Li, Z. Gan, Z. Yang, J. Yang, L. Li, L. Wang, and J. Gao. Multimodal foundation models: From specialists to general-purpose assistants.arXiv preprint arXiv:2309.10020, 2023b. H. Li, Y. Su, D. Ca...

  44. [44]

    Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

    T. Liang, Z. He, W. Jiao, X. Wang, Y. Wang, R. Wang, Y. Yang, Z. Tu, and S. Shi. Encouraging divergent thinking in large language models through multi-agent debate.arXiv preprint arXiv:2305.19118, 2023b. F. Lieder and T. L. Griffiths. Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources.Behavioral...

  45. [45]

    B. Y. Lin, Y. Fu, K. Yang, P. Ammanabrolu, F. Brahman, S. Huang, C. Bhagavatula, Y. Choi, and X. Ren. Swiftsage: A generative agent with fast and slow thinking for complex interactive tasks.arXiv preprint arXiv:2305.17390,

  46. [46]

    B. Liu, Y. Jiang, X. Zhang, Q. Liu, S. Zhang, J. Biswas, and P. Stone. LLM+P: Empowering large language models with optimal planning proficiency.arXiv preprint arXiv:2304.11477, 2023a. H. Liu, C. Li, Q. Wu, and Y. J. Lee. Visual instruction tuning. InNeurIPS, 2023b. H. Liu, C. Sferrazza, and P. Abbeel. Languages are rewards: Hindsight finetuning using hum...

  47. [47]

    P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing.ACM Computing Surveys, 55(9), 2023d. ISSN 0360-0300. R. Liu, J. Wei, S. S. Gu, T.-Y. Wu, S. Vosoughi, C. Cui, D. Zhou, and A. M. Dai. Mind’s eye: Grounded language model reasoning through simu...

  48. [48]

    Z. Ma, Y. Mei, and Z. Su. Understanding the benefits and challenges of using large language model-based conversational agents for mental well-being support.arXiv preprint arXiv:2307.15810,

  49. [49]

    Self-Refine: Iterative Refinement with Self-Feedback

A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y. Yang, et al. Self-refine: Iterative refinement with self-feedback. arXiv preprint arXiv:2303.17651,

  50. [50]

    J. L. McClelland, F. Hill, M. Rudolph, J. Baldridge, and H. Schütze. Extending machine language models toward human-level language understanding.arXiv preprint arXiv:1912.05877,

  51. [51]

    Augmented Language Models: a Survey

    G. Mialon, R. Dessì, M. Lomeli, C. Nalmpantis, R. Pasunuru, R. Raileanu, B. Rozière, T. Schick, J. Dwivedi- Yu, A. Celikyilmaz, et al. Augmented language models: a survey. arXiv preprint arXiv:2302.07842,

  52. [52]

    WebGPT: Browser-assisted question-answering with human feedback

    R. Nakano, J. Hilton, S. Balaji, J. Wu, L. Ouyang, C. Kim, C. Hesse, S. Jain, V. Kosaraju, W. Saunders, et al. WebGPT: Browser-Assisted Question-Answering with Human Feedback.arXiv preprint arXiv:2112.09332,

  53. [53]

Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning

    K. Nguyen and H. Daumé III. Help, Anna! visual navigation with natural multimodal assistance via retrospective curiosity-encouraging imitation learning.arXiv preprint arXiv:1909.01871,

  54. [54]

A Framework for Learning to Request Rich and Contextually Useful Information from Humans

K. Nguyen, Y. Bisk, and H. Daumé III. A framework for learning to request rich and contextually useful information from humans. In ICML, July 2022a. K. X. Nguyen. Language models are bounded pragmatic speakers. In First Workshop on Theory of Mind in Communicating Agents,

  55. [55]

    K. X. Nguyen, Y. Bisk, and H. D. Iii. A framework for learning to request rich and contextually useful information from humans. InInternational Conference on Machine Learning, pages 16553–16568, 2022b. T. T. Nguyen, T. T. Huynh, P. L. Nguyen, A. W.-C. Liew, H. Yin, and Q. V. H. Nguyen. A survey of machine unlearning.arXiv preprint arXiv:2209.02299, 2022c....

  56. [56]

    M. Nye, A. J. Andreassen, G. Gur-Ari, H. Michalewski, J. Austin, D. Bieber, D. Dohan, A. Lewkowycz, M. Bosma, D. Luan, et al. Show your work: Scratchpads for intermediate computation with language models. arXiv preprint arXiv:2112.00114,

  57. [57]

    GPT-4 Technical Report

OpenAI. GPT-4 technical report. ArXiv, abs/2303.08774, 2023a. OpenAI. Function calling and other API updates, 2023b. URL https://openai.com/blog/function-calling-and-other-api-updates. L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al. Training language models to follow instructions with human...

  58. [58]

TEACh: Task-driven Embodied Agents that Chat

    A. Padmakumar, J. Thomason, A. Shrivastava, P. Lange, A. Narayan-Chen, S. Gella, R. Piramuthu, G. Tur, and D. Hakkani-Tur. Teach: Task-driven embodied agents that chat. InProceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 2017–2025,

  59. [59]

    N. D. Palo, A. Byravan, L. Hasenclever, M. Wulfmeier, N. Heess, and M. Riedmiller. Towards a unified agent with foundation models. InWorkshop on Reincarnating Reinforcement Learning at ICLR 2023,

  60. [60]

TALM: Tool Augmented Language Models

    A. Parisi, Y. Zhao, and N. Fiedel. Talm: Tool augmented language models.arXiv preprint arXiv:2205.12255,

  61. [61]

    J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein. Generative agents: Interactive simulacra of human behavior.arXiv preprint arXiv:2304.03442,

  62. [62]

    A. Peng, I. Sucholutsky, B. Li, T. R. Sumers, T. L. Griffiths, J. Andreas, and J. A. Shah. Language guided state abstractions. InWorkshop on Social Intelligence in Humans and Robots at RSS 2023,

  63. [63]

M. L. Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons,

  64. [64]

    C. Qian, X. Cong, C. Yang, W. Chen, Y. Su, J. Xu, Z. Liu, and M. Sun. Communicative agents for software development. arXiv preprint arXiv:2307.07924,

  65. [65]

    Y. Qin, S. Liang, Y. Ye, K. Zhu, L. Yan, Y. Lu, Y. Lin, X. Cong, X. Tang, B. Qian, et al. Toolllm: Facilitating large language models to master 16000+ real-world apis.arXiv preprint arXiv:2307.16789,

  66. [66]

    O. J. Romero, J. Zimmerman, A. Steinfeld, and A. Tomasic. Synergistic integration of large language models and cognitive architectures for robust ai: An exploratory analysis.arXiv preprint arXiv:2308.09830,

  67. [67]

    Code Llama: Open Foundation Models for Code

    B. Rozière, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. Tan, Y. Adi, J. Liu, T. Remez, J. Rapin, A. Kozhevnikov, I. Evtimov, J. Bitton, M. P. Bhatt, C. C. Ferrer, A. Grattafiori, W. Xiong, A. D’efossez, J. Copet, F. Azhar, H. Touvron, L. Martin, N. Usunier, T. Scialom, and G. Synnaeve. Code llama: Open foundation models for code.ArXiv, abs/2308.12950,

  68. [68]

Learning To Retrieve Prompts for In-Context Learning

    O. Rubin, J. Herzig, and J. Berant. Learning to retrieve prompts for in-context learning.arXiv preprint arXiv:2112.08633,

  69. [69]

    Self-critiquing models for assisting human evaluators

    W. Saunders, C. Yeh, J. Wu, S. Bills, L. Ouyang, J. Ward, and J. Leike. Self-critiquing models for assisting human evaluators.arXiv preprint arXiv:2206.05802,

  70. [70]

    Toolformer: Language Models Can Teach Themselves to Use Tools

    T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, and T. Scialom. Toolformer: Language models can teach themselves to use tools.arXiv preprint arXiv:2302.04761,

  71. [71]

Machine Learning: The High Interest Credit Card of Technical Debt

    D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, and M. Young. Machine Learning: The High Interest Credit Card of Technical Debt. InSE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop),

  72. [72]

    Reflexion: Language Agents with Verbal Reinforcement Learning

    N. Shinn, F. Cassano, B. Labash, A. Gopinath, K. Narasimhan, and S. Yao. Reflexion: Language agents with verbal reinforcement learning.arXiv preprint arXiv:2303.11366,

  73. [73]

    ALFWorld: Aligning Text and Embodied Environments for Interactive Learning

    M. Shridhar, X. Yuan, M.-A. Côté, Y. Bisk, A. Trischler, and M. Hausknecht. Alfworld: Aligning text and embodied environments for interactive learning.arXiv preprint arXiv:2010.03768,

  74. [74]

PDDL Planning with Pretrained Large Language Models

T. Silver, V. Hariprasad, R. S. Shuttleworth, N. Kumar, T. Lozano-Pérez, and L. P. Kaelbling. PDDL planning with pretrained large language models. In NeurIPS 2022 Foundation Models for Decision Making Workshop,

  75. [75]

Generalized Planning in PDDL Domains with Pretrained Large Language Models

    T. Silver, S. Dan, K. Srinivas, J. B. Tenenbaum, L. P. Kaelbling, and M. Katz. Generalized Planning in PDDL Domains with Pretrained Large Language Models.arXiv preprint arXiv:2305.11014,

  76. [76]

ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language

    O. Tafjord, B. Dalvi, and P. Clark. Proofwriter: Generating implications, proofs, and abductive statements over natural language. InFindings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3621–3634,

  77. [77]

Intelligent Agents for Interactive Simulation Environments

M. Tambe, W. L. Johnson, R. M. Jones, F. Koss, J. E. Laird, P. S. Rosenbloom, and K. Schwamb. Intelligent agents for interactive simulation environments. AI Magazine, 16(1):15–15,

  78. [78]

    M. Tang, S. Yao, J. Yang, and K. Narasimhan. Referral augmentation for zero-shot information retrieval, 2023a. Q. Tang, Z. Deng, H. Lin, X. Han, Q. Liang, and L. Sun. ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases.arXiv preprint arXiv:2306.05301, 2023b. G. Team, R. Anil, S. Borgeaud, Y. Wu, J.-B. Alayrac, J. Yu, R. Sor...

  79. [79]

    Multi-stage episodic control for strategic exploration in text games

    J. Tuyls, S. Yao, S. Kakade, and K. Narasimhan. Multi-stage episodic control for strategic exploration in text games. arXiv preprint arXiv:2201.01251,

  80. [80]

Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change)

K. Valmeekam, A. Olmo, S. Sreedharan, and S. Kambhampati. Large language models still can't plan (a benchmark for LLMs on planning and reasoning about change). arXiv preprint arXiv:2206.10498,

Showing first 80 references.