A Technical Taxonomy of LLM Agent Communication Protocols

Alexander Lenz; Alois Knoll; Habtom Kahsay Gidey; Linus Sander

arxiv: 2606.19135 · v1 · pith:UH6V5PSAnew · submitted 2026-06-17 · 💻 cs.MA · cs.AI · cs.NI

Pith reviewed 2026-06-26 18:32 UTC · model grok-4.3

classification 💻 cs.MA · cs.AI · cs.NI

keywords LLM agent communication · multi-agent systems · protocol taxonomy · interoperability · schema flexibility · session-state persistence · decentralized discovery · federated protocol stack

The pith

A taxonomy of LLM agent communication protocols identifies five dimensions and shows hybrid payloads with session-state persistence in all sampled agent-to-agent cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a taxonomy through iterative classification of nine open-source LLM agent protocols to address interoperability challenges in multi-agent systems. It defines five dimensions—counterparty, payload, interaction state, discovery mechanism, and schema flexibility—and applies them to reveal patterns such as universal hybrid payloads paired with persistent sessions, support for multiple schemas in most cases, and rare decentralized discovery. A sympathetic reader would care because these patterns clarify current architectural choices and point to likely evolution in how agents communicate with each other and with tools or data. The analysis concludes that short-term pressure favors unified protocols while long-term development favors a federated layered stack. This framework also identifies gaps like privacy enforcement.

Core claim

Following an established iterative method, the authors ran five rounds on nine actively maintained open-source protocols and defined a taxonomy with five dimensions: counterparty, payload, interaction state, discovery mechanism, and schema flexibility. Classification shows that all sampled agent-to-agent protocols combine hybrid payloads with session-state persistence, that most protocols support multiple predefined schemas while two negotiate schemas at runtime, and that decentralized discovery remains rare. From these patterns the authors suggest short-term convergence toward protocols that unify agent-to-agent and agent-to-context communication, but long-term evolution toward a federated, layered protocol stack.

What carries the argument

The five taxonomy dimensions (counterparty, payload, interaction state, discovery mechanism, schema flexibility) that classify protocols by how they handle communication targets, data formats, state management, partner location, and schema handling.
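
As a rough illustration, the five dimensions can be read as the fields of a classification record. The sketch below is not taken from the paper: the class and field names are hypothetical, and only the characteristic values named in this summary (agent-to-agent, agent-to-context, hybrid, session-state persistence, decentralized, multiple predefined schemas, runtime negotiation) come from the text. The remaining values are placeholder complements added so the record is complete.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Counterparty(Enum):
    AGENT_TO_AGENT = "agent-to-agent"
    AGENT_TO_CONTEXT = "agent-to-context"  # tools and data


class Payload(Enum):
    HYBRID = "hybrid"
    NON_HYBRID = "non-hybrid"  # assumed placeholder, not a value from the paper


class InteractionState(Enum):
    SESSION_PERSISTENT = "session-state persistence"
    NOT_PERSISTENT = "no session-state persistence"  # assumed complement


class Discovery(Enum):
    DECENTRALIZED = "decentralized"
    OTHER = "other"  # assumed catch-all for non-decentralized mechanisms


class SchemaFlexibility(Enum):
    MULTIPLE_PREDEFINED = "multiple predefined schemas"
    RUNTIME_NEGOTIATED = "runtime negotiation"
    OTHER = "other"  # assumed catch-all


@dataclass(frozen=True)
class ProtocolProfile:
    """One protocol classified along the five taxonomy dimensions."""
    name: str
    counterparty: Counterparty
    payload: Payload
    interaction_state: InteractionState
    discovery: Discovery
    schema_flexibility: SchemaFlexibility


# Names of the five dimensions, derived from the record itself.
DIMENSIONS = [f.name for f in fields(ProtocolProfile) if f.name != "name"]

# A made-up protocol, shown only to demonstrate how a record is filled in.
example = ProtocolProfile(
    name="example-protocol",
    counterparty=Counterparty.AGENT_TO_AGENT,
    payload=Payload.HYBRID,
    interaction_state=InteractionState.SESSION_PERSISTENT,
    discovery=Discovery.OTHER,
    schema_flexibility=SchemaFlexibility.MULTIPLE_PREDEFINED,
)
```

Using enums rather than free strings keeps each dimension's characteristics mutually exclusive within a record, which is one plausible way to mirror how a taxonomy assigns a single characteristic per dimension.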

If this is right

Short-term development will favor protocols that unify agent-to-agent and agent-to-context communication.
No single protocol will simultaneously maximize versatility, efficiency, and portability.
The field will evolve toward a federated, layered protocol stack rather than one dominant standard.
Protocol selection can be guided by the five dimensions to match specific use cases.
Research gaps remain in areas such as privacy and policy enforcement within these protocols.
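
One way to picture dimension-guided selection is a simple filter over classified protocols. This is a minimal sketch under stated assumptions: the protocol names, the dictionary keys, and the value strings (including "centralized") are hypothetical and are not classifications reported by the paper.

```python
from typing import Dict, List

Profile = Dict[str, str]


def select_protocols(candidates: List[Profile], requirements: Profile) -> List[str]:
    """Return names of candidates that match every required dimension value."""
    return [
        c["name"]
        for c in candidates
        if all(c.get(dim) == want for dim, want in requirements.items())
    ]


# Hypothetical classified protocols, for illustration only.
candidates = [
    {"name": "proto-x", "counterparty": "agent-to-agent", "payload": "hybrid",
     "interaction_state": "session-persistent", "discovery": "centralized",
     "schema_flexibility": "multiple-predefined"},
    {"name": "proto-y", "counterparty": "agent-to-agent", "payload": "hybrid",
     "interaction_state": "session-persistent", "discovery": "decentralized",
     "schema_flexibility": "runtime-negotiated"},
    {"name": "proto-z", "counterparty": "agent-to-context", "payload": "hybrid",
     "interaction_state": "session-persistent", "discovery": "centralized",
     "schema_flexibility": "multiple-predefined"},
]

# A use case that needs agent-to-agent communication with runtime schema negotiation.
requirements = {
    "counterparty": "agent-to-agent",
    "schema_flexibility": "runtime-negotiated",
}

matches = select_protocols(candidates, requirements)
```

Dimensions left out of `requirements` are treated as "don't care", so a use case only needs to state the constraints that matter to it.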

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the taxonomy dimensions prove stable, future protocol designers could use them as a checklist to ensure coverage of discovery and schema flexibility.
The observed trend toward runtime schema negotiation may reduce the need for upfront standardization across different agent ecosystems.
Extending the taxonomy to include closed-source or enterprise protocols could test whether the patterns of hybrid payloads and persistent sessions hold beyond open-source examples.
The rarity of decentralized discovery suggests that current systems still rely on central registries, which could create single points of failure in large-scale agent networks.

Load-bearing premise

The nine actively maintained open-source protocols with demonstrable adoption sufficiently represent the broader space of LLM agent communication protocols.

What would settle it

Discovery of one or more widely adopted protocols that use only non-hybrid payloads or lack session-state persistence while still qualifying as agent-to-agent communication.
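
The falsification condition above can be written as a small predicate. The sketch below assumes a flat dictionary per protocol with hypothetical keys (`counterparty`, `payload`, `session_persistent`, `widely_adopted`). The sample entries are invented to exercise the check and do not describe real protocols.

```python
def is_counterexample(profile: dict) -> bool:
    """True if a widely adopted agent-to-agent protocol breaks the observed pattern.

    The pattern under test: agent-to-agent protocols combine hybrid payloads
    with session-state persistence.
    """
    if profile["counterparty"] != "agent-to-agent":
        return False  # the claim only covers agent-to-agent protocols
    if not profile["widely_adopted"]:
        return False  # the settling condition asks for wide adoption
    return profile["payload"] != "hybrid" or not profile["session_persistent"]


# Invented entries, for illustration only.
sample = [
    {"name": "hypothetical-a", "counterparty": "agent-to-agent",
     "payload": "hybrid", "session_persistent": True, "widely_adopted": True},
    {"name": "hypothetical-b", "counterparty": "agent-to-agent",
     "payload": "hybrid", "session_persistent": False, "widely_adopted": True},
    {"name": "hypothetical-c", "counterparty": "agent-to-context",
     "payload": "structured", "session_persistent": False, "widely_adopted": True},
]

counterexamples = [p["name"] for p in sample if is_counterexample(p)]
```

Here only `hypothetical-b` is flagged: it is agent-to-agent and widely adopted but lacks session-state persistence, while `hypothetical-c` falls outside the claim's scope.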

Original abstract

As large language models (LLMs) advance and multi-agent systems aim to overcome the limits of standalone agents, robust communication protocols are becoming essential infrastructure for distributed agent networks. Nonetheless, the fragmented protocol landscape presents a significant interoperability challenge. This study develops a technical taxonomy to classify and analyze LLM agent communication protocols. Following an established iterative method, we defined the taxonomy's purpose, meta-characteristic, and ending conditions, then performed five iterations, three empirical-to-conceptual and two conceptual-to-empirical, on nine actively maintained open-source protocols with demonstrable adoption. The taxonomy comprises five dimensions: counterparty, payload, interaction state, discovery mechanism, and schema flexibility. Classification reveals recurring architectural patterns: all sampled agent-to-agent protocols combine hybrid payloads with session-state persistence; most protocols support multiple predefined schemas, and two negotiate schemas at runtime, indicating a trend toward schema flexibility; decentralized discovery remains rare. Analysis suggests short-term convergence pressure toward protocols unifying agent-to-agent and agent-to-context (tool and data) communication. Long-term, however, no single protocol is likely to maximize versatility, efficiency, and portability simultaneously. The field will more likely evolve toward a federated, layered protocol stack. The framework guides protocol selection and highlights open research gaps such as privacy and policy enforcement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper builds a five-dimension taxonomy from nine open-source LLM agent protocols and flags some patterns, but the trends rest on a narrow sample whose selection is not fully justified.

The main takeaway is a taxonomy with five dimensions—counterparty, payload, interaction state, discovery mechanism, and schema flexibility—applied to nine actively maintained open-source protocols. The classification shows that all sampled agent-to-agent protocols use hybrid payloads plus session-state persistence, most handle multiple predefined schemas, and two support runtime negotiation. Decentralized discovery is rare in the set.

The work applies an established iterative taxonomy method with five rounds and produces usable dimensions that map the current landscape. It also draws practical implications about short-term convergence toward unified protocols and longer-term movement to federated stacks, plus gaps in privacy and policy enforcement. That part is straightforward and grounded in the inspected protocols.

The soft spot is the sample. The paper limits itself to open-source protocols with demonstrable adoption but gives no explicit search protocol, inclusion criteria, or argument that the set is saturated or representative. Proprietary enterprise systems or emerging decentralized efforts could easily change the observed regularities, so the claims about trends and convergence are harder to extend beyond these nine. The abstract does not resolve that.

This is for researchers and engineers working on multi-agent LLM systems who need a way to compare protocols or spot interoperability issues. A reader focused on protocol design would find the dimensions and pattern summary useful. It deserves peer review so referees can check the classification details and the sample justification.

Referee Report

2 major / 2 minor

Summary. The paper develops a technical taxonomy for LLM agent communication protocols by applying an established iterative taxonomy construction method (five iterations: three empirical-to-conceptual, two conceptual-to-empirical) to nine actively maintained open-source protocols with demonstrable adoption. The resulting taxonomy has five dimensions (counterparty, payload, interaction state, discovery mechanism, schema flexibility). Classification of the sample reveals that all agent-to-agent protocols use hybrid payloads with session-state persistence, most support multiple predefined schemas (with two enabling runtime negotiation), and decentralized discovery is rare. The analysis infers short-term convergence pressure toward unified agent-to-agent and agent-to-context protocols and long-term evolution toward a federated layered stack, while positioning the taxonomy as a guide for protocol selection and a source of research gaps (e.g., privacy, policy enforcement).

Significance. If the taxonomy dimensions and observed patterns are robust, the work supplies a grounded, bottom-up classification framework for a fragmented area of multi-agent systems research. The explicit use of an established iterative method and the absence of fitted parameters or circular derivations are strengths that increase the classification's credibility. The framework could usefully inform protocol design choices and surface under-explored issues such as privacy and policy enforcement.

major comments (2)

[Methodology / protocol selection description] The section describing protocol selection states that the nine protocols were chosen as 'actively maintained open-source protocols with demonstrable adoption' but supplies no explicit search protocol, inclusion/exclusion criteria, or saturation argument. Because the central claims about recurring architectural patterns and field-level trends (hybrid payloads + session-state persistence in all agent-to-agent cases; trend toward schema flexibility; short-term convergence) rest on the representativeness of this sample, the lack of a documented selection procedure limits the strength of the generalizations.
[Results / classification patterns] Results section on classification patterns: the claim that 'all sampled agent-to-agent protocols combine hybrid payloads with session-state persistence' is presented as a key regularity, yet the manuscript does not report how borderline cases were resolved across the five iterations or whether any protocol required reclassification after dimension refinement. This detail is load-bearing for the reliability of the pattern and the subsequent convergence inference.

minor comments (2)

[Methods] The abstract and methods paragraph list the iteration counts and types but would benefit from a compact table summarizing the purpose, meta-characteristic, and ending conditions for each iteration to improve traceability.
[Taxonomy dimensions] The term 'hybrid payloads' is used repeatedly in the results but receives only a brief definition in the taxonomy dimension section; an expanded example or diagram would aid readers unfamiliar with the protocols.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The two major comments identify genuine opportunities to strengthen methodological transparency and reporting of the iterative classification process. We address each point below and will incorporate revisions in the next version of the manuscript.

Referee: The section describing protocol selection states that the nine protocols were chosen as 'actively maintained open-source protocols with demonstrable adoption' but supplies no explicit search protocol, inclusion/exclusion criteria, or saturation argument. Because the central claims about recurring architectural patterns and field-level trends (hybrid payloads + session-state persistence in all agent-to-agent cases; trend toward schema flexibility; short-term convergence) rest on the representativeness of this sample, the lack of a documented selection procedure limits the strength of the generalizations.

Authors: We agree that the absence of an explicit search protocol and documented inclusion/exclusion criteria is a limitation that weakens the strength of the generalizations drawn from the sample. The protocols were identified through a combination of literature review and community knowledge of actively maintained projects with visible adoption (e.g., GitHub stars, integrations, and citations), but this process was not formalized in the manuscript. In revision we will add a dedicated subsection under Methodology that states the practical selection criteria used, lists the specific indicators of adoption considered, and explains how the five-iteration process itself served as an informal saturation check. This addition will not alter the sample but will make the selection rationale reproducible and will qualify the scope of the observed patterns.
revision: yes

Referee: Results section on classification patterns: the claim that 'all sampled agent-to-agent protocols combine hybrid payloads with session-state persistence' is presented as a key regularity, yet the manuscript does not report how borderline cases were resolved across the five iterations or whether any protocol required reclassification after dimension refinement. This detail is load-bearing for the reliability of the pattern and the subsequent convergence inference.

Authors: We accept that the manuscript should have reported the handling of borderline cases and any reclassifications. In the actual execution of the five iterations, no protocol required reclassification after dimension refinement, and the agent-to-agent subset exhibited unambiguous hybrid payloads together with session-state persistence from the second iteration onward; no borderline cases arose that needed explicit resolution rules. We will revise the Results section to include a short paragraph summarizing the iteration outcomes, noting the stability of the key pattern and the absence of reclassifications. This addition will increase transparency without changing the reported classifications.
revision: yes

Circularity Check

0 steps flagged

No circularity: taxonomy constructed bottom-up from direct protocol inspection

The paper develops a taxonomy via an established iterative method applied to nine protocols, defining dimensions (counterparty, payload, interaction state, discovery mechanism, schema flexibility) and classifying observed patterns such as hybrid payloads with session-state persistence. No mathematical derivations, fitted parameters, predictions, or self-citations appear as load-bearing steps in the provided text. The classification reduces directly to the inspected protocols rather than to any prior fitted quantities or author-defined uniqueness theorems. The representativeness concern raised by the skeptic is a question of external validity, not a circular reduction within the derivation chain itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The taxonomy rests on the assumption that the chosen iterative method produces stable dimensions and that the nine protocols capture the relevant design space; no free parameters or new entities are introduced.

axioms (1)

Domain assumption: An established iterative taxonomy development method (purpose, meta-characteristic, ending conditions, alternating empirical-conceptual iterations) is appropriate and sufficient for classifying communication protocols.
The paper states it followed this method but provides no independent validation that the method yields exhaustive or unbiased dimensions for this domain.

Reference graph

Works this paper leans on

147 extracted references · 21 linked inside Pith

[1]

arXiv preprint arXiv:2402.03578 (2024)

Han, S., Zhang, Q., Yao, Y., Jin, W., Xu, Z.: LLM multi-agent systems: Challenges and open problems. arXiv preprint arXiv:2402.03578 (2024)

Pith/arXiv arXiv 2024
[2]

arXiv preprint arXiv:2404.11584 (2024)

Masterman, T., Besen, S., Sawtell, M., Chao, A.: The landscape of emerging AI agent architectures for reasoning, planning, and tool calling: A survey. arXiv preprint arXiv:2404.11584 (2024)

Pith/arXiv arXiv 2024
[3]

: The rise and potential of large language model based agents: A survey

Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., Wang, J., Jin, S., Zhou, E., et al. : The rise and potential of large language model based agents: A survey. Science China Information Sciences 68(2), 121101 (2025)

2025
[4]

In: International Conference on Learning Representations, vol

Yu, J., Wang, X., Tu, S., Cao, S., Zhang-Li, D., Lv, X., Peng, H., Yao, Z., Zhang, X., Li, H., et al.: Kola: Carefully benchmarking world knowledge of large language models. In: International Conference on Learning Representations, vol. 2024, pp. 44594–44637 (2024)

2024
[5]

arXiv preprint arXiv:2303.08774 (2023)

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

Pith/arXiv arXiv 2023
[6]

: A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity

Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W., et al. : A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. In: Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-pacific Cha...

2023
[7]

Advances in Neural Information Processing Systems 36, 38975–38987 (2023)

Valmeekam, K., Marquez, M., Olmo, A., Sreedharan, S., Kambhampati, S.: Planbench: An extensible benchmark for evaluating large language models on planning and reasoning about change. Advances in Neural Information Processing Systems 36, 38975–38987 (2023)

2023
[8]

: Agentbench: Evaluating LLMs as agents

Liu, X., Yu, H., Zhang, H., Xu, Y., Lei, X., Lai, H., Gu, Y., Ding, H., Men, K., Yang, K., et al. : Agentbench: Evaluating LLMs as agents. In: International Conference on Learning Representations, vol. 2024, pp. 52989–53046 (2024) 24

2024
[9]

Advances in neural information processing systems 36, 58202–58245 (2023)

Sun, H., Zhuang, Y., Kong, L., Dai, B., Zhang, C.: Adaplanner: Adaptive planning from feedback with language models. Advances in neural information processing systems 36, 58202–58245 (2023)

2023
[10]

In: AAAI 2025 Workshop LM4Plan (2025)

Hsiao, V., Fine-Morris, M., Roberts, M., Smith, L.N., Hiatt, L.M.: A critical assessment of LLMs for solving multi-step problems: Preliminary results. In: AAAI 2025 Workshop LM4Plan (2025)

2025
[11]

arXiv preprint arXiv:2402.01680 (2024)

Guo, T., Chen, X., Wang, Y., Chang, R., Pei, S., Chawla, N.V., Wiest, O., Zhang, X.: Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680 (2024)

Pith/arXiv arXiv 2024
[12]

arXiv preprint arXiv:2306.03314 (2023)

Talebirad, Y., Nadiri, A.: Multi-agent collaboration: Harnessing the power of intelligent LLM agents. arXiv preprint arXiv:2306.03314 (2023)

Pith/arXiv arXiv 2023
[13]

MIT press, Cambridge, MA (2015)

Malone, T.W., Bernstein, M.: Handbook of Collective Intelligence. MIT press, Cambridge, MA (2015)

2015
[14]

: Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors

Chen, W., Su, Y., Zuo, J., Yang, C., Yuan, C., Chan, C.-M., Yu, H., Lu, Y., Hung, Y.-H., Qian, C., et al. : Agentverse: Facilitating multi-agent collaboration and exploring emergent behaviors. In: International Conference on Learning Representations, vol. 2024, pp. 20094–20136 (2024)

2024
[15]

: MetaGPT: Meta programming for a multi-agent collaborative framework

Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Wang, J., Zhang, C., Yau, S., Lin, Z., Zhou, L., et al. : MetaGPT: Meta programming for a multi-agent collaborative framework. In: International Conference on Learning Representations, vol. 2024, pp. 23247–23275 (2024)

2024
[16]

arXiv preprint arXiv:2502.14321 (2025)

Yan, B., Zhou, Z., Zhang, L., Zhang, L., Zhou, Z., Miao, D., Li, Z., Li, C., Zhang, X.: Beyond self-talk: A communication-centric survey of LLM-based multi-agent systems. arXiv preprint arXiv:2502.14321 (2025)

Pith/arXiv arXiv 2025
[17]

Microsoft: AutoGen. GitHub. Accessed 21 Jul 2025 (2023)

2025
[18]

: Autogen: Enabling next-gen LLM applications via multi-agent conversations

Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., et al. : Autogen: Enabling next-gen LLM applications via multi-agent conversations. In: First Conference on Language Modeling (2024)

2024
[19]

CrewAI: CrewAI. GitHub. Accessed 21 Jul 2025 (2023)

2025
[20]

CAMEL: CAMEL. GitHub. Accessed 21 Jul 2025 (2023)

2025
[21]

https://arxiv.org/abs/2303.17760

Li, G., Hammoud, H.A.A.K., Itani, H., Khizbullin, D., Ghanem, B.: CAMEL: Communicative Agents for ”Mind” Exploration of Large Language Model Society (2023). https://arxiv.org/abs/2303.17760

Pith/arXiv arXiv 2023
[22]

LangChain: LangGraph. GitHub. Accessed 21 Jul 2025 (2024) 25

2025
[23]

Cemri, M., Pan, M.Z., Yang, S., Agrawal, L.A., Chopra, B., Tiwari, R., Keutzer, K., Parameswaran, A., Klein, D., Ramchandran, K., et al.: Why do multi-agent LLM systems fail? arXiv preprint arXiv:2503.13657 (2025)

Pith/arXiv arXiv 2025
[24]

arXiv preprint arXiv:2506.19676 (2025)

Kong, D., Lin, S., Xu, Z., Wang, Z., Li, M., Li, Y., Zhang, Y., Peng, H., Chen, X., Sha, Z., et al.: A survey of LLM-driven AI agent communication: Protocols, security risks, and defense countermeasures. arXiv preprint arXiv:2506.19676 (2025)

arXiv 2025
[25]

arXiv preprint arXiv:2505.02279 (2025)

Ehtesham, A., Singh, A., Gupta, G.K., Kumar, S.: A survey of agent inter- operability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent protocol (A2A), and Agent Network Protocol (ANP). arXiv preprint arXiv:2505.02279 (2025)

arXiv 2025
[26]

arXiv preprint arXiv:2410.11905 (2024)

Marro, S., La Malfa, E., Wright, J., Li, G., Shadbolt, N., Wooldridge, M., Torr, P.: A scalable communication protocol for networks of large language models. arXiv preprint arXiv:2410.11905 (2024)

arXiv 2024
[27]

arXiv preprint arXiv:2510.13821 (2025)

Li, X., Liu, M., Yuen, C.: LLM agent communication protocol (LACP) requires urgent standardization: A telecom-inspired protocol is necessary. arXiv preprint arXiv:2510.13821 (2025)

arXiv 2025
[28]

In: Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pp

Russell, S.: Learning agents for uncertain environments. In: Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pp. 101–103 (1998)

1998
[29]

The knowledge engineering review 10(2), 115–152 (1995)

Wooldridge, M., Jennings, N.R.: Intelligent agents: Theory and practice. The knowledge engineering review 10(2), 115–152 (1995)

1995
[30]

Oxford University Press, New York, NY (1995)

Mele, A.R.: Autonomous Agents: From Self-control to Autonomy. Oxford University Press, New York, NY (1995)

1995
[31]

Autonomous agents and multi-agent systems 1(1), 7–38 (1998)

Jennings, N.R., Sycara, K., Wooldridge, M.: A roadmap of agent research and development. Autonomous agents and multi-agent systems 1(1), 7–38 (1998)

1998
[32]

The knowledge engineering review 11(3), 205–244 (1996)

Nwana, H.S.: Software agents: An overview. The knowledge engineering review 11(3), 205–244 (1996)

1996
[33]

arXiv preprint arXiv:2412.19437 (2024)

Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C., Ruan, C., et al.: Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437 (2024)

Pith/arXiv arXiv 2024
[34]

https://www

Anthropic: System Card: Claude Opus 4 & Claude Sonnet 4. https://www. anthropic.com/claude-4-system-card . Accessed 21 Jul 2025 (2025)

2025
[35]

In: International Conference on Learning Representations, vol

Chen, X., Lin, M., Schärli, N., Zhou, D.: Teaching large language models to self- debug. In: International Conference on Learning Representations, vol. 2024, pp. 26 8746–8825 (2024)

2024
[36]

: Self-refine: Iterative refine- ment with self-feedback

Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., et al. : Self-refine: Iterative refine- ment with self-feedback. Advances in neural information processing systems 36, 46534–46594 (2023)

2023
[37]

Advances in neural information processing systems 35, 22199–22213 (2022)

Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners. Advances in neural information processing systems 35, 22199–22213 (2022)

2022
[38]

arXiv preprint arXiv:2503.00946 (2025)

Li, S., Padilla, S., Bras, P.L., Dong, J., Chantler, M.: A review of LLM-assisted ideation. arXiv preprint arXiv:2503.00946 (2025)

arXiv 2025
[39]

arXiv preprint arXiv:2303.18223 (2023)

Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al.: A survey of large language models. arXiv preprint arXiv:2303.18223 (2023)

Pith/arXiv arXiv 2023
[40]

: Language models are few-shot learners

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. : Language models are few-shot learners. Advances in neural information processing systems 33, 1877–1901 (2020)

1901
[41]

arXiv preprint arXiv:2305.17812 (2023)

Jin, Z., Lu, W.: Tab-cot: Zero-shot tabular chain of thought. arXiv preprint arXiv:2305.17812 (2023)

arXiv 2023
[42]

: TPTU: Task planning and tool usage of large language model-based ai agents

Ruan, J., Chen, Y., Zhang, B., Xu, Z., Bao, T., Mao, H., Li, Z., Zeng, X., Zhao, R., et al. : TPTU: Task planning and tool usage of large language model-based ai agents. In: NeurIPS 2023 Foundation Models for Decision Making Workshop (2023)

2023
[43]

In: Proceed- ings of the AAAI Conference on Artificial Intelligence, vol

Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Podstawski, M., Giani- nazzi, L., Gajda, J., Lehmann, T., Niewiadomski, H., Nyczyk, P., et al.: Graph of thoughts: Solving elaborate problems with large language models. In: Proceed- ings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 17682–17690 (2024)

2024
[44]

In: International Conference on Learning Representations (ICLR) (2023)

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: ReAct: Synergizing reasoning and acting in language models. In: International Conference on Learning Representations (ICLR) (2023)

2023
[45]

Advances in neural information processing systems 36, 11809–11822 (2023)

Yao, S., Yu, D., Zhao, J., Shafran, I., Griﬀiths, T., Cao, Y., Narasimhan, K.: Tree of thoughts: Deliberate problem solving with large language models. Advances in neural information processing systems 36, 11809–11822 (2023)

2023
[46]

arXiv preprint arXiv:2304.06488 (2023)

Zhang, C., Zhang, C., Li, C., Qiao, Y., Zheng, S., Dam, S.K., Zhang, M., Kim, J.U., Kim, S.T., Choi, J., et al.: One small step for generative AI, one giant 27 leap for AGI: A complete survey on ChatGPT in AIGC era. arXiv preprint arXiv:2304.06488 (2023)

arXiv 2023
[47]

: A survey on large language model based autonomous agents

Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., et al. : A survey on large language model based autonomous agents. Frontiers of Computer Science 18(6), 186345 (2024)

2024
[48]

arXiv preprint arXiv:2510.09244 (2025)

Castrillo, V.d.L., Gidey, H.K., Lenz, A., Knoll, A.: Fundamentals of building autonomous LLM agents. arXiv preprint arXiv:2510.09244 (2025)

arXiv 2025
[49]

Cognition 49(1-2), 165–187 (1993)

Evans, J.S.B., Over, D.E., Manktelow, K.I.: Reasoning, decision making and rationality. Cognition 49(1-2), 165–187 (1993)

1993
[50]

arXiv preprint arXiv:2402.02716 (2024)

Huang, X., Liu, W., Chen, X., Wang, X., Wang, H., Lian, D., Wang, Y., Tang, R., Chen, E.: Understanding the planning of LLM agents: A survey. arXiv preprint arXiv:2402.02716 (2024)

Pith/arXiv arXiv 2024
[51]

: Chain-of-thought prompting elicits reasoning in large language models

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q.V., Zhou, D., et al. : Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35, 24824–24837 (2022)

2022
[52]

Advances in Neural Information Processing Systems 36, 8634–8652 (2023)

Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., Yao, S.: Reflex- ion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems 36, 8634–8652 (2023)

2023
[53]

arXiv preprint arXiv:2304.12773 (2023)

Gidey, H.K., Marmsoler, D., Ascher, D.: Modeling adaptive self-healing systems. arXiv preprint arXiv:2304.12773 (2023)

arXiv 2023
[54]

Petroni, F., Rocktäschel, T., Lewis, P., Bakhtin, A., Wu, Y., Miller, A.H., Riedel, S.: Language models as knowledge bases? arXiv preprint arXiv:1909.01066 (2019)

Pith/arXiv arXiv 1909
[55]

arXiv preprint arXiv:2404.13501 (2024)

Zhang, Z., Bo, X., Ma, C., Li, R., Chen, X., Dai, Q., Zhu, J., Dong, Z., Wen, J.- R.: A survey on the memory mechanism of large language model based agents. arXiv preprint arXiv:2404.13501 (2024)

Pith/arXiv arXiv 2024
[56]

In: Proceedings of the AAAI Symposium Series, vol

Hatalis, K., Christou, D., Myers, J., Jones, S., Lambert, K., Amos-Binks, A., Dannenhauer, Z., Dannenhauer, D.: Memory matters: The need to improve long- term memory in LLM-agents. In: Proceedings of the AAAI Symposium Series, vol. 2, pp. 277–280 (2023)

2023
[57]

: Retrieval-augmented generation for knowledge-intensive NLP tasks

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küt- tler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., et al. : Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in neural information processing systems 33, 9459–9474 (2020) 28

2020
[58]

arXiv preprint arXiv:2412.15266 (2024)

Zeng, R., Fang, J., Liu, S., Meng, Z.: On the structural memory of LLM agents. arXiv preprint arXiv:2412.15266 (2024)

arXiv 2024
[59]

In: International Conference on Intelligent Systems and Pattern Recognition, pp

Gidey, H.K., Kesseler, M., Stangl, P., Hillmann, P., Karcher, A.: Document- based knowledge discovery with microservices architecture. In: International Conference on Intelligent Systems and Pattern Recognition, pp. 146–161 (2022). Springer International Publishing Cham

2022
[60]

arXiv preprint arXiv:2409.18807 (2024)

Shen, Z.: LLM with tools: A survey. arXiv preprint arXiv:2409.18807 (2024)

arXiv 2024
[61]

arXiv preprint arXiv:2510.24459 (2025)

Gidey, H.K., Huber, N., Lenz, A., Knoll, A.: Affordance representation and recognition for autonomous agents. arXiv preprint arXiv:2510.24459 (2025)

arXiv 2025
[62]

arXiv preprint arXiv:2212.10846 (2022)

Guo, J., Li, J., Li, D., Tiong, A.M.H., Li, B., Tao, D., Hoi, S.C.: From images to textual prompts: Zero-shot visual question with frozen large language models. arXiv preprint arXiv:2212.10846 (2022)

arXiv 2022
[63]

Peng, Z., Wang, W., Dong, L., Hao, Y., Huang, S., Ma, S., Wei, F.: Kosmos-2: Grounding multimodal large language models to the world. arXiv preprint arXiv:2306.14824 (2023)
[64]

Pi, R., Yao, L., Gao, J., Zhang, J., Zhang, T.: PerceptionGPT: Effectively fusing visual perception into LLM. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 27124–27133 (2024)
[65]

Gidey, H.K., Lenz, A., Knoll, A.: A pattern language for resilient visual agents. arXiv preprint arXiv:2604.28001 (2026)
[66]

Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., Lin, Y., Cong, X., Tang, X., Qian, B., Zhao, S., Hong, L., Tian, R., Xie, R., Zhou, J., Gerstein, M., Li, D., Liu, Z., Sun, M.: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (2023). https://arxiv.org/abs/2307.16789
[67]

Macedo, J., Gidey, H.K., Rebuli, K.B., Machado, P.: Evolving user interfaces: A neuroevolution approach for natural human-machine interaction. In: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar), pp. 246–264 (2024). Springer Nature Switzerland, Cham
[68]

Bouzenia, I., Pradel, M.: You name it, I run it: An LLM agent to execute tests of arbitrary projects. Proceedings of the ACM on Software Engineering 2(ISSTA), 1054–1076 (2025)
[69]

Cao, C., Wang, F., Lindley, L., Wang, Z.: Managing Linux servers with LLM-based AI agents: An empirical evaluation with GPT4. Machine Learning with Applications 17, 100570 (2024)
[70]

Qian, C., Liu, W., Liu, H., Chen, N., Dang, Y., Li, J., Yang, C., Chen, W., Su, Y., Cong, X., et al.: Chatdev: Communicative agents for software development. arXiv preprint arXiv:2307.07924 (2023)
[71]

Sager, P.J., Meyer, B., Yan, P., Wartburg-Kottler, R., Etaiwi, L., Enayati, A., Nobel, G., Abdulkadir, A., Grewe, B.F., Stadelmann, T.: AI agents for computer use: A review of instruction-based computer control, GUI automation, and operator assistants. arXiv preprint arXiv:2501.16150 (2025)
[72]

Ghafarollahi, A., Buehler, M.J.: SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning (2024). https://arxiv.org/abs/2409.05556
[73]

Lála, J., O’Donoghue, O., Shtedritski, A., Cox, S., Rodriques, S.G., White, A.D.: PaperQA: Retrieval-Augmented Generative Agent for Scientific Research (2023). https://arxiv.org/abs/2312.07559
[74]

Katz, U., Levy, M., Goldberg, Y.: Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature (2024). https://arxiv.org/abs/2408.15836
[75]

Gidey, H.K., Hillmann, P., Karcher, A., Knoll, A.: User-like bots for cognitive automation: A survey. In: International Conference on Machine Learning, Optimization, and Data Science, pp. 388–402 (2023). Springer Nature Switzerland, Cham
[76]

Azamfirei, R., Kudchadkar, S.R., Fackler, J.: Large language models and the perils of their hallucinations. Critical Care 27(1), 120 (2023)
[77]

Hsieh, C.-P., Sun, S., Kriman, S., Acharya, S., Rekesh, D., Jia, F., Zhang, Y., Ginsburg, B.: RULER: What’s the real context size of your long-context language models? arXiv preprint arXiv:2404.06654 (2024)

[78]

Liu, N.F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., Liang, P.: Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics 12, 157–173 (2024)
[79]

Shi, F., Chen, X., Misra, K., Scales, N., Dohan, D., Chi, E.H., Schärli, N., Zhou, D.: Large language models can be easily distracted by irrelevant context. In: International Conference on Machine Learning, pp. 31210–31227 (2023). PMLR
[80]

Kim, G., Baldi, P., McAleer, S.: Language models can solve computer tasks. Advances in Neural Information Processing Systems 36, 39648–39677 (2023)
