
arxiv: 2605.19035 · v1 · pith:DRSGZTVD · submitted 2026-05-18 · 💻 cs.AI

Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

Pith reviewed 2026-05-20 10:26 UTC · model grok-4.3

classification 💻 cs.AI
keywords: agent-to-agent networks · trust in AI systems · multi-agent coordination · LLM agents · systemic vulnerabilities · adversarial composition · trust architecture

The pith

Trust in agent-to-agent networks must be designed in from the start rather than retrofitted later.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that as autonomous agents powered by large language models begin to collaborate in networks, they face systemic vulnerabilities including adversarial composition, semantic misalignment, and cascading failures. Existing methods for aligning individual agents fall short for these network-level issues. Therefore, trustworthiness must be architected into the core of the agent coordination framework instead of being added on to existing protocols. The authors outline a conceptual framework built around four design pillars to achieve this. Readers interested in reliable multi-agent AI systems would care because these networks promise improved performance on complex tasks but introduce risks that could undermine that promise without proper foundational trust.

Core claim

The trustworthiness of A2A networks cannot be fully guaranteed via retrofitting on existing protocols that are largely designed for individual agents. Rather, it must be architected from the very beginning of the A2A coordination framework. We present a comprehensive conceptual framework that situates trust in A2A systems through four design pillars.

What carries the argument

The conceptual framework with four design pillars that situates trust in A2A systems from the beginning of coordination.

If this is right

  • Agent networks would require new protocols with trust as a foundational element.
  • Retrofitting trust will not suffice against systemic vulnerabilities in multi-agent setups.
  • Coordination frameworks must prioritize prevention of cascading failures at the design stage.
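To make the cascading-failure concern concrete, here is a toy sketch (an illustration assumed for this page, not taken from the paper) that models an A2A network as a directed delegation graph in which each agent consumes its upstream agents' outputs. A fault injected at one agent reaches every downstream consumer unless a check on an edge stops it. The graph, the agent names, and the `validated_edges` containment mechanism are all hypothetical.

```python
from collections import deque

# Hypothetical delegation graph: agent -> agents that consume its output.
NETWORK = {
    "planner": ["researcher", "coder"],
    "researcher": ["writer"],
    "coder": ["tester", "writer"],
    "tester": ["deployer"],
    "writer": [],
    "deployer": [],
}

def cascade(network, faulty, validated_edges=frozenset()):
    """Return the set of agents reached by a fault originating at `faulty`.

    An edge (src, dst) listed in `validated_edges` is assumed to carry a
    check that rejects faulty output, so the fault does not cross it.
    """
    affected = {faulty}
    queue = deque([faulty])
    while queue:  # breadth-first propagation along delegation edges
        src = queue.popleft()
        for dst in network[src]:
            if (src, dst) in validated_edges or dst in affected:
                continue
            affected.add(dst)
            queue.append(dst)
    return affected

# Without edge-level checks, a fault at the planner reaches every agent.
print(sorted(cascade(NETWORK, "planner")))
# -> ['coder', 'deployer', 'planner', 'researcher', 'tester', 'writer']

# A check on the coder -> tester edge keeps tester and deployer isolated.
print(sorted(cascade(NETWORK, "planner", {("coder", "tester")})))
# -> ['coder', 'planner', 'researcher', 'writer']
```

The point of the toy is that containment depends on where checks sit in the topology, which is a decision made when the coordination framework is designed rather than something a single agent can fix locally.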

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This perspective may apply to designing other distributed AI systems with emergent interaction risks.
  • Future work could involve empirical tests of the four pillars in simulated agent networks.

Load-bearing premise

Existing agent alignment techniques cannot address systemic vulnerabilities such as adversarial composition, semantic misalignment, and cascading operational failures in A2A networks.

What would settle it

Finding or building a retrofitted trust mechanism on an existing individual-agent protocol that fully mitigates network vulnerabilities would challenge the claim that a new from-the-beginning architecture is necessary.

Figures

Figures reproduced from arXiv: 2605.19035 by Carlee Joe-Wong, Jiechao Gao, Jie Wang, Minjia Zhang, Srivatsan Ravi, Xinyi Fan, Yixiang Yao, Yuhang Yao.

Figure 1: Illustration of trust issues in agent-to-agent networks.
Figure 2: Bolted-On vs. Baked-In.
2.3.2 Baked-In Trust (Intrinsic Constraints). In a baked-in architecture, the transition function $\delta$ is defined such that all reachable states satisfy safety invariants:

$$\forall s_t \in S_{\text{safe}},\ \forall a_t \in A:\quad s_{t+1} = \delta(s_t, a_t) \in S_{\text{safe}}. \tag{2}$$

Rather than detecting violations after execution, baked-in designs eliminate unsafe transitions from the system topology. Any action $a_t$ that would result in a sta…
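The contrast behind Equation (2) can be illustrated with a small state machine. This is a toy sketch under stated assumptions, not the paper's formalism: states are integers, the hypothetical `SAFE_STATES` stands in for the safe set $S_{\text{safe}}$, and actions are integer increments. The bolted-on variant applies the transition and flags a violation only afterwards; the baked-in variant exposes only actions whose successor stays in the safe set, so the quantifier over $A$ is realized here by restricting, per state, which actions exist at all. All function names are illustrative.

```python
# Toy model (hypothetical): states are integers, actions are increments.
SAFE_STATES = set(range(0, 10))   # stands in for S_safe
ACTIONS = (-3, -1, 1, 3)          # stands in for the action set A

def delta(state, action):
    """Transition function: apply an action to a state."""
    return state + action

def bolted_on_step(state, action):
    """Execute first, then detect: the unsafe state has already been reached."""
    next_state = delta(state, action)
    violation = next_state not in SAFE_STATES
    return next_state, violation

def allowed_actions(state):
    """Baked-in: only actions whose successor stays in SAFE_STATES exist."""
    return [a for a in ACTIONS if delta(state, a) in SAFE_STATES]

def baked_in_step(state, action):
    """Reject inadmissible actions before any transition happens."""
    if action not in allowed_actions(state):
        raise ValueError(f"action {action} not admissible in state {state}")
    return delta(state, action)

def reachable(start):
    """All states reachable from `start` using only admissible actions."""
    seen, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        for a in allowed_actions(s):
            nxt = delta(s, a)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

print(bolted_on_step(8, 3))         # (11, True): violation found after the fact
print(reachable(0) <= SAFE_STATES)  # True: the invariant holds by construction
```

In the bolted-on path the system can still enter state 11 and only reports it; in the baked-in path that transition is not part of the reachable graph.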
Figure 3: From vulnerabilities to trust requirements in agent networks. Left: multi-agent risks. Right: …
Figure 4: Blueprint for trustworthy agent networks.
Original abstract

The rapid advancement of Large Language Models has given rise to autonomous LLM-based agents capable of complex reasoning and execution. As these agents transition from isolated operation to collaborative ecosystems, we witness the emergence of the Agent-to-Agent (A2A) network, a paradigm where heterogeneous agents autonomously coordinate to solve multi-step tasks. While these networks may offer better task performance compared to simply using one agent to complete the entire task, they introduce systemic vulnerabilities, such as adversarial composition, semantic misalignment, and cascading operational failures, that existing agent alignment techniques cannot address. In this vision paper, we argue that the trustworthiness of A2A networks cannot be fully guaranteed via retrofitting on existing protocols that are largely designed for individual agents. Rather, it must be architected from the very beginning of the A2A coordination framework. We present a comprehensive conceptual framework that situates trust in A2A systems through four design pillars.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that the trustworthiness of Agent-to-Agent (A2A) networks cannot be fully guaranteed via retrofitting on existing protocols designed for individual agents. Instead, trust must be architected from the very beginning of the A2A coordination framework. It presents a conceptual framework situating trust in A2A systems through four design pillars to address systemic vulnerabilities such as adversarial composition, semantic misalignment, and cascading operational failures.

Significance. If the central argument holds, the work could provide a useful high-level vision for shifting multi-agent AI design toward native trust mechanisms, potentially guiding future development of more robust collaborative LLM agent systems in networked settings.

major comments (2)
  1. [Abstract] The claim that 'existing agent alignment techniques cannot address' systemic vulnerabilities (adversarial composition, semantic misalignment, cascading failures) is asserted without citing or analyzing any concrete techniques such as agent-level RLHF, safety fine-tuning, or protocol extensions in frameworks like AutoGen or LangChain. This claim is load-bearing for the motivation that only a from-scratch architecture suffices.
  2. [Section introducing the four design pillars] The pillars are introduced conceptually without a gap analysis or derivation demonstrating why adaptations or modular additions to existing single-agent methods would necessarily fail to mitigate the listed risks. This leaves the necessity of the new framework as an unexamined premise rather than a substantiated conclusion.
minor comments (1)
  1. The manuscript would benefit from a related-work subsection that explicitly positions the four pillars relative to prior multi-agent trust and safety literature. This is a presentation issue and does not affect the core argument.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our vision paper. The feedback highlights important areas where we can strengthen the substantiation of our claims regarding the limitations of existing techniques and the necessity of the proposed framework. We address each major comment below and commit to revisions that will enhance the clarity and rigor of the manuscript while preserving its conceptual focus.

  1. Referee: [Abstract] The claim that 'existing agent alignment techniques cannot address' systemic vulnerabilities (adversarial composition, semantic misalignment, cascading failures) is asserted without citing or analyzing any concrete techniques such as agent-level RLHF, safety fine-tuning, or protocol extensions in frameworks like AutoGen or LangChain. This is load-bearing for the motivation that only a from-scratch architecture suffices.

    Authors: We acknowledge that the abstract asserts this claim without specific citations or analysis, and that it would benefit from additional support. Since the paper is a vision paper proposing a high-level framework rather than an empirical or survey study, the focus was on outlining the systemic issues inherent to A2A networks. To address this, we will revise the abstract and add a brief discussion in the introduction citing relevant works on single-agent alignment techniques, such as RLHF adaptations for agents and safety mechanisms in multi-agent frameworks like AutoGen. We will explain that while these methods improve individual agent reliability, they do not inherently handle network-level phenomena like cascading failures or adversarial composition across heterogeneous agents. This revision will better ground the motivation for a baked-in approach. revision: yes

  2. Referee: [Section introducing the four design pillars] The pillars are introduced conceptually without a gap analysis or derivation demonstrating why adaptations or modular additions to existing single-agent methods would necessarily fail to mitigate the listed risks, leaving the necessity of the new framework as an unexamined premise rather than a substantiated conclusion.

    Authors: The referee correctly notes the absence of an explicit gap analysis in the section introducing the four design pillars. We will revise this section to include a short gap analysis that considers how existing single-agent methods might be adapted (e.g., through modular trust layers or protocol extensions) and why such adaptations may not suffice for the identified risks. For instance, we will derive that semantic misalignment arises from inter-agent communication dynamics not present in single-agent settings, and cascading failures require coordinated trust mechanisms across the network. This will substantiate why trust must be architected from the beginning rather than retrofitted. revision: yes
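The gap-analysis point that semantic misalignment arises between agents rather than within one can be illustrated with a toy example (hypothetical, not from the paper). Two agents each pass their own local validation, yet their composition is wrong because they read a shared `budget` field in different units. The `Budget` message schema below is an assumed design, shown only to suggest how a contract shared by both sides, rather than a per-agent check, can carry the missing meaning.

```python
from dataclasses import dataclass

# Hypothetical agent A: reports a budget in thousands of US dollars.
def agent_a_plan():
    return {"budget": 50}  # A means $50,000

def agent_a_check(msg):
    return 0 < msg["budget"] <= 100  # A's local check: at most 100k

# Hypothetical agent B: reads "budget" as plain US dollars.
def agent_b_check(msg):
    return 0 < msg["budget"] <= 1_000_000  # B's local check, in dollars

def agent_b_spend(msg):
    return msg["budget"]  # dollars B believes it may spend

# Both local checks pass, yet B spends 50 where A meant 50,000.
msg = agent_a_plan()
print(agent_a_check(msg), agent_b_check(msg), agent_b_spend(msg))

@dataclass(frozen=True)
class Budget:
    """Assumed shared schema: the unit travels with the value."""
    amount: float
    unit: str  # "USD" or "kUSD"

    def in_usd(self):
        factors = {"USD": 1, "kUSD": 1_000}
        if self.unit not in factors:
            raise ValueError(f"unknown unit {self.unit!r}")
        return self.amount * factors[self.unit]

# With the unit explicit, B converts instead of guessing.
typed = Budget(amount=50, unit="kUSD")
print(typed.in_usd())
```

Neither local check is wrong on its own terms; the error only exists at the interface, which is why the argument locates the fix in the coordination layer.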

Circularity Check

0 steps flagged

No significant circularity in this conceptual vision paper


The paper is a vision paper that advances a conceptual argument for architecting trust into A2A networks from the beginning rather than retrofitting existing single-agent protocols. It identifies systemic vulnerabilities such as adversarial composition, semantic misalignment, and cascading failures as motivation, then introduces four design pillars. There are no equations, fitted parameters, predictions, or derivations of any kind. No self-citations are used to justify uniqueness theorems, ansatzes, or load-bearing premises. The central claim does not reduce by construction to its own inputs or to prior self-referential results; the argument remains self-contained as a high-level framework proposal without mathematical or definitional circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the domain assumption that retrofitting cannot suffice, plus the introduction of an unspecified four-pillar structure; no free parameters or independently evidenced entities are introduced.

axioms (1)
  • Domain assumption: Existing agent alignment techniques cannot address systemic vulnerabilities such as adversarial composition, semantic misalignment, and cascading operational failures in A2A networks.
    Directly invoked in the abstract to justify the need for a new baked-in approach.
invented entities (1)
  • Four design pillars for trust in A2A systems (no independent evidence)
    purpose: To situate and architect trust within the A2A coordination framework
    Conceptual constructs introduced without details or external evidence in the provided abstract.

pith-pipeline@v0.9.0 · 5714 in / 1255 out tokens · 43294 ms · 2026-05-20T10:26:45.528432+00:00


