Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

Carlee Joe-Wong; Jiechao Gao; Jie Wang; Minjia Zhang; Srivatsan Ravi; Xinyi Fan; Yixiang Yao; Yuhang Yao

arXiv: 2605.19035 · v1 · pith:DRSGZTVDnew · submitted 2026-05-18 · 💻 cs.AI

Yixiang Yao, Yuhang Yao, Xinyi Fan, Jiechao Gao, Jie Wang, Minjia Zhang, Srivatsan Ravi, Carlee Joe-Wong

Pith reviewed 2026-05-20 10:26 UTC · model grok-4.3

classification 💻 cs.AI

keywords: agent-to-agent networks · trust in AI systems · multi-agent coordination · LLM agents · systemic vulnerabilities · adversarial composition · trust architecture

The pith

Trust in agent-to-agent networks must be designed in from the start rather than retrofitted later.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that as autonomous agents powered by large language models begin to collaborate in networks, they face systemic vulnerabilities including adversarial composition, semantic misalignment, and cascading failures. Existing methods for aligning individual agents fall short for these network-level issues. Therefore, trustworthiness must be architected into the core of the agent coordination framework instead of being added on to existing protocols. The authors outline a conceptual framework built around four design pillars to achieve this. Readers interested in reliable multi-agent AI systems would care because these networks promise improved performance on complex tasks but introduce risks that could undermine that promise without proper foundational trust.

Core claim

The trustworthiness of A2A networks cannot be fully guaranteed via retrofitting on existing protocols that are largely designed for individual agents. Rather, it must be architected from the very beginning of the A2A coordination framework. We present a comprehensive conceptual framework that situates trust in A2A systems through four design pillars.

What carries the argument

The conceptual framework with four design pillars that situates trust in A2A systems from the beginning of coordination.

If this is right

Agent networks require new protocols with trust as a foundational element.
Retrofitting trust will not suffice against systemic vulnerabilities in multi-agent setups.
Coordination frameworks must prioritize prevention of cascading failures at the design stage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This perspective may apply to designing other distributed AI systems with emergent interaction risks.
Future work could involve empirical tests of the four pillars in simulated agent networks.

Load-bearing premise

Existing agent alignment techniques cannot address systemic vulnerabilities such as adversarial composition, semantic misalignment, and cascading operational failures in A2A networks.

What would settle it

Finding or building a retrofitted trust mechanism on an existing individual-agent protocol that fully mitigates network vulnerabilities would challenge the claim that a new from-the-beginning architecture is necessary.

Figures

Figures reproduced from arXiv: 2605.19035 by Carlee Joe-Wong, Jiechao Gao, Jie Wang, Minjia Zhang, Srivatsan Ravi, Xinyi Fan, Yixiang Yao, Yuhang Yao.

**Figure 2.** Bolted-On vs. Baked-In. Excerpt from Section 2.3.2, Baked-In Trust (Intrinsic Constraints): In a baked-in architecture, the transition function $\delta$ is defined such that all reachable states satisfy safety invariants: $\forall s_t \in S_{\text{safe}},\ \forall a_t \in A : s_{t+1} = \delta(s_t, a_t) \in S_{\text{safe}}$ (2). Rather than detecting violations after execution, baked-in designs eliminate unsafe transitions from the system topology. Any action $a_t$ that would result in a sta…

**Figure 3.** From vulnerabilities to trust requirements in agent networks. Left: multi-agent risks. Right: …

**Figure 4.** Blueprint for trustworthy agent networks.
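The contrast in the Figure 2 excerpt can be made concrete with a toy state-transition system. The sketch below is illustrative only and is not taken from the paper: the integer counter state, the bounds `LOW` and `HIGH`, and every function name are hypothetical. In particular, the excerpt is truncated before it says what happens to an action that would leave the safe set, so treating such an action as a rejected no-op is an assumption made here.

```python
from typing import List, Tuple

State = int   # toy state: a shared resource counter (illustrative assumption)
Action = int  # toy action: a signed change to that counter

LOW, HIGH = 0, 10  # hypothetical bounds defining the safe set S_safe


def is_safe(state: State) -> bool:
    """Safety invariant for this toy system: the counter stays in [LOW, HIGH]."""
    return LOW <= state <= HIGH


def raw_delta(state: State, action: Action) -> State:
    """Unconstrained transition: applies the action with no safety check."""
    return state + action


def run_bolted_on(start: State, actions: List[Action]) -> Tuple[List[State], List[bool]]:
    """Bolted-on: execute every action, then flag violations after the fact."""
    states: List[State] = []
    flags: List[bool] = []
    state = start
    for action in actions:
        state = raw_delta(state, action)   # the unsafe state is actually entered
        states.append(state)
        flags.append(not is_safe(state))   # a monitor detects it afterwards
    return states, flags


def baked_in_delta(state: State, action: Action) -> State:
    """Baked-in: the transition function itself never leaves S_safe.

    Assumption: an action that would exit the safe set is rejected and the
    state is left unchanged.
    """
    candidate = raw_delta(state, action)
    return candidate if is_safe(candidate) else state


def run_baked_in(start: State, actions: List[Action]) -> List[State]:
    """Apply the constrained transition function to a sequence of actions."""
    states: List[State] = []
    state = start
    for action in actions:
        state = baked_in_delta(state, action)
        states.append(state)
    return states
```

Starting from 0 with actions `[4, 5, 3, -2]`, the bolted-on run passes through the unsafe state 12 and only flags it afterwards, while the baked-in run visits 4, 9, 9, 7 and never leaves the safe range. Under these assumptions, the invariant in Eq. (2) holds by construction of `baked_in_delta`, not because a monitor happened to catch a violation.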

Original abstract

The rapid advancement of Large Language Models has given rise to autonomous LLM-based agents capable of complex reasoning and execution. As these agents transition from isolated operation to collaborative ecosystems, we witness the emergence of the Agent-to-Agent (A2A) network, a paradigm where heterogeneous agents autonomously coordinate to solve multi-step tasks. While these networks may offer better task performance compared to simply using one agent to complete the entire task, they introduce systemic vulnerabilities, such as adversarial composition, semantic misalignment, and cascading operational failures, that existing agent alignment techniques cannot address. In this vision paper, we argue that the trustworthiness of A2A networks cannot be fully guaranteed via retrofitting on existing protocols that are largely designed for individual agents. Rather, it must be architected from the very beginning of the A2A coordination framework. We present a comprehensive conceptual framework that situates trust in A2A systems through four design pillars.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that the trustworthiness of Agent-to-Agent (A2A) networks cannot be fully guaranteed via retrofitting on existing protocols designed for individual agents. Instead, trust must be architected from the very beginning of the A2A coordination framework. It presents a conceptual framework situating trust in A2A systems through four design pillars to address systemic vulnerabilities such as adversarial composition, semantic misalignment, and cascading operational failures.

Significance. If the central argument holds, the work could provide a useful high-level vision for shifting multi-agent AI design toward native trust mechanisms, potentially guiding future development of more robust collaborative LLM agent systems in networked settings.

major comments (2)

[Abstract] The claim that 'existing agent alignment techniques cannot address' systemic vulnerabilities (adversarial composition, semantic misalignment, cascading failures) is asserted without citing or analyzing any concrete techniques such as agent-level RLHF, safety fine-tuning, or protocol extensions in frameworks like AutoGen or LangChain. This is load-bearing for the motivation that only a from-scratch architecture suffices.
[Section introducing the four design pillars] The pillars are introduced conceptually without a gap analysis or derivation demonstrating why adaptations or modular additions to existing single-agent methods would necessarily fail to mitigate the listed risks, leaving the necessity of the new framework as an unexamined premise rather than a substantiated conclusion.

minor comments (1)

The manuscript would benefit from adding a related-work subsection to explicitly position the four pillars relative to prior multi-agent trust and safety literature, as this is a presentation issue that does not affect the core argument.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our vision paper. The feedback highlights important areas where we can strengthen the substantiation of our claims regarding the limitations of existing techniques and the necessity of the proposed framework. We address each major comment below and commit to revisions that will enhance the clarity and rigor of the manuscript while preserving its conceptual focus.

Referee: [Abstract] The claim that 'existing agent alignment techniques cannot address' systemic vulnerabilities (adversarial composition, semantic misalignment, cascading failures) is asserted without citing or analyzing any concrete techniques such as agent-level RLHF, safety fine-tuning, or protocol extensions in frameworks like AutoGen or LangChain. This is load-bearing for the motivation that only a from-scratch architecture suffices.

Authors: We recognize that the abstract asserts this claim without specific citations or analysis, which could benefit from additional support. Since the paper is a vision paper proposing a high-level framework rather than an empirical or survey study, the focus was on outlining the systemic issues inherent to A2A networks. To address this, we will revise the abstract and add a brief discussion in the introduction citing relevant works on single-agent alignment techniques, such as RLHF adaptations for agents and safety mechanisms in multi-agent frameworks like AutoGen. We will explain that while these methods improve individual agent reliability, they do not inherently handle network-level phenomena like cascading failures or adversarial composition across heterogeneous agents. This revision will better ground the motivation for a baked-in approach.
Revision: yes

Referee: [Section introducing the four design pillars] The pillars are introduced conceptually without a gap analysis or derivation demonstrating why adaptations or modular additions to existing single-agent methods would necessarily fail to mitigate the listed risks, leaving the necessity of the new framework as an unexamined premise rather than a substantiated conclusion.

Authors: The referee correctly notes the absence of an explicit gap analysis in the section introducing the four design pillars. We will revise this section to include a short gap analysis that considers how existing single-agent methods might be adapted (e.g., through modular trust layers or protocol extensions) and why such adaptations may not suffice for the identified risks. For instance, we will derive that semantic misalignment arises from inter-agent communication dynamics not present in single-agent settings, and cascading failures require coordinated trust mechanisms across the network. This will substantiate why trust must be architected from the beginning rather than retrofitted.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity in this conceptual vision paper

The paper is a vision paper that advances a conceptual argument for architecting trust into A2A networks from the beginning rather than retrofitting existing single-agent protocols. It identifies systemic vulnerabilities such as adversarial composition, semantic misalignment, and cascading failures as motivation, then introduces four design pillars. There are no fitted parameters, predictions, or derived results; the only formalism in the reproduced excerpts is the state-transition safety invariant, which is stated as a definition rather than derived. No self-citations are used to justify uniqueness theorems, ansatzes, or load-bearing premises. The central claim does not reduce by construction to its own inputs or to prior self-referential results; the argument remains self-contained as a high-level framework proposal without mathematical or definitional circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the domain assumption that retrofitting cannot suffice, plus the introduction of an unspecified four-pillar structure; no free parameters or independently evidenced entities are introduced.

axioms (1)

Domain assumption: Existing agent alignment techniques cannot address systemic vulnerabilities such as adversarial composition, semantic misalignment, and cascading operational failures in A2A networks.
Directly invoked in the abstract to justify the need for a new baked-in approach.

invented entities (1)

Four design pillars for trust in A2A systems (no independent evidence)
Purpose: to situate and architect trust within the A2A coordination framework.
Conceptual constructs introduced without details or external evidence in the provided abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "We model the agent network as a state transition system... $s_{t+1} = \delta(s_t, a_t)$ ... baked-in: $\forall s_t \in S_{\text{safe}},\ \forall a_t \in A : s_{t+1} = \delta(s_t, a_t) \in S_{\text{safe}}$"

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "four design pillars: Compositional Robustness, Semantic Containment, Accountability, Cross-Boundary Reliability"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

75 extracted references · 75 canonical work pages · 11 internal anchors

[1]

A survey on homomorphic encryption schemes: Theory and implementation.ACM Computing Surveys (Csur), 51(4):1–35, 2018

Abbas Acar, Hidayet Aksu, A Selcuk Uluagac, and Mauro Conti. A survey on homomorphic encryption schemes: Theory and implementation.ACM Computing Surveys (Csur), 51(4):1–35, 2018

work page 2018
[2]

Concrete Problems in AI Safety

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in ai safety.arXiv preprint arXiv:1606.06565, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[3]

Introducing the model context protocol

Anthropic. Introducing the model context protocol. https://www.anthropic.com/news/ model-context-protocol, November 2024

work page 2024
[4]

Model context protocol, 2024

Anthropic. Model context protocol, 2024. Technical documentation

work page 2024
[5]

Agent skills

Anthropic. Agent skills. https://platform.claude.com/docs/en/agents-and-tools/ agent-skills/overview, 2026

work page 2026
[6]

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitutional ai: Harmlessness from ai feedback.arXiv preprint arXiv:2212.08073, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[7]

Supervisor alignment framework: Enhanc- ing llm alignment with query-ignoring strategy and multi-agent interaction

Ziqun Bao, Yu Ji, Wen Wu, Xi Chen, and Liang He. Supervisor alignment framework: Enhanc- ing llm alignment with query-ignoring strategy and multi-agent interaction. InICASSP 2025- 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025

work page 2025
[8]

Ai agents need memory control over more context.arXiv preprint arXiv:2601.11653, 2026

Fouad Bousetouane. Ai agents need memory control over more context.arXiv preprint arXiv:2601.11653, 2026

work page arXiv 2026
[9]

(leveled) fully homomorphic encryption without bootstrapping.ACM Transactions on Computation Theory (TOCT), 6(3):1– 36, 2014

Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. (leveled) fully homomorphic encryption without bootstrapping.ACM Transactions on Computation Theory (TOCT), 6(3):1– 36, 2014

work page 2014
[10]

Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

work page 1901
[11]

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang, Lakshya A Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, et al. Why do multi- agent llm systems fail?arXiv preprint arXiv:2503.13657, 2025. 16

work page internal anchor Pith review Pith/arXiv arXiv 2025
[12]

Homomorphic encryption for arithmetic of approximate numbers

Jung Hee Cheon, Andrey Kim, Miran Kim, and Yongsoo Song. Homomorphic encryption for arithmetic of approximate numbers. InInternational conference on the theory and application of cryptology and information security, pages 409–437. Springer, 2017

work page 2017
[13]

Deep reinforcement learning from human preferences.Advances in neural information processing systems, 30, 2017

Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences.Advances in neural information processing systems, 30, 2017

work page 2017
[14]

Intel sgx explained.Cryptology ePrint Archive, 2016

Victor Costan and Srinivas Devadas. Intel sgx explained.Cryptology ePrint Archive, 2016

work page 2016
[15]

A lattice model of secure information flow.Communications of the ACM, 19(5):236–243, 1976

Dorothy E Denning. A lattice model of secure information flow.Communications of the ACM, 19(5):236–243, 1976

work page 1976
[16]

A survey on in-context learning

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, et al. A survey on in-context learning. InProceedings of the 2024 conference on empirical methods in natural language processing, pages 1107–1128, 2024

work page 2024
[17]

Building guardrails for large language models,

Yi Dong, Ronghui Mu, Gaojie Jin, Yi Qi, Jinwei Hu, Xingyu Zhao, Jie Meng, Wenjie Ruan, and Xiaowei Huang. Building guardrails for large language models.arXiv preprint arXiv:2402.01822, 2024

work page arXiv 2024
[18]

Secure multi-party computation problems and their applications: a review and open problems

Wenliang Du and Mikhail J Atallah. Secure multi-party computation problems and their applications: a review and open problems. InProceedings of the 2001 workshop on New security paradigms, pages 13–22, 2001

work page 2001
[19]

Improv- ing factuality and reasoning in language models through multiagent debate

Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. Improv- ing factuality and reasoning in language models through multiagent debate. InForty-first international conference on machine learning, 2024

work page 2024
[20]

Blockchain for the metaverse: A review.arXiv preprint arXiv:2203.09738, 2022

Thippa Reddy Gadekallu, Thien Huynh-The, Weizheng Wang, Gokul Yenduri, Pasika Ranaweera, Quoc-Viet Pham, Daniel Benevides da Costa, and Madhusanka Liyanage. Blockchain for the metaverse: A review.arXiv preprint arXiv:2203.09738, 2022

work page arXiv 2022
[21]

A comprehensive survey of machine unlearning techniques for large language models.arXiv preprint arXiv:2503.01854, 2025

Jiahui Geng, Qing Li, Herbert Woisetschlaeger, Zongxiong Chen, Fengyu Cai, Yuxia Wang, Preslav Nakov, Hans-Arno Jacobsen, and Fakhri Karray. A comprehensive survey of machine unlearning techniques for large language models.arXiv preprint arXiv:2503.01854, 2025

work page arXiv 2025
[22]

Secure multi-party computation, 1998

Oded Goldreich. Secure multi-party computation, 1998. Manuscript

work page 1998
[23]

Definitions and properties of zero-knowledge proof systems

Oded Goldreich and Yair Oren. Definitions and properties of zero-knowledge proof systems. Journal of Cryptology, 7(1):1–32, 1994

work page 1994
[24]

Alignment faking in large language models

Ryan Greenblatt, Carson Denison, Benjamin Wright, Fabien Roger, Monte MacDiarmid, Sam Marks, Johannes Treutlein, Tim Belonax, Jack Chen, David Duvenaud, et al. Alignment faking in large language models.arXiv preprint arXiv:2412.14093, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[25]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023

work page 2023
[26]

Building a secure agentic ai application leveraging a2a protocol.arXiv preprint arXiv:2504.16902, 2025

Idan Habler, Ken Huang, Vineeth Sai Narajala, and Prashant Kulkarni. Building a secure agentic ai application leveraging a2a protocol.arXiv preprint arXiv:2504.16902, 2025

work page arXiv 2025
[27]

An Overview of Catastrophic AI Risks

Dan Hendrycks, Mantas Mazeika, and Thomas Woodside. An overview of catastrophic ai risks. arXiv preprint arXiv:2306.12001, 2023

work page internal anchor Pith review arXiv 2023
[28]

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M Ziegler, Tim Maxwell, Newton Cheng, et al. Sleeper agents: Training deceptive llms that persist through safety training.arXiv preprint arXiv:2401.05566, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[29]

AI safety via debate

Geoffrey Irving, Paul Christiano, and Dario Amodei. Ai safety via debate.arXiv preprint arXiv:1805.00899, 2018. 17

work page internal anchor Pith review Pith/arXiv arXiv 2018
[30]

V oting or consensus? decision-making in multi-agent debate

Lars Benedikt Kaesberg, Jonas Becker, Jan Philip Wahle, Terry Ruas, and Bela Gipp. V oting or consensus? decision-making in multi-agent debate. InFindings of the Association for Computational Linguistics: ACL 2025, pages 11640–11671, 2025

work page 2025
[31]

Quantifying misalign- ment between agents: Towards a sociotechnical understanding of alignment

Aidan Kierans, Avijit Ghosh, Hananel Hazan, and Shiri Dori-Hacohen. Quantifying misalign- ment between agents: Towards a sociotechnical understanding of alignment. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27365–27373, 2025

work page 2025
[32]

Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020

work page 2020
[33]

Agent-oriented planning in multi-agent systems.arXiv preprint arXiv:2410.02189, 2024

Ao Li, Yuexiang Xie, Songze Li, Fugee Tsung, Bolin Ding, and Yaliang Li. Agent-oriented planning in multi-agent systems.arXiv preprint arXiv:2410.02189, 2024

work page arXiv 2024
[34]

Camel: Communicative agents for" mind" exploration of large language model society.Advances in neural information processing systems, 36:51991–52008, 2023

Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for" mind" exploration of large language model society.Advances in neural information processing systems, 36:51991–52008, 2023

work page 2023
[35]

Alignment at pre-training! towards native alignment for arabic llms.Advances in Neural Information Processing Systems, 37:13872– 13896, 2024

Juhao Liang, Zhenyang Cai, Jianqing Zhu, Huang Huang, Kewei Zong, Bang An, Mosen Alharthi, Juncai He, Lian Zhang, Haizhou Li, et al. Alignment at pre-training! towards native alignment for arabic llms.Advances in Neural Information Processing Systems, 37:13872– 13896, 2024

work page 2024
[36]

Axis: Efficient human-agent-computer interaction with api-first llm-based agents

Junting Lu, Zhiyang Zhang, Fangkai Yang, Jue Zhang, Lu Wang, Chao Du, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, and Qi Zhang. Axis: Efficient human-agent-computer interaction with api-first llm-based agents. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7711–7743, 2025

work page 2025
[37]

Scalemcp: Dynamic and auto-synchronizing model context protocol tools for llm agents

Elias Lumer, Anmol Gulati, Vamse Kumar Subbiah, Pradeep Honaganahalli Basavaraju, and James A Burke. Scalemcp: Dynamic and auto-synchronizing model context protocol tools for llm agents. InInternational Joint Conference on Computational Intelligence, pages 23–42. Springer, 2025

work page 2025
[38]

Towards Deep Learning Models Resistant to Adversarial Attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks.arXiv preprint arXiv:1706.06083, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[39]

Audit trails for accountability in large language models.arXiv preprint arXiv:2601.20727, 2026

Victor Ojewale, Harini Suresh, and Suresh Venkatasubramanian. Audit trails for accountability in large language models.arXiv preprint arXiv:2601.20727, 2026

work page arXiv 2026
[40]

Moltbook: A social platform for autonomous ai agents

OpenClaw Community. Moltbook: A social platform for autonomous ai agents. https: //www.moltbook.com, 2026. Accessed: 2026-03-05

work page 2026
[41]

Clawhub: The openclaw skill registry

OpenClaw Developers. Clawhub: The openclaw skill registry. https://clawhub.ai, 2026. Accessed: 2026-03-05

work page 2026
[42]

Openclaw — personal ai assistant

OpenClaw Team. Openclaw — personal ai assistant. https://github.com/openclaw/ openclaw, 2026

work page 2026
[43]

Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

work page 2022
[44]

Generative agents: Interactive simulacra of human behavior

Joon Sung Park, Joseph C O’Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

work page 2023
[45]

Discovering language model behaviors with model-written evaluations

Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, et al. Discovering language model behaviors with model-written evaluations. InFindings of the association for computational linguistics: ACL 2023, pages 13387–13434, 2023. 18

work page 2023
[46]

Decentralized identifiers (dids) v1

Drummond Reed, Manu Sporny, Dave Longley, Christopher Allen, Ryan Grant, Markus Sabadello, and Jonathan Holt. Decentralized identifiers (dids) v1. 0.Draft Community Group Report, 2020

work page 2020
[47]

Self-reflection in llm agents: Effects on problem-solving performance.arXiv preprint arXiv:2405.06682, 2024

Matthew Renze and Erhan Guven. Self-reflection in llm agents: Effects on problem-solving performance.arXiv preprint arXiv:2405.06682, 2024

work page arXiv 2024
[48]

Trusted execution environment: What it is, and what it is not

Mohamed Sabt, Mohammed Achemlal, and Abdelmadjid Bouabdallah. Trusted execution environment: What it is, and what it is not. In2015 IEEE Trustcom/BigDataSE/Ispa, volume 1, pages 57–64. IEEE, 2015

work page 2015
[49]

Role-based access control

Ravi S Sandhu. Role-based access control. InAdvances in computers, volume 46, pages 237–286. Elsevier, 1998

work page 1998
[50]

Smart contract: Attacks and protections

Sarwar Sayeed, Hector Marco-Gisbert, and Tom Caira. Smart contract: Attacks and protections. Ieee Access, 8:24416–24427, 2020

work page 2020
[51]

Toolformer: Language models can teach themselves to use tools.Advances in neural information processing systems, 36:68539– 68551, 2023

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools.Advances in neural information processing systems, 36:68539– 68551, 2023

work page 2023
[52]

Verifiable semantics for agent-to-agent communication.arXiv preprint arXiv:2602.16424, 2026

Philipp Schoenegger, Matt Carlson, Chris Schneider, and Chris Daly. Verifiable semantics for agent-to-agent communication.arXiv preprint arXiv:2602.16424, 2026

work page arXiv 2026
[53]

Can you trust llm judgments? reliability of llm-as- a-judge.arXiv preprint arXiv:2412.12509, 2024

Kayla Schroeder and Zach Wood-Doughty. Can you trust llm judgments? reliability of llm-as- a-judge.arXiv preprint arXiv:2412.12509, 2024

work page arXiv 2024
[54]

Character-llm: A trainable agent for role- playing

Yunfan Shao, Linyang Li, Junqi Dai, and Xipeng Qiu. Character-llm: A trainable agent for role- playing. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13153–13187, 2023

work page 2023
[55]

Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face.Advances in Neural Information Processing Systems, 36:38154–38180, 2023

Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face.Advances in Neural Information Processing Systems, 36:38154–38180, 2023

work page 2023
[56]

Outside the closed world: On using machine learning for network intrusion detection

Robin Sommer and Vern Paxson. Outside the closed world: On using machine learning for network intrusion detection. In2010 IEEE symposium on security and privacy, pages 305–316. IEEE, 2010

work page 2010
[57]

Authenticated delegation and authorized ai agents,

Tobin South, Samuele Marro, Thomas Hardjono, Robert Mahari, Cedric Deslandes Whitney, Dazza Greenwood, Alan Chan, and Alex Pentland. Authenticated delegation and authorized ai agents.arXiv preprint arXiv:2501.09674, 2025

work page arXiv 2025
[58]

Cryptography with artificial intelligence: An overview

Öznur Suçeken and Osman Özkaraca. Cryptography with artificial intelligence: An overview. In The International Conference on Artificial Intelligence and Applied Mathematics in Engineering, pages 162–172. Springer, 2024

work page 2024
[59]

A survey on zero-knowledge proof in blockchain.IEEE network, 35(4):198–205, 2021

Xiaoqiang Sun, F Richard Yu, Peng Zhang, Zhiwei Sun, Weixin Xie, and Xiang Peng. A survey on zero-knowledge proof in blockchain.IEEE network, 35(4):198–205, 2021

work page 2021
[60]

Text-centric alignment for multi-modality learning.arXiv preprint arXiv:2402.08086, 2024

Yun-Da Tsai, Ting-Yu Yen, Pei-Fu Guo, Zhe-Yan Li, and Shou-De Lin. Text-centric alignment for multi-modality learning.arXiv preprint arXiv:2402.08086, 2024

work page arXiv 2024
[61]

Robert Wahbe, Steven Lucco, Thomas E Anderson, and Susan L Graham. Efficient software-based fault isolation. In Proceedings of the fourteenth ACM symposium on Operating systems principles, pages 203–216, 1993
[62]

Luping Wang, Sheng Chen, Linnan Jiang, Shu Pan, Runze Cai, Sen Yang, and Fei Yang. Parameter-efficient fine-tuning in large language models: a survey of methodologies. Artificial Intelligence Review, 58(8):227, 2025
[63]

Yuanfu Wang, Pengyu Wang, Chenyang Xi, Bo Tang, Junyi Zhu, Wenqiang Wei, Chen Chen, Chao Yang, Jingfeng Zhang, Chaochao Lu, et al. Adversarial preference learning for robust llm alignment. In Findings of the Association for Computational Linguistics: ACL 2025, pages 21865–21881, 2025
[64]

Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382, 2023
[65]

Michael Wooldridge and Nicholas R Jennings. Intelligent agents: Theory and practice. The knowledge engineering review, 10(2):115–152, 1995
[66]

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversations. In First conference on language modeling, 2024
[67]

Renjun Xu and Jingwen Peng. A comprehensive survey of deep research: Systems, methodologies, and applications. arXiv preprint arXiv:2506.12594, 2025
[68]

Hui Yang, Sifu Yue, and Yunzhong He. Auto-gpt for online decision making: Benchmarks and additional opinions. arXiv preprint arXiv:2306.02224, 2023
[69]

John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. Swe-agent: Agent-computer interfaces enable automated software engineering. Advances in Neural Information Processing Systems, 37:50528–50652, 2024
[70]

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Gao, et al. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023
[71]

Guibin Zhang, Hejia Geng, Xiaohang Yu, Zhenfei Yin, Zaibin Zhang, Zelin Tan, Heng Zhou, Zhongzhi Li, Xiangyuan Xue, Yijiang Li, et al. The landscape of agentic reinforcement learning for llms: A survey. arXiv preprint arXiv:2509.02547, 2025
[72]

Zeyu Zhang, Quanyu Dai, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. A survey on the memory mechanism of large language model-based agents. ACM Transactions on Information Systems, 43(6):1–47, 2025
[73]

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in neural information processing systems, 36:46595–46623, 2023
[74]

Houquan Zhou, Bo Zhang, Zhenghua Li, Ming Yan, and Min Zhang. A training-free llm-based approach to general chinese character error correction. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13827–13852, 2025
[75]

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023
