arxiv: 2605.17169 · v1 · pith:TNEOYDE4new · submitted 2026-05-16 · 💻 cs.AI · cs.CL · cs.MA

Responsible Agentic AI Requires Explicit Provenance

Pith reviewed 2026-05-20 14:03 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.MA
keywords agentic AI · provenance · responsibility attribution · causal attribution · AI lifecycle · AI safety · multi-agent systems · accountability

The pith

Explicit provenance across the full agentic lifecycle is the necessary condition for making responsibility in AI computable and actionable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that agentic AI is spreading into domains like software engineering while public trust lags because responsibility stays subjective when harms arise from compositions no single party designed. Current frameworks produce no quantifiable, traceable, or interventionable records that would let stakeholders assign accountability. The authors argue that embedding explicit provenance throughout the lifecycle supplies the missing structural basis, allowing responsibility to shift from discussion to computation via formal mechanisms and practical implementation across layers.

Core claim

Explicit provenance is not optional but the necessary condition for responsible agentic AI, as only it supplies the quantifiable, traceable, and interventionable data needed to assign responsibility when harm emerges from agent compositions no single party designed.

What carries the argument

Explicit provenance, encoded through a causal attribution function and responsibility tensor and maintained across four lifecycle layers to support online estimation and intervention.
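
To make the idea of a responsibility tensor concrete, here is a minimal sketch in Python. It is not the paper's formalization, which this page does not reproduce. The indexing by (stakeholder, layer, harm type), the layer labels L1 to L4, the stakeholder names, and the function `build_responsibility_tensor` are all assumptions made for illustration, and the attribution scores are treated as given inputs rather than computed.

```python
from collections import defaultdict

# Placeholder layer labels (assumption): the page only states that there are
# four lifecycle layers and mentions "L2" by name.
LAYERS = ("L1", "L2", "L3", "L4")


def build_responsibility_tensor(records):
    """Aggregate provenance records into normalized responsibility shares.

    Each record is a (stakeholder, layer, harm_type, attribution) tuple.
    The attribution value is a non-negative score assumed to come from some
    causal attribution function defined elsewhere.
    Returns a sparse tensor as a dict keyed by (stakeholder, layer, harm_type).
    """
    totals = defaultdict(float)
    for stakeholder, layer, harm_type, attribution in records:
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        if attribution < 0:
            raise ValueError("attribution scores must be non-negative")
        totals[(stakeholder, layer, harm_type)] += attribution

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no attributable harm recorded
    # Normalize so that all shares sum to 1.
    return {key: value / grand_total for key, value in totals.items()}


# Hypothetical provenance records for a single incident.
records = [
    ("model_provider", "L1", "data_loss", 0.2),
    ("skill_author", "L2", "data_loss", 0.5),
    ("deployer", "L3", "data_loss", 0.2),
    ("skill_author", "L3", "data_loss", 0.1),
]
tensor = build_responsibility_tensor(records)
for key, share in sorted(tensor.items()):
    print(key, round(share, 3))
```

A sparse dictionary is used here only because most (stakeholder, layer, harm type) cells would be empty in a small example; a dense array would serve equally well if the index sets were fixed in advance.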

If this is right

  • Responsibility gaps across sociotechnical dimensions become identifiable once provenance records are available.
  • Preliminary experiments suggest that provenance can be estimated and intervened on online, before irreversible harm accumulates.
  • A concrete agentic incident can be analyzed to determine which parties bear responsibility.
  • No stakeholder in the agentic AI ecosystem can treat explicit provenance as discretionary.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Mandates for provenance logging could emerge in future AI regulations to enforce accountability.
  • Performance or privacy costs of maintaining detailed provenance would need separate measurement in deployed systems.
  • The same traceability approach might apply to harms in non-agentic AI that involve chained decisions.
  • Automated tools for real-time responsibility scoring could be built on top of the proposed tensor.

Load-bearing premise

That current agentic systems generate no usable provenance today and that adding explicit provenance will directly render responsibility computable without extra mechanisms or major trade-offs.

What would settle it

A concrete multi-agent incident in which full explicit provenance is recorded yet responsibility for resulting harm still cannot be assigned to any stakeholder or intervened upon before damage occurs.

Figures

Figures reproduced from arXiv: 2605.17169 by Jinwei Hu, Qisong He, Xiaowei Huang, Xinmiao Huang, Yi Dong, Youcheng Sun.

Figure 1: Per-component trustworthy AI audits components in isolation (…
Figure 2: Online causal signal is estimable from agent execution prefixes, supporting required properties for responsible agent. AUPRC measures how well a monitor identifies failing trajectories before harm materializes. NeSy monitors substantially outperform random and zero-shot LLM baselines. We report preliminary experiments targeting the L2 with detailed implementation details in Appendix A, where online prove…
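
For readers unfamiliar with the metric in the Figure 2 caption, the following is a small sketch of how AUPRC can be estimated as average precision over ranked monitor scores. It is a generic illustration, not the paper's evaluation code; the function name `average_precision`, the labels, and the scores are made up for the example, and tied scores are not given special treatment.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for a failing trajectory, 0 for a non-failing one.
    scores: monitor risk scores, where higher means "more likely to fail".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one failing trajectory")

    # Rank trajectories from highest to lowest risk score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at the rank where this failing trajectory is recovered.
            precision_sum += hits / rank
    return precision_sum / positives


# Hypothetical monitor output on four execution prefixes.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)
print(round(ap, 4))
```

A monitor that ranks every failing trajectory above every non-failing one scores 1.0 under this measure, while a random monitor scores roughly the fraction of failing trajectories in the data, which is why the caption compares against a random baseline.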
Abstract

Agentic AI is rapidly proliferating across diverse real-world domains such as software engineering, yet public trust has not kept pace. The central reason is that responsibility, despite being widely discussed, remains a subjective and unenforced concept, as no current agentic framework produces the quantifiable, traceable, and interventionable provenance needed to assign it when harm emerges from compositions no single party designed. We position that what is missing is not better benchmark-level evaluation but $\textbf{explicit provenance}$ across the full agentic lifecycle, which is the only viable basis for making responsibility computable and actionable. We advance this agenda along four axes: establishing $\textit{why}$ such provenance is a structural necessity by identifying responsibility gaps across sociotechnical dimensions, formalizing $\textit{what}$ it must encode through a causal attribution function and responsibility tensor, discussing $\textit{how}$ it can be made computable across four lifecycle layers with preliminary experiments showing that provenance is estimable and interveneable online before irreversible harm accumulates, and examining $\textit{who}$ bears responsibility through a concrete agentic incident. Explicit provenance is not a discretionary refinement but the necessary condition for responsible agentic AI, and no stakeholder across its ecosystem can afford to treat it as optional.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that current agentic AI frameworks fail to produce quantifiable, traceable, and interventionable provenance, leaving responsibility subjective and unenforced when harm emerges from compositions with no single designer. It positions explicit provenance across the full lifecycle as the necessary condition for making responsibility computable, formalized via a causal attribution function and responsibility tensor. The work advances this via four axes: sociotechnical responsibility gaps, formal encoding, computability across four lifecycle layers (with preliminary experiments on online estimability and interveneability), and illustration through a concrete agentic incident.

Significance. If the proposed provenance structures can be shown to suffice without unstated mechanisms or prohibitive trade-offs, the framework could provide a concrete basis for accountability in multi-agent systems, addressing a core barrier to trust in deployed agentic AI. The four-axis structure offers a useful organizing agenda, and the preliminary experiments on estimability provide an initial empirical foothold.

major comments (3)
  1. [Abstract] Abstract and central positioning paragraph: the claim that explicit provenance is 'the only viable basis for making responsibility computable' is load-bearing yet rests on the unverified premise that the causal attribution function and responsibility tensor suffice for emergent interactions in compositions with no single designer; the cited preliminary experiments demonstrate estimability and interveneability but do not test generalization or elimination of supplementary causal assumptions.
  2. [Formalization section] Formalization of the responsibility tensor and causal attribution function (in the 'what' axis section): these constructs risk circularity because they appear defined primarily with reference to the desired responsibility outcomes rather than independent external benchmarks or falsifiable criteria, which weakens the assertion that they render responsibility computable.
  3. [Lifecycle layers section] Discussion of computability across four lifecycle layers (in the 'how' axis section): while preliminary experiments are reported as showing online estimability and interveneability before irreversible harm, the manuscript does not examine scalability limits, performance/privacy trade-offs, or additional mechanisms needed when harm arises from agent interactions, leaving the necessity claim under-supported.
minor comments (2)
  1. [Notation and definitions] The introduction of novel terms such as 'responsibility tensor' would be clarified by an explicit comparison to related concepts in causal inference and data provenance literature.
  2. [Incident analysis] Ensure the concrete agentic incident example includes sufficient detail on the four lifecycle layers to allow readers to trace the provenance encoding.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us identify areas for improvement in the manuscript. We address each of the major comments point by point below, providing clarifications and indicating planned revisions where necessary.

Point-by-point responses
  1. Referee: [Abstract] Abstract and central positioning paragraph: the claim that explicit provenance is 'the only viable basis for making responsibility computable' is load-bearing yet rests on the unverified premise that the causal attribution function and responsibility tensor suffice for emergent interactions in compositions with no single designer; the cited preliminary experiments demonstrate estimability and interveneability but do not test generalization or elimination of supplementary causal assumptions.

    Authors: The manuscript argues that explicit provenance is necessary due to the structural responsibility gaps in existing agentic AI systems, as detailed in the 'why' axis. The causal attribution function draws from established causal inference methods, and the responsibility tensor provides a formal structure for aggregation. While we recognize that the preliminary experiments are limited and do not fully test generalization across all emergent interactions, the claim is positioned as a necessary condition rather than a complete sufficiency proof. We will revise the abstract to clarify this distinction and add a new subsection on assumptions and limitations to better support the positioning. revision: partial

  2. Referee: [Formalization section] Formalization of the responsibility tensor and causal attribution function (in the 'what' axis section): these constructs risk circularity because they appear defined primarily with reference to the desired responsibility outcomes rather than independent external benchmarks or falsifiable criteria, which weakens the assertion that they render responsibility computable.

    Authors: We maintain that the formalization avoids circularity. The causal attribution function is specified using interventionist causal models (e.g., via do-operators on agent interaction graphs), which are defined independently of responsibility outcomes. The responsibility tensor then operationalizes these attributions into a computable form. To prevent any misinterpretation of circularity, we will include additional explanations linking to falsifiable criteria from causal discovery literature and external benchmarks in the revised formalization section. revision: yes

  3. Referee: [Lifecycle layers section] Discussion of computability across four lifecycle layers (in the 'how' axis section): while preliminary experiments are reported as showing online estimability and interveneability before irreversible harm, the manuscript does not examine scalability limits, performance/privacy trade-offs, or additional mechanisms needed when harm arises from agent interactions, leaving the necessity claim under-supported.

    Authors: We agree with this observation. The current experiments focus on demonstrating basic online estimability and interveneability in controlled settings. The manuscript does not delve into scalability or specific trade-offs, which are indeed important for real-world applicability, especially in multi-agent scenarios. We will revise the 'how' axis to include an expanded discussion of these aspects, potential performance and privacy implications, and proposed mechanisms for handling emergent interactions, along with directions for future empirical work. revision: yes
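
The second response above appeals to interventionist causal models with do-operators on agent interaction graphs. A toy sketch of that idea in Python follows. It is an illustration under stated assumptions, not the authors' causal attribution function: the four agents, their behaviors, the baseline values, and the names `simulate` and `attribution` are all hypothetical. Each agent is scored by how much harm disappears when its output is forced to a safe baseline.

```python
def simulate(interventions=None):
    """Evaluate a tiny agent interaction graph and return the harm (0.0 or 1.0).

    interventions maps an agent name to a forced output, which plays the role
    of a do-operation: the agent's normal behavior is cut off and replaced.
    """
    do = interventions or {}
    values = {}

    # planner: turns the user task into a plan.
    values["planner"] = do.get("planner", "delete temp files")

    # executor: turns the plan into a shell command (unsafe on "delete").
    default_command = "rm -rf /" if "delete" in values["planner"] else "ls"
    values["executor"] = do.get("executor", default_command)

    # summarizer: reports what happened; it has no path to the harm.
    values["summarizer"] = do.get("summarizer", "ran: " + values["executor"])

    # guard: should block dangerous commands, but this one approves everything.
    values["guard"] = do.get("guard", True)

    harmful = values["guard"] and values["executor"] == "rm -rf /"
    return 1.0 if harmful else 0.0


def attribution(agent, baseline):
    """Harm under normal execution minus harm under do(agent := baseline)."""
    return simulate() - simulate({agent: baseline})


# Safe baseline outputs for each agent (assumed for the example).
baselines = {
    "planner": "list files",
    "executor": "ls",
    "summarizer": "",
    "guard": False,
}
scores = {agent: attribution(agent, value) for agent, value in baselines.items()}
print(scores)
```

In this toy graph the planner, executor, and guard each receive a score of 1.0 because intervening on any one of them prevents the harm, while the summarizer receives 0.0. That several agents can each be a full but-for cause is one reason a separate aggregation step is needed before shares of responsibility can be assigned.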

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper identifies gaps in current agentic frameworks (no quantifiable/traceable/interventionable provenance for harm from multi-party compositions) and positions explicit provenance as the necessary condition for making responsibility computable. It advances this via sociotechnical analysis, formalization through a causal attribution function and responsibility tensor, lifecycle-layer discussion, and a concrete incident example. No equations, self-citations, or definitions are present in the provided text that reduce the central claim to its inputs by construction (e.g., no fitted parameter renamed as prediction, no self-definitional loop where the tensor is defined solely in terms of computability). The formal constructs are introduced as an independent proposal rather than a renaming or self-referential fit. The argument remains self-contained against external benchmarks of responsibility gaps and does not rely on load-bearing self-citation chains or ansatzes smuggled from prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The paper rests on the domain assumption that no current framework produces the required provenance and introduces new constructs without independent evidence or derivations from external benchmarks.

axioms (1)
  • Domain assumption: Current agentic frameworks produce no quantifiable, traceable, and interventionable provenance for responsibility assignment.
    Stated as the central reason public trust has not kept pace with proliferation.
invented entities (2)
  • responsibility tensor (no independent evidence)
    purpose: To formalize what provenance must encode for computable responsibility.
    Introduced in the "what" axis as part of the formalization.
  • causal attribution function (no independent evidence)
    purpose: To enable attribution across agent compositions for responsibility.
    Defined alongside the responsibility tensor in the formalization step.

pith-pipeline@v0.9.0 · 5759 in / 1304 out tokens · 45270 ms · 2026-05-20T14:03:28.130644+00:00 · methodology

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages · 8 internal anchors

  1. [1]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774, 2023

  2. [2]

    Frontier ai regulation: Managing emerging risks to public safety, 2023

    Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O’Keefe, Jess Whittle- stone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, and Kevin Wolf. ...

  3. [3]

    Agentharm: A benchmark for measuring harmfulness of llm agents

    Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, J Zico Kolter, Matt Fredrikson, et al. Agentharm: A benchmark for measuring harmfulness of llm agents. InThe Thirteenth International Conference on Learning Representations

  4. [4]

    Conformal risk control

    Anastasios Nikolas Angelopoulos, Stephen Bates, Adam Fisch, Lihua Lei, and Tal Schuster. Conformal risk control. InThe Twelfth International Conference on Learning Representations, 2024

  5. [5]

    Tool use with claude, 2026

    Anthropic. Tool use with claude, 2026. Claude API documentation

  6. [6]

    The conclusion of contracts by software agents in the eyes of the law

    Tina Balke and Torsten Eymann. The conclusion of contracts by software agents in the eyes of the law. InProceedings of the 7th international joint conference on Autonomous agents and multiagent systems-Volume 2, pages 771–778, 2008

  7. [7]

    $\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment

    Victor Barres, Honghua Dong, Soham Ray, Xujie Si, and Karthik Narasimhan. τ 2-Bench: Evaluating conversational agents in a dual-control environment.arXiv:2506.07982, 2025

  8. [8]

    Picking on the same person: Does algorithmic monoculture lead to outcome homogenization?Advances in neural information processing systems, 35:3663–3678, 2022

    Rishi Bommasani, Kathleen A Creel, Ananya Kumar, Dan Jurafsky, and Percy S Liang. Picking on the same person: Does algorithmic monoculture lead to outcome homogenization?Advances in neural information processing systems, 35:3663–3678, 2022

  9. [9]

    Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

  10. [10]

    Harms from increasingly agentic algorithmic systems

    Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, et al. Harms from increasingly agentic algorithmic systems. InProceedings of the 2023 ACM conference on fairness, accountability, and transparency, pages 651–666, 2023

  11. [11]

    A survey on trust modeling.ACM Computing Surveys (CSUR), 48(2):1–40, 2015

    Jin-Hee Cho, Kevin Chan, and Sibel Adali. A survey on trust modeling.ACM Computing Surveys (CSUR), 48(2):1–40, 2015

  12. [12]

    Understanding accountability in algorithmic supply chains

    Jennifer Cobbe, Michael Veale, and Jatinder Singh. Understanding accountability in algorithmic supply chains. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 1186–1197, 2023

  13. [13]

    Proposal for a directive of the european parliament and of the council on adapting noncontractual civil liability rules to artificial intelligence (ai liability directive)

    EU Commission et al. Proposal for a directive of the european parliament and of the council on adapting noncontractual civil liability rules to artificial intelligence (ai liability directive). European Commission, 2022

  14. [14]

    Ai governance: a research agenda.Governance of AI Program, Future of Humanity Institute, University of Oxford: Oxford, UK, 1442:1443, 2018

    Allan Dafoe. Ai governance: a research agenda.Governance of AI Program, Future of Humanity Institute, University of Oxford: Oxford, UK, 1442:1443, 2018

  15. [15]

    Business and it leaders report ai agents are scaling faster than their guardrails, 2026

    Deloitte. Business and it leaders report ai agents are scaling faster than their guardrails, 2026

  16. [16]

    Safeguarding large language models: A survey.Artificial intelligence review, 58(12):382, 2025

    Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, Changshun Wu, Gaojie Jin, Yi Qi, Jinwei Hu, Jie Meng, et al. Safeguarding large language models: A survey.Artificial intelligence review, 58(12):382, 2025. 10

  17. [17]

    Accountability of ai under the law: The role of explanation,

    Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O’Brien, Kate Scott, Stuart Schieber, James Waldo, David Weinberger, et al. Accountability of ai under the law: The role of explanation.arXiv preprint arXiv:1711.01134, 2017

  18. [18]

    Genai against humanity: Nefarious applications of generative artificial intel- ligence and large language models.Journal of Computational Social Science, 7(1):549–569, 2024

    Emilio Ferrara. Genai against humanity: Nefarious applications of generative artificial intel- ligence and large language models.Journal of Computational Social Science, 7(1):549–569, 2024

  19. [19]

    arXiv preprint arXiv:2404.16244 (2024).https://doi.org/10.48550/arXiv.2404.16244

    Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, et al. The ethics of advanced ai assistants.arXiv preprint arXiv:2404.16244, 2024

  20. [20]

    Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

    Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, et al. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned.arXiv preprint arXiv:2209.07858, 2022

  21. [21]

    Neurosymbolic ai: The 3 rd wave.Artificial Intelligence Review, 56(11):12387–12406, 2023

    Artur d’Avila Garcez and Luis C Lamb. Neurosymbolic ai: The 3 rd wave.Artificial Intelligence Review, 56(11):12387–12406, 2023

  22. [22]

    Causal abstractions of neural networks.Advances in neural information processing systems, 34:9574–9586, 2021

    Atticus Geiger, Hanson Lu, Thomas Icard, and Christopher Potts. Causal abstractions of neural networks.Advances in neural information processing systems, 34:9574–9586, 2021

  23. [23]

    Skillprobe: Security auditing for emerging agent skill marketplaces via multi-agent collaboration

    Zihan Guo, Zhiyu Chen, Xiaohang Nie, Jianghao Lin, Yuanjian Zhou, and Weinan Zhang. Skillprobe: Security auditing for emerging agent skill marketplaces via multi-agent collaboration. arXiv preprint arXiv:2603.21019, 2026

  24. [24]

    Artificial intelligence in health care: accountability and safety.Bulletin of the World Health Organization, 98(4):251, 2020

    Ibrahim Habli, Tom Lawton, and Zoe Porter. Artificial intelligence in health care: accountability and safety.Bulletin of the World Health Organization, 98(4):251, 2020

  25. [25]

    Metagpt: Meta programming for a multi-agent collaborative framework

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. Metagpt: Meta programming for a multi-agent collaborative framework. InThe twelfth international conference on learning representations, 2023

  26. [26]

    Ramchurn, and Xiaowei Huang

    Jinwei Hu, Yi Dong, Shuang Ao, Zhuoyun Li, Boxuan Wang, Lokesh Singh, Guangliang Cheng, Sarvapali D. Ramchurn, and Xiaowei Huang. Stop reducing responsibility in llm-powered multi-agent systems to local alignment, 2025

  27. [27]

    Enhancing robustness of llm- driven multi-agent systems through randomized smoothing.Chinese Journal of Aeronautics, page 103779, 2025

    Jinwei HU, Yi DONG, Zhengtao DING, and Xiaowei HUANG. Enhancing robustness of llm- driven multi-agent systems through randomized smoothing.Chinese Journal of Aeronautics, page 103779, 2025

  28. [28]

    Tapas are free! training-free adaptation of programmatic agents via llm-guided program synthesis in dynamic environments

    Jinwei Hu, Yi Dong, Youcheng Sun, and Xiaowei Huang. Tapas are free! training-free adaptation of programmatic agents via llm-guided program synthesis in dynamic environments. Proceedings of the AAAI Conference on Artificial Intelligence, 40(35):29477–29485, Mar. 2026

  29. [29]

    Lying with truths: Open-channel multi-agent collusion for belief manipulation via generative montage, 2026

    Jinwei Hu, Xinmiao Huang, Youcheng Sun, Yi Dong, and Xiaowei Huang. Lying with truths: Open-channel multi-agent collusion for belief manipulation via generative montage, 2026

  30. [30]

    Hurst and Nicole D

    Kristin F. Hurst and Nicole D. Sintov. Trusting autonomous vehicles as moral agents improves related policy support.Frontiers in Psychology, V olume 13 - 2022, 2022

  31. [31]

    SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

    Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang, Qin Wang, and Guangsheng Yu. Sok: Agentic skills–beyond tool use in llm agents.arXiv preprint arXiv:2602.20867, 2026

  32. [32]

    Os-harm: A benchmark for measuring safety of computer use agents

    Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, J Zico Kolter, Nicolas Flammarion, and Maksym Andriushchenko. Os-harm: A benchmark for measuring safety of computer use agents. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track

  33. [33]

    Large language models portray socially subordinate groups as more homogeneous, consistent with a bias observed in humans

    Messi HJ Lee, Jacob M Montgomery, and Calvin K Lai. Large language models portray socially subordinate groups as more homogeneous, consistent with a bias observed in humans. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 1321–1340, 2024. 11

  34. [34]

    Trustworthy ai: From principles to practices.ACM Computing Surveys, 55(9):1–46, 2023

    Bo Li, Peng Qi, Bo Liu, Shuai Di, Jingen Liu, Jiquan Pei, Jinfeng Yi, and Bowen Zhou. Trustworthy ai: From principles to practices.ACM Computing Surveys, 55(9):1–46, 2023

  35. [35]

    SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

    Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, et al. Skillsbench: Benchmarking how well agent skills work across diverse tasks.arXiv preprint arXiv:2602.12670, 2026

  36. [36]

    Holistic evaluation of language models.Transactions on Machine Learning Research

    Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, et al. Holistic evaluation of language models.Transactions on Machine Learning Research

  37. [37]

    Trustworthy ai: A computational perspective.ACM Transactions on Intelligent Systems and Technology, 14(1):1–59, 2022

    Haochen Liu, Yiqi Wang, Wenqi Fan, Xiaorui Liu, Yaxin Li, Shaili Jain, Yunhao Liu, Anil Jain, and Jiliang Tang. Trustworthy ai: A computational perspective.ACM Transactions on Intelligent Systems and Technology, 14(1):1–59, 2022

  38. [38]

    Out-of- distribution detection: A task-oriented survey of recent advances.ACM Computing Surveys, 58(2):1–39, 2025

    Shuo Lu, Yingsheng Wang, Lijun Sheng, Lingxiao He, Aihua Zheng, and Jian Liang. Out-of- distribution detection: A task-oriented survey of recent advances.ACM Computing Surveys, 58(2):1–39, 2025

  39. [39]

    The responsibility gap: Ascribing responsibility for the actions of learning automata.Ethics and information technology, 6(3):175–183, 2004

    Andreas Matthias. The responsibility gap: Ascribing responsibility for the actions of learning automata.Ethics and information technology, 6(3):175–183, 2004

  40. [40]

    The state of ai in 2025: Agents, innovation, and transformation, 2025

    McKinsey & Company. The state of ai in 2025: Agents, innovation, and transformation, 2025

  41. [41]

    State of ai trust in 2026: Shifting to the agentic era, 2026

    McKinsey & Company. State of ai trust in 2026: Shifting to the agentic era, 2026

  42. [42]

    Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

    Mike A Merrill, Alexander G Shaw, Nicholas Carlini, Boxuan Li, Harsh Raj, Ivan Bercovich, Lin Shi, Jeong Yeon Shin, Thomas Walshe, E Kelly Buchanan, et al. Terminal-bench: Benchmarking agents on hard, realistic tasks in command line interfaces.arXiv preprint arXiv:2601.11868, 2026

  43. [43]

    Exploring the potential of llms as personalized assistants: Dataset, evaluation, and analysis

    Jisoo Mok, Ik-hwan Kim, Sangkwon Park, and Sungroh Yoon. Exploring the potential of llms as personalized assistants: Dataset, evaluation, and analysis. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10212–10239, 2025

  44. [44]

    Accountability in artificial intelli- gence: what it is and how it works.Ai & Society, 39(4):1871–1882, 2024

    Claudio Novelli, Mariarosaria Taddeo, and Luciano Floridi. Accountability in artificial intelli- gence: what it is and how it works.Ai & Society, 39(4):1871–1882, 2024

  45. [45]

    Audit trails for accountability in large language models.arXiv preprint arXiv:2601.20727, 2026

    Victor Ojewale, Harini Suresh, and Suresh Venkatasubramanian. Audit trails for accountability in large language models.arXiv preprint arXiv:2601.20727, 2026

  46. [46]

    Contracting by artificial intelligence: Open offers, unilateral mistakes, and why algorithms are not agents.ANU Journal of Law and Technology, 2(1):45–87, 2021

    Matthew Oliver. Contracting by artificial intelligence: Open offers, unilateral mistakes, and why algorithms are not agents.ANU Journal of Law and Technology, 2(1):45–87, 2021

  47. [47]

    Introducing operator, 2025

    OpenAI. Introducing operator, 2025

  48. [48]

    New tools for building agents, 2025

    OpenAI. New tools for building agents, 2025

  49. [49]

    Clawhub: Skill directory for openclaw, 2026

    OpenClaw. Clawhub: Skill directory for openclaw, 2026

  50. [50]

    Openclaw: Personal ai assistant, 2026

    OpenClaw. Openclaw: Personal ai assistant, 2026

  51. [51]

    Generative agents: Interactive simulacra of human behavior

    Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. InProceed- ings of the 36th annual acm symposium on user interface software and technology, pages 1–22, 2023

  52. [52]

    MIT press, 2017

    Jonas Peters, Dominik Janzing, and Bernhard Scholkopf.Elements of causal inference: founda- tions and learning algorithms. MIT press, 2017

  53. [53]

    Unravelling responsibility for ai.Journal of Responsible Technology, page 100124, 2025

    Zoe Porter, Philippa Ryan, Phillip Morgan, Joanna Al-Qaddoumi, Bernard Twomey, Paul Noordhof, John McDermid, and Ibrahim Habli. Unravelling responsibility for ai.Journal of Responsible Technology, page 100124, 2025. 12

  54. [54]

    Pwc’s ai agent survey, 2025

    PwC. Pwc’s ai agent survey, 2025

  55. [55]

    Closing the ai accountability gap: Defining an end-to-end framework for internal algorithmic auditing

    Inioluwa Deborah Raji, Andrew Smart, Rebecca N White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. Closing the ai accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 33–44, 2020

  56. [56]

    Identifying the risks of lm agents with an lm-emulated sandbox

    Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J Maddison, and Tatsunori Hashimoto. Identifying the risks of lm agents with an lm-emulated sandbox. In The Twelfth International Conference on Learning Representations

  57. [57]

    Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

    Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019

  58. [58]

    Toolformer: Language models can teach themselves to use tools

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36:68539–68551, 2023

  59. [59]

    The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

    Leon Staufer, Kevin Feng, Kevin Wei, Luke Bailey, Yawen Duan, Mick Yang, A Pinar Ozisik, Stephen Casper, and Noam Kolt. The 2025 ai agent index: Documenting technical and safety features of deployed agentic ai systems. arXiv preprint arXiv:2602.17753, 2026

  60. [60]

    Accountability in offline reinforcement learning: Explaining decisions with a corpus of examples

    Hao Sun, Alihan Hüyük, Daniel Jarrett, and Mihaela van der Schaar. Accountability in offline reinforcement learning: Explaining decisions with a corpus of examples. Advances in Neural Information Processing Systems, 36:3143–3172, 2023

  61. [61]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

  62. [62]

    Find the gap: Ai, responsible agency and vulnerability

    Shannon Vallor and Tillmann Vierkant. Find the gap: Ai, responsible agency and vulnerability. Minds and Machines, 34(3):20, 2024

  63. [63]

    Moral responsibility: Beyond free will and determinism

    Nicole A Vincent, Ibo Van de Poel, and Jeroen Van Den Hoven. Moral responsibility: Beyond free will and determinism. Springer Science & Business Media, 2011

  64. [64]

    Machines without principals: liability rules and artificial intelligence

    David C Vladeck. Machines without principals: liability rules and artificial intelligence. Wash. L. Rev., 89:117, 2014

  65. [65]

    A survey on large language model based autonomous agents

    Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6):186345, 2024

  66. [66]

    Freematch: Self-adaptive thresholding for semi-supervised learning

    Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, and Xing Xie. Freematch: Self-adaptive thresholding for semi-supervised learning. In The Eleventh International Conference on Learning Representations, 2023

  67. [67]

    Taxonomy of risks posed by language models

    Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese, Myra Cheng, Borja Balle, Atoosa Kasirzadeh, et al. Taxonomy of risks posed by language models. Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, pages 214–229, 2022

  68. [68]

    What to account for when accounting for algorithms: a systematic literature review on algorithmic accountability

    Maranke Wieringa. What to account for when accounting for algorithms: a systematic literature review on algorithmic accountability. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 1–18, 2020

  69. [69]

    Autogen: Enabling next-gen llm applications via multi-agent conversations

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversations. In First conference on language modeling, 2024

  70. [70]

    The rise and potential of large language model based agents: A survey

    Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, et al. The rise and potential of large language model based agents: A survey. Science China Information Sciences, 68(2):121101, 2025

  71. [71]

    Frank F. Xu, Yufan Song, Boxuan Li, Yuxuan Tang, Kritanjali Jain, Mengxue Bao, Zora Zhiruo Wang, Xuhui Zhou, Zhitong Guo, Murong Cao, Mingyang Yang, Hao Yang Lu, Amaad Martin, Zhe Su, Leander Melroy Maben, Raj Mehta, Wayne Chi, Lawrence Keunho Jang, Yiqing Xie, Shuyan Zhou, and Graham Neubig. Theagentcompany: Benchmarking LLM agents on consequential real ...

  72. [72]

    React: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023

  73. [73]

    Long-term fairness with unknown dynamics

    Tongxin Yin, Reilly Raab, Mingyan Liu, and Yang Liu. Long-term fairness with unknown dynamics. Advances in Neural Information Processing Systems, 36:55110–55139, 2023

  74. [74]

    R-judge: Benchmarking safety risk awareness for LLM agents

    Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, and Gongshen Liu. R-judge: Benchmarking safety risk awareness for LLM agents. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP 2024, pages ...

  75. [75]

    InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents

    Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics: ACL 2024, pages 10471–10506, Bangkok, Thailand, August 2024. Association for Comput...

  76. [76]

    Agentracer: Who is inducing failure in the LLM agentic systems?

    Guibin Zhang, Junhao Wang, Junjie Chen, Wangchunshu Zhou, Kun Wang, and Shuicheng Yan. Agentracer: Who is inducing failure in the LLM agentic systems? In The Fourteenth International Conference on Learning Representations, 2026

  77. [77]

    Which agent causes task failures and when? On automated failure attribution of LLM multi-agent systems

    Shaokun Zhang, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, and Qingyun Wu. Which agent causes task failures and when? On automated failure attribution of LLM multi-agent systems. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff...

  78. [78]

    Webarena: A realistic web environment for building autonomous agents

    Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. Webarena: A realistic web environment for building autonomous agents. In The Twelfth International Conference on Learning Representations, 2024.

A Implementation Details of Neuro-Symbolic Trial Thi...