Responsible Agentic AI Requires Explicit Provenance

Jinwei Hu; Qisong He; Xiaowei Huang; Xinmiao Huang; Yi Dong; Youcheng Sun

arXiv: 2605.17169 · v1 · pith:TNEOYDE4new · submitted 2026-05-16 · cs.AI · cs.CL · cs.MA


Pith reviewed 2026-05-20 14:03 UTC · model grok-4.3

classification cs.AI · cs.CL · cs.MA

keywords agentic AI · provenance · responsibility attribution · causal attribution · AI lifecycle · AI safety · multi-agent systems · accountability


The pith

Explicit provenance across the full agentic lifecycle is the necessary condition for making responsibility in AI computable and actionable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that agentic AI is spreading into domains like software engineering while public trust lags because responsibility stays subjective when harms arise from compositions no single party designed. Current frameworks produce no quantifiable, traceable, or interventionable records that would let stakeholders assign accountability. The authors argue that embedding explicit provenance throughout the lifecycle supplies the missing structural basis, allowing responsibility to shift from discussion to computation via formal mechanisms and practical implementation across layers.

Core claim

Explicit provenance is not optional but the necessary condition for responsible agentic AI, as only it supplies the quantifiable, traceable, and interventionable data needed to assign responsibility when harm emerges from agent compositions no single party designed.

What carries the argument

Explicit provenance, encoded through a causal attribution function and responsibility tensor and maintained across four lifecycle layers to support online estimation and intervention.

If this is right

Responsibility gaps across sociotechnical dimensions become identifiable once provenance records are available.
Preliminary experiments indicate that provenance is estimable and interveneable online, before irreversible harm accumulates.
A concrete agentic incident can be analyzed to determine which parties bear responsibility.
No stakeholder in the agentic AI ecosystem can treat explicit provenance as discretionary.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Mandates for provenance logging could emerge in future AI regulations to enforce accountability.
Performance or privacy costs of maintaining detailed provenance would need separate measurement in deployed systems.
The same traceability approach might apply to harms in non-agentic AI that involve chained decisions.
Automated tools for real-time responsibility scoring could be built on top of the proposed tensor.

Load-bearing premise

That current agentic systems generate no usable provenance today and that adding explicit provenance will directly render responsibility computable without extra mechanisms or major trade-offs.

What would settle it

A concrete multi-agent incident in which full explicit provenance is recorded yet responsibility for resulting harm still cannot be assigned to any stakeholder or intervened upon before damage occurs.

Figures

Figures reproduced from arXiv: 2605.17169 by Jinwei Hu, Qisong He, Xiaowei Huang, Xinmiao Huang, Yi Dong, Youcheng Sun.

**Figure 2.** Online causal signal is estimable from agent execution prefixes, supporting required properties for responsible agent. AUPRC measures how well a monitor identifies failing trajectories before harm materializes. NeSy monitors substantially outperform random and zero-shot LLM baselines. We report preliminary experiments targeting the L2 with detailed implementation details in Appendix A, where online prove…
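
As a rough illustration of the metric named in the caption, the sketch below computes AUPRC as step-wise average precision over monitor scores, treating a failing trajectory as the positive class. The labels and scores are invented for illustration and are not data from the paper; the function name `average_precision` is hypothetical, and this is a common average-precision estimate, not necessarily the exact estimator the authors used.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a failing trajectory (positive class), 0 otherwise.
    scores: monitor risk scores; higher means "more likely to fail".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive (failing) trajectory")

    # Rank trajectories from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at this rank, weighted by the recall gained (1 / total_pos).
            ap += (true_pos / rank) / total_pos
    return ap


# Illustrative data only: 6 trajectory prefixes, 2 of which end in failure.
labels = [1, 0, 1, 0, 0, 0]
monitor_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
ap = average_precision(labels, monitor_scores)
```

For this toy ranking the value is 5/6. A monitor that ranks at random would be expected to score roughly the positive rate (2/6 here), which is why AUPRC is usually read against that baseline rather than against 0.5.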

Abstract

Agentic AI is rapidly proliferating across diverse real-world domains such as software engineering, yet public trust has not kept pace. The central reason is that responsibility, despite being widely discussed, remains a subjective and unenforced concept, as no current agentic framework produces the quantifiable, traceable, and interventionable provenance needed to assign it when harm emerges from compositions no single party designed. We position that what is missing is not better benchmark-level evaluation but **explicit provenance** across the full agentic lifecycle, which is the only viable basis for making responsibility computable and actionable. We advance this agenda along four axes: establishing *why* such provenance is a structural necessity by identifying responsibility gaps across sociotechnical dimensions, formalizing *what* it must encode through a causal attribution function and responsibility tensor, discussing *how* it can be made computable across four lifecycle layers with preliminary experiments showing that provenance is estimable and interveneable online before irreversible harm accumulates, and examining *who* bears responsibility through a concrete agentic incident. Explicit provenance is not a discretionary refinement but the necessary condition for responsible agentic AI, and no stakeholder across its ecosystem can afford to treat it as optional.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This position paper claims explicit provenance is required to make responsibility computable in agentic AI, but the case rests on conceptual framing and preliminary experiments that leave key assumptions untested.


The central claim is that current agentic systems lack the traceable, quantifiable provenance needed to assign responsibility when harm arises from agent compositions with no single designer, and that adding explicit provenance across the lifecycle is the fix. The paper advances this by breaking the issue into four parts: identifying sociotechnical responsibility gaps, formalizing the requirement with a causal attribution function and responsibility tensor, sketching implementation across four lifecycle layers, and walking through a concrete incident example. The early experiments cited for online estimability and interveneability add a bit of grounding that pure position papers often skip.

That structure is the main thing the work contributes, and it does organize the discussion more clearly than many accountability pieces. The formal constructs at least give readers something specific to push on rather than vague calls for better ethics.

The soft spots are in the support for the necessity claim. The experiments are described as preliminary, so it is not clear how well the proposed structures handle scale, emergent interactions, or cases where multiple agents contribute to harm without obvious causal cutoffs. There is also the risk that the responsibility tensor and attribution function end up defined in ways that make computability true by construction instead of matching independent external benchmarks. Trade-offs around performance, privacy, or added complexity are not examined in any detail.

This is aimed at researchers and engineers working on AI deployment standards and governance, particularly in domains like software engineering agents. A reader already thinking about provenance tracking or regulatory approaches would find the framing useful as a starting point for discussion. It deserves a serious referee because the topic is timely and the four-axis structure gives referees concrete sections to evaluate, even if the empirical side needs more work.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that current agentic AI frameworks fail to produce quantifiable, traceable, and interventionable provenance, leaving responsibility subjective and unenforced when harm emerges from compositions with no single designer. It positions explicit provenance across the full lifecycle as the necessary condition for making responsibility computable, formalized via a causal attribution function and responsibility tensor. The work advances this via four axes: sociotechnical responsibility gaps, formal encoding, computability across four lifecycle layers (with preliminary experiments on online estimability and interveneability), and illustration through a concrete agentic incident.

Significance. If the proposed provenance structures can be shown to suffice without unstated mechanisms or prohibitive trade-offs, the framework could provide a concrete basis for accountability in multi-agent systems, addressing a core barrier to trust in deployed agentic AI. The four-axis structure offers a useful organizing agenda, and the preliminary experiments on estimability provide an initial empirical foothold.

major comments (3)

[Abstract] Abstract and central positioning paragraph: the claim that explicit provenance is 'the only viable basis for making responsibility computable' is load-bearing yet rests on the unverified premise that the causal attribution function and responsibility tensor suffice for emergent interactions in compositions with no single designer; the cited preliminary experiments demonstrate estimability and interveneability but do not test generalization or elimination of supplementary causal assumptions.
[Formalization section] Formalization of the responsibility tensor and causal attribution function (in the 'what' axis section): these constructs risk circularity because they appear defined primarily with reference to the desired responsibility outcomes rather than independent external benchmarks or falsifiable criteria, which weakens the assertion that they render responsibility computable.
[Lifecycle layers section] Discussion of computability across four lifecycle layers (in the 'how' axis section): while preliminary experiments are reported as showing online estimability and interveneability before irreversible harm, the manuscript does not examine scalability limits, performance/privacy trade-offs, or additional mechanisms needed when harm arises from agent interactions, leaving the necessity claim under-supported.

minor comments (2)

[Notation and definitions] The introduction of novel terms such as 'responsibility tensor' would be clarified by an explicit comparison to related concepts in causal inference and data provenance literature.
[Incident analysis] Ensure the concrete agentic incident example includes sufficient detail on the four lifecycle layers to allow readers to trace the provenance encoding.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us identify areas for improvement in the manuscript. We address each of the major comments point by point below, providing clarifications and indicating planned revisions where necessary.


Referee: [Abstract] Abstract and central positioning paragraph: the claim that explicit provenance is 'the only viable basis for making responsibility computable' is load-bearing yet rests on the unverified premise that the causal attribution function and responsibility tensor suffice for emergent interactions in compositions with no single designer; the cited preliminary experiments demonstrate estimability and interveneability but do not test generalization or elimination of supplementary causal assumptions.

Authors: The manuscript argues that explicit provenance is necessary due to the structural responsibility gaps in existing agentic AI systems, as detailed in the 'why' axis. The causal attribution function draws from established causal inference methods, and the responsibility tensor provides a formal structure for aggregation. While we recognize that the preliminary experiments are limited and do not fully test generalization across all emergent interactions, the claim is positioned as a necessary condition rather than a complete sufficiency proof. We will revise the abstract to clarify this distinction and add a new subsection on assumptions and limitations to better support the positioning. revision: partial
Referee: [Formalization section] Formalization of the responsibility tensor and causal attribution function (in the 'what' axis section): these constructs risk circularity because they appear defined primarily with reference to the desired responsibility outcomes rather than independent external benchmarks or falsifiable criteria, which weakens the assertion that they render responsibility computable.

Authors: We maintain that the formalization avoids circularity. The causal attribution function is specified using interventionist causal models (e.g., via do-operators on agent interaction graphs), which are defined independently of responsibility outcomes. The responsibility tensor then operationalizes these attributions into a computable form. To prevent any misinterpretation of circularity, we will include additional explanations linking to falsifiable criteria from causal discovery literature and external benchmarks in the revised formalization section. revision: yes
Referee: [Lifecycle layers section] Discussion of computability across four lifecycle layers (in the 'how' axis section): while preliminary experiments are reported as showing online estimability and interveneability before irreversible harm, the manuscript does not examine scalability limits, performance/privacy trade-offs, or additional mechanisms needed when harm arises from agent interactions, leaving the necessity claim under-supported.

Authors: We agree with this observation. The current experiments focus on demonstrating basic online estimability and interveneability in controlled settings. The manuscript does not delve into scalability or specific trade-offs, which are indeed important for real-world applicability, especially in multi-agent scenarios. We will revise the 'how' axis to include an expanded discussion of these aspects, potential performance and privacy implications, and proposed mechanisms for handling emergent interactions, along with directions for future empirical work. revision: yes
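
The "online estimability and interveneability" debated in this exchange can be pictured as a monitor that scores each execution prefix, including the action about to be taken, and blocks that action once the score crosses a threshold. The sketch below is a toy under stated assumptions: `RISKY_ACTIONS`, `risk_score`, `monitor`, the action names, and the threshold are all hypothetical stand-ins, not the NeSy monitor evaluated in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# ASSUMPTION: hypothetical action names; a real monitor would use learned
# or symbolic rules rather than a fixed deny-list.
RISKY_ACTIONS = {"force_push", "drop_table", "delete_branch"}


@dataclass
class Verdict:
    executed: List[str]        # actions that were allowed to run
    blocked_at: Optional[int]  # index of the blocked action, or None
    scores: List[float]        # risk score for each prefix that was checked


def risk_score(prefix: Sequence[str]) -> float:
    """Toy score: fraction of steps in the prefix that are risky actions."""
    if not prefix:
        return 0.0
    return sum(step in RISKY_ACTIONS for step in prefix) / len(prefix)


def monitor(
    proposed: Sequence[str],
    score_fn: Callable[[Sequence[str]], float] = risk_score,
    threshold: float = 0.3,
) -> Verdict:
    """Check each proposed action before it runs; stop at the first violation."""
    executed: List[str] = []
    scores: List[float] = []
    for i, action in enumerate(proposed):
        # Score the prefix as it would look if the action were executed.
        score = score_fn(executed + [action])
        scores.append(score)
        if score >= threshold:
            # Intervene: the action is never appended to `executed`.
            return Verdict(executed, blocked_at=i, scores=scores)
        executed.append(action)
    return Verdict(executed, blocked_at=None, scores=scores)


verdict = monitor(["read_file", "edit_file", "force_push", "run_tests"])
```

In this run the third proposed action pushes the prefix score to 1/3, so it is blocked and only the first two actions execute. Scoring the prefix before execution, rather than after, is what makes the intervention precede the potentially irreversible step; the scalability and privacy costs the referee raises are not modeled here.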

Circularity Check

0 steps flagged

No significant circularity in the derivation chain


The paper identifies gaps in current agentic frameworks (no quantifiable/traceable/interventionable provenance for harm from multi-party compositions) and positions explicit provenance as the necessary condition for making responsibility computable. It advances this via sociotechnical analysis, formalization through a causal attribution function and responsibility tensor, lifecycle-layer discussion, and a concrete incident example. No equations, self-citations, or definitions are present in the provided text that reduce the central claim to its inputs by construction (e.g., no fitted parameter renamed as prediction, no self-definitional loop where the tensor is defined solely in terms of computability). The formal constructs are introduced as an independent proposal rather than a renaming or self-referential fit. The argument remains self-contained against external benchmarks of responsibility gaps and does not rely on load-bearing self-citation chains or ansatzes smuggled from prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The paper rests on the domain assumption that no current framework produces the required provenance and introduces new constructs without independent evidence or derivations from external benchmarks.

axioms (1)

domain assumption: Current agentic frameworks produce no quantifiable, traceable, and interventionable provenance for responsibility assignment.
Stated as the central reason public trust has not kept pace with proliferation.

invented entities (2)

responsibility tensor · no independent evidence
purpose: To formalize what provenance must encode for computable responsibility.
Introduced in the 'what' axis as part of the formalization.
causal attribution function · no independent evidence
purpose: To enable attribution across agent compositions for responsibility.
Defined alongside the responsibility tensor in the formalization step.

pith-pipeline@v0.9.0 · 5759 in / 1304 out tokens · 45270 ms · 2026-05-20T14:03:28.130644+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Definition 4.1 (Causal Contribution) … κ(p, ω, τ) = Pr[ω|τ] − Pr[ω|τ−p]

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Definition 4.5 (Responsibility Tensor) … R[p, ω, dk]
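
To make the two quoted definitions more concrete, the sketch below estimates a causal contribution as a difference of empirical outcome frequencies and stores the results in a mapping indexed like R[p, ω, dk]. It rests on assumptions the excerpt does not confirm: that τ−p denotes the trajectory with participant p's contribution removed (for example by replaying without it), that dk indexes a responsibility dimension, and that scaling κ by a per-dimension weight is a sensible way to fill the tensor. All participant names, dimension names, weights, and rollout outcomes are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def outcome_rate(outcomes: Sequence[bool]) -> float:
    """Empirical Pr[omega | condition] from repeated rollouts."""
    if not outcomes:
        raise ValueError("need at least one rollout")
    return sum(outcomes) / len(outcomes)


def causal_contribution(with_p: Sequence[bool], without_p: Sequence[bool]) -> float:
    """kappa(p, omega, tau) = Pr[omega | tau] - Pr[omega | tau - p].

    with_p:    whether outcome omega occurred in rollouts of the full trajectory.
    without_p: same, for rollouts with participant p's contribution removed
               (ASSUMPTION: this is what "tau - p" means).
    """
    return outcome_rate(with_p) - outcome_rate(without_p)


# Illustrative rollouts only (True = harmful outcome observed).
rollouts = {
    "skill_author": ([True, True, True, False], [False, False, True, False]),
    "orchestrator": ([True, True, True, False], [True, True, False, False]),
}

# ASSUMPTION: dimension names and weights are invented for this sketch.
dimension_weights = {"causal": 1.0, "oversight": 0.5}

# Sparse stand-in for the tensor: keys are (participant, outcome, dimension).
R: Dict[Tuple[str, str, str], float] = {}
for p, (with_p, without_p) in rollouts.items():
    kappa = causal_contribution(with_p, without_p)
    for d, w in dimension_weights.items():
        R[(p, "harm", d)] = w * kappa
```

With these made-up rollouts, removing the hypothetical skill author lowers the harm rate from 0.75 to 0.25 (κ = 0.5), while removing the orchestrator lowers it to 0.5 (κ = 0.25). Whether such differences are falsifiable against external benchmarks is exactly the circularity question the referee report raises.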

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages · 8 internal anchors

[1]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

Frontier ai regulation: Managing emerging risks to public safety, 2023

Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O’Keefe, Jess Whittle- stone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, and Kevin Wolf. ...

work page 2023
[3]

Agentharm: A benchmark for measuring harmfulness of llm agents

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, J Zico Kolter, Matt Fredrikson, et al. Agentharm: A benchmark for measuring harmfulness of llm agents. InThe Thirteenth International Conference on Learning Representations

work page
[4]

Conformal risk control

Anastasios Nikolas Angelopoulos, Stephen Bates, Adam Fisch, Lihua Lei, and Tal Schuster. Conformal risk control. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[5]

Tool use with claude, 2026

Anthropic. Tool use with claude, 2026. Claude API documentation

work page 2026
[6]

The conclusion of contracts by software agents in the eyes of the law

Tina Balke and Torsten Eymann. The conclusion of contracts by software agents in the eyes of the law. InProceedings of the 7th international joint conference on Autonomous agents and multiagent systems-Volume 2, pages 771–778, 2008

work page 2008
[7]

$\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment

Victor Barres, Honghua Dong, Soham Ray, Xujie Si, and Karthik Narasimhan. τ 2-Bench: Evaluating conversational agents in a dual-control environment.arXiv:2506.07982, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[8]

Picking on the same person: Does algorithmic monoculture lead to outcome homogenization?Advances in neural information processing systems, 35:3663–3678, 2022

Rishi Bommasani, Kathleen A Creel, Ananya Kumar, Dan Jurafsky, and Percy S Liang. Picking on the same person: Does algorithmic monoculture lead to outcome homogenization?Advances in neural information processing systems, 35:3663–3678, 2022

work page 2022
[9]

Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

work page 1901
[10]

Harms from increasingly agentic algorithmic systems

Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, et al. Harms from increasingly agentic algorithmic systems. InProceedings of the 2023 ACM conference on fairness, accountability, and transparency, pages 651–666, 2023

work page 2023
[11]

A survey on trust modeling.ACM Computing Surveys (CSUR), 48(2):1–40, 2015

Jin-Hee Cho, Kevin Chan, and Sibel Adali. A survey on trust modeling.ACM Computing Surveys (CSUR), 48(2):1–40, 2015

work page 2015
[12]

Understanding accountability in algorithmic supply chains

Jennifer Cobbe, Michael Veale, and Jatinder Singh. Understanding accountability in algorithmic supply chains. InProceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 1186–1197, 2023

work page 2023
[13]

Proposal for a directive of the european parliament and of the council on adapting noncontractual civil liability rules to artificial intelligence (ai liability directive)

EU Commission et al. Proposal for a directive of the european parliament and of the council on adapting noncontractual civil liability rules to artificial intelligence (ai liability directive). European Commission, 2022

work page 2022
[14]

Ai governance: a research agenda.Governance of AI Program, Future of Humanity Institute, University of Oxford: Oxford, UK, 1442:1443, 2018

Allan Dafoe. Ai governance: a research agenda.Governance of AI Program, Future of Humanity Institute, University of Oxford: Oxford, UK, 1442:1443, 2018

work page 2018
[15]

Business and it leaders report ai agents are scaling faster than their guardrails, 2026

Deloitte. Business and it leaders report ai agents are scaling faster than their guardrails, 2026

work page 2026
[16]

Safeguarding large language models: A survey.Artificial intelligence review, 58(12):382, 2025

Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, Changshun Wu, Gaojie Jin, Yi Qi, Jinwei Hu, Jie Meng, et al. Safeguarding large language models: A survey.Artificial intelligence review, 58(12):382, 2025. 10

work page 2025
[17]

Accountability of ai under the law: The role of explanation,

Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O’Brien, Kate Scott, Stuart Schieber, James Waldo, David Weinberger, et al. Accountability of ai under the law: The role of explanation.arXiv preprint arXiv:1711.01134, 2017

work page arXiv 2017
[18]

Genai against humanity: Nefarious applications of generative artificial intel- ligence and large language models.Journal of Computational Social Science, 7(1):549–569, 2024

Emilio Ferrara. Genai against humanity: Nefarious applications of generative artificial intel- ligence and large language models.Journal of Computational Social Science, 7(1):549–569, 2024

work page 2024
[19]

arXiv preprint arXiv:2404.16244 (2024).https://doi.org/10.48550/arXiv.2404.16244

Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, et al. The ethics of advanced ai assistants.arXiv preprint arXiv:2404.16244, 2024

work page arXiv 2024
[20]

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, et al. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned.arXiv preprint arXiv:2209.07858, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[21]

Neurosymbolic ai: The 3 rd wave.Artificial Intelligence Review, 56(11):12387–12406, 2023

Artur d’Avila Garcez and Luis C Lamb. Neurosymbolic ai: The 3 rd wave.Artificial Intelligence Review, 56(11):12387–12406, 2023

work page 2023
[22]

Causal abstractions of neural networks.Advances in neural information processing systems, 34:9574–9586, 2021

Atticus Geiger, Hanson Lu, Thomas Icard, and Christopher Potts. Causal abstractions of neural networks.Advances in neural information processing systems, 34:9574–9586, 2021

work page 2021
[23]

Skillprobe: Security auditing for emerging agent skill marketplaces via multi-agent collaboration

Zihan Guo, Zhiyu Chen, Xiaohang Nie, Jianghao Lin, Yuanjian Zhou, and Weinan Zhang. Skillprobe: Security auditing for emerging agent skill marketplaces via multi-agent collaboration. arXiv preprint arXiv:2603.21019, 2026

work page arXiv 2026
[24]

Artificial intelligence in health care: accountability and safety.Bulletin of the World Health Organization, 98(4):251, 2020

Ibrahim Habli, Tom Lawton, and Zoe Porter. Artificial intelligence in health care: accountability and safety.Bulletin of the World Health Organization, 98(4):251, 2020

work page 2020
[25]

Metagpt: Meta programming for a multi-agent collaborative framework

Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. Metagpt: Meta programming for a multi-agent collaborative framework. InThe twelfth international conference on learning representations, 2023

work page 2023
[26]

Ramchurn, and Xiaowei Huang

Jinwei Hu, Yi Dong, Shuang Ao, Zhuoyun Li, Boxuan Wang, Lokesh Singh, Guangliang Cheng, Sarvapali D. Ramchurn, and Xiaowei Huang. Stop reducing responsibility in llm-powered multi-agent systems to local alignment, 2025

work page 2025
[27]

Enhancing robustness of llm- driven multi-agent systems through randomized smoothing.Chinese Journal of Aeronautics, page 103779, 2025

Jinwei HU, Yi DONG, Zhengtao DING, and Xiaowei HUANG. Enhancing robustness of llm- driven multi-agent systems through randomized smoothing.Chinese Journal of Aeronautics, page 103779, 2025

work page 2025
[28]

Tapas are free! training-free adaptation of programmatic agents via llm-guided program synthesis in dynamic environments

Jinwei Hu, Yi Dong, Youcheng Sun, and Xiaowei Huang. Tapas are free! training-free adaptation of programmatic agents via llm-guided program synthesis in dynamic environments. Proceedings of the AAAI Conference on Artificial Intelligence, 40(35):29477–29485, Mar. 2026

work page 2026
[29]

Lying with truths: Open-channel multi-agent collusion for belief manipulation via generative montage, 2026

Jinwei Hu, Xinmiao Huang, Youcheng Sun, Yi Dong, and Xiaowei Huang. Lying with truths: Open-channel multi-agent collusion for belief manipulation via generative montage, 2026

work page 2026
[30]

Hurst and Nicole D

Kristin F. Hurst and Nicole D. Sintov. Trusting autonomous vehicles as moral agents improves related policy support.Frontiers in Psychology, V olume 13 - 2022, 2022

work page 2022
[31]

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang, Qin Wang, and Guangsheng Yu. Sok: Agentic skills–beyond tool use in llm agents.arXiv preprint arXiv:2602.20867, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[32]

Os-harm: A benchmark for measuring safety of computer use agents

Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, J Zico Kolter, Nicolas Flammarion, and Maksym Andriushchenko. Os-harm: A benchmark for measuring safety of computer use agents. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track

work page
[33]

Large language models portray socially subordinate groups as more homogeneous, consistent with a bias observed in humans

Messi HJ Lee, Jacob M Montgomery, and Calvin K Lai. Large language models portray socially subordinate groups as more homogeneous, consistent with a bias observed in humans. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 1321–1340, 2024. 11

work page 2024
[34]

Trustworthy ai: From principles to practices.ACM Computing Surveys, 55(9):1–46, 2023

Bo Li, Peng Qi, Bo Liu, Shuai Di, Jingen Liu, Jiquan Pei, Jinfeng Yi, and Bowen Zhou. Trustworthy ai: From principles to practices.ACM Computing Surveys, 55(9):1–46, 2023

work page 2023
[35]

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, et al. Skillsbench: Benchmarking how well agent skills work across diverse tasks.arXiv preprint arXiv:2602.12670, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[36]

Holistic evaluation of language models.Transactions on Machine Learning Research

Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, et al. Holistic evaluation of language models.Transactions on Machine Learning Research

work page
[37]

Trustworthy ai: A computational perspective.ACM Transactions on Intelligent Systems and Technology, 14(1):1–59, 2022

Haochen Liu, Yiqi Wang, Wenqi Fan, Xiaorui Liu, Yaxin Li, Shaili Jain, Yunhao Liu, Anil Jain, and Jiliang Tang. Trustworthy ai: A computational perspective.ACM Transactions on Intelligent Systems and Technology, 14(1):1–59, 2022

work page 2022
[38]

Out-of- distribution detection: A task-oriented survey of recent advances.ACM Computing Surveys, 58(2):1–39, 2025

Shuo Lu, Yingsheng Wang, Lijun Sheng, Lingxiao He, Aihua Zheng, and Jian Liang. Out-of- distribution detection: A task-oriented survey of recent advances.ACM Computing Surveys, 58(2):1–39, 2025

work page 2025
[39]

The responsibility gap: Ascribing responsibility for the actions of learning automata.Ethics and information technology, 6(3):175–183, 2004

Andreas Matthias. The responsibility gap: Ascribing responsibility for the actions of learning automata.Ethics and information technology, 6(3):175–183, 2004

work page 2004
[40]

The state of ai in 2025: Agents, innovation, and transformation, 2025

McKinsey & Company. The state of ai in 2025: Agents, innovation, and transformation, 2025

work page 2025
[41]

State of ai trust in 2026: Shifting to the agentic era, 2026

McKinsey & Company. State of ai trust in 2026: Shifting to the agentic era, 2026

work page 2026
[42]

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Mike A Merrill, Alexander G Shaw, Nicholas Carlini, Boxuan Li, Harsh Raj, Ivan Bercovich, Lin Shi, Jeong Yeon Shin, Thomas Walshe, E Kelly Buchanan, et al. Terminal-bench: Benchmarking agents on hard, realistic tasks in command line interfaces.arXiv preprint arXiv:2601.11868, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[43]

Exploring the potential of llms as personalized assistants: Dataset, evaluation, and analysis

Jisoo Mok, Ik-hwan Kim, Sangkwon Park, and Sungroh Yoon. Exploring the potential of llms as personalized assistants: Dataset, evaluation, and analysis. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10212–10239, 2025

[44]

Claudio Novelli, Mariarosaria Taddeo, and Luciano Floridi. Accountability in artificial intelligence: what it is and how it works. AI & Society, 39(4):1871–1882, 2024

[45]

Victor Ojewale, Harini Suresh, and Suresh Venkatasubramanian. Audit trails for accountability in large language models. arXiv preprint arXiv:2601.20727, 2026

[46]

Matthew Oliver. Contracting by artificial intelligence: Open offers, unilateral mistakes, and why algorithms are not agents. ANU Journal of Law and Technology, 2(1):45–87, 2021

[47]

OpenAI. Introducing operator, 2025

[48]

OpenAI. New tools for building agents, 2025

[49]

OpenClaw. Clawhub: Skill directory for openclaw, 2026

[50]

OpenClaw. Openclaw: Personal ai assistant, 2026

[51]

Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th annual acm symposium on user interface software and technology, pages 1–22, 2023

[52]

Jonas Peters, Dominik Janzing, and Bernhard Scholkopf. Elements of causal inference: foundations and learning algorithms. MIT press, 2017

[53]

Zoe Porter, Philippa Ryan, Phillip Morgan, Joanna Al-Qaddoumi, Bernard Twomey, Paul Noordhof, John McDermid, and Ibrahim Habli. Unravelling responsibility for ai. Journal of Responsible Technology, page 100124, 2025

[54]

PwC. Pwc’s ai agent survey, 2025

[55]

Inioluwa Deborah Raji, Andrew Smart, Rebecca N White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. Closing the ai accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 33–44, 2020

[56]

Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J Maddison, and Tatsunori Hashimoto. Identifying the risks of lm agents with an lm-emulated sandbox. In The Twelfth International Conference on Learning Representations

[57]

Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence, 1(5):206–215, 2019

[58]

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in neural information processing systems, 36:68539–68551, 2023

[59]

Leon Staufer, Kevin Feng, Kevin Wei, Luke Bailey, Yawen Duan, Mick Yang, A Pinar Ozisik, Stephen Casper, and Noam Kolt. The 2025 ai agent index: Documenting technical and safety features of deployed agentic ai systems. arXiv preprint arXiv:2602.17753, 2026

[60]

Hao Sun, Alihan Hüyük, Daniel Jarrett, and Mihaela van der Schaar. Accountability in offline reinforcement learning: Explaining decisions with a corpus of examples. Advances in Neural Information Processing Systems, 36:3143–3172, 2023

[61]

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

[62]

Shannon Vallor and Tillmann Vierkant. Find the gap: Ai, responsible agency and vulnerability. Minds and Machines, 34(3):20, 2024

[63]

Nicole A Vincent, Ibo Van de Poel, and Jeroen Van Den Hoven. Moral responsibility: Beyond free will and determinism. Springer Science & Business Media, 2011

[64]

David C Vladeck. Machines without principals: liability rules and artificial intelligence. Wash. L. Rev., 89:117, 2014

[65]

Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6):186345, 2024

[66]

Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, and Xing Xie. Freematch: Self-adaptive thresholding for semi-supervised learning. In The Eleventh International Conference on Learning Representations, 2023

[67]

Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese, Myra Cheng, Borja Balle, Atoosa Kasirzadeh, et al. Taxonomy of risks posed by language models. Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, pages 214–229, 2022

[68]

Maranke Wieringa. What to account for when accounting for algorithms: a systematic literature review on algorithmic accountability. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 1–18, 2020

[69]

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversations. In First conference on language modeling, 2024

[70]

Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, et al. The rise and potential of large language model based agents: A survey. Science China Information Sciences, 68(2):121101, 2025

[71]

Frank F. Xu, Yufan Song, Boxuan Li, Yuxuan Tang, Kritanjali Jain, Mengxue Bao, Zora Zhiruo Wang, Xuhui Zhou, Zhitong Guo, Murong Cao, Mingyang Yang, Hao Yang Lu, Amaad Martin, Zhe Su, Leander Melroy Maben, Raj Mehta, Wayne Chi, Lawrence Keunho Jang, Yiqing Xie, Shuyan Zhou, and Graham Neubig. Theagentcompany: Benchmarking LLM agents on consequential real ...

[72]

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023

[73]

Tongxin Yin, Reilly Raab, Mingyan Liu, and Yang Liu. Long-term fairness with unknown dynamics. Advances in Neural Information Processing Systems, 36:55110–55139, 2023

[74]

Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, and Gongshen Liu. R-judge: Benchmarking safety risk awareness for LLM agents. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP 2024, pages ...

[75]

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics: ACL 2024, pages 10471–10506, Bangkok, Thailand, August 2024. Association for Comput...

[76]

Guibin Zhang, Junhao Wang, Junjie Chen, Wangchunshu Zhou, Kun Wang, and Shuicheng YAN. Agentracer: Who is inducing failure in the LLM agentic systems? In The Fourteenth International Conference on Learning Representations, 2026

[77]

Shaokun Zhang, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, and Qingyun Wu. Which agent causes task failures and when? On automated failure attribution of LLM multi-agent systems. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff...

[78]

Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. Webarena: A realistic web environment for building autonomous agents. In The Twelfth International Conference on Learning Representations, 2024


[37] [37]

Trustworthy ai: A computational perspective.ACM Transactions on Intelligent Systems and Technology, 14(1):1–59, 2022

Haochen Liu, Yiqi Wang, Wenqi Fan, Xiaorui Liu, Yaxin Li, Shaili Jain, Yunhao Liu, Anil Jain, and Jiliang Tang. Trustworthy ai: A computational perspective.ACM Transactions on Intelligent Systems and Technology, 14(1):1–59, 2022

work page 2022

[38] [38]

Out-of- distribution detection: A task-oriented survey of recent advances.ACM Computing Surveys, 58(2):1–39, 2025

Shuo Lu, Yingsheng Wang, Lijun Sheng, Lingxiao He, Aihua Zheng, and Jian Liang. Out-of- distribution detection: A task-oriented survey of recent advances.ACM Computing Surveys, 58(2):1–39, 2025

work page 2025

[39] [39]

The responsibility gap: Ascribing responsibility for the actions of learning automata.Ethics and information technology, 6(3):175–183, 2004

Andreas Matthias. The responsibility gap: Ascribing responsibility for the actions of learning automata.Ethics and information technology, 6(3):175–183, 2004

work page 2004

[40] [40]

The state of ai in 2025: Agents, innovation, and transformation, 2025

McKinsey & Company. The state of ai in 2025: Agents, innovation, and transformation, 2025

work page 2025

[41] [41]

State of ai trust in 2026: Shifting to the agentic era, 2026

McKinsey & Company. State of ai trust in 2026: Shifting to the agentic era, 2026

work page 2026

[42] [42]

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Mike A Merrill, Alexander G Shaw, Nicholas Carlini, Boxuan Li, Harsh Raj, Ivan Bercovich, Lin Shi, Jeong Yeon Shin, Thomas Walshe, E Kelly Buchanan, et al. Terminal-bench: Benchmarking agents on hard, realistic tasks in command line interfaces.arXiv preprint arXiv:2601.11868, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[43] [43]

Exploring the potential of llms as personalized assistants: Dataset, evaluation, and analysis

Jisoo Mok, Ik-hwan Kim, Sangkwon Park, and Sungroh Yoon. Exploring the potential of llms as personalized assistants: Dataset, evaluation, and analysis. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10212–10239, 2025

work page 2025

[44] [44]

Accountability in artificial intelli- gence: what it is and how it works.Ai & Society, 39(4):1871–1882, 2024

Claudio Novelli, Mariarosaria Taddeo, and Luciano Floridi. Accountability in artificial intelli- gence: what it is and how it works.Ai & Society, 39(4):1871–1882, 2024

work page 2024

[45] [45]

Audit trails for accountability in large language models.arXiv preprint arXiv:2601.20727, 2026

Victor Ojewale, Harini Suresh, and Suresh Venkatasubramanian. Audit trails for accountability in large language models.arXiv preprint arXiv:2601.20727, 2026

work page arXiv 2026

[46] [46]

Contracting by artificial intelligence: Open offers, unilateral mistakes, and why algorithms are not agents.ANU Journal of Law and Technology, 2(1):45–87, 2021

Matthew Oliver. Contracting by artificial intelligence: Open offers, unilateral mistakes, and why algorithms are not agents.ANU Journal of Law and Technology, 2(1):45–87, 2021

work page 2021

[47] [47]

Introducing operator, 2025

OpenAI. Introducing operator, 2025

work page 2025

[48] [48]

New tools for building agents, 2025

OpenAI. New tools for building agents, 2025

work page 2025

[49] [49]

Clawhub: Skill directory for openclaw, 2026

OpenClaw. Clawhub: Skill directory for openclaw, 2026

work page 2026

[50] [50]

Openclaw: Personal ai assistant, 2026

OpenClaw. Openclaw: Personal ai assistant, 2026

work page 2026

[51] [51]

Generative agents: Interactive simulacra of human behavior

Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. InProceed- ings of the 36th annual acm symposium on user interface software and technology, pages 1–22, 2023

work page 2023

[52] [52]

MIT press, 2017

Jonas Peters, Dominik Janzing, and Bernhard Scholkopf.Elements of causal inference: founda- tions and learning algorithms. MIT press, 2017

work page 2017

[53] [53]

Unravelling responsibility for ai.Journal of Responsible Technology, page 100124, 2025

Zoe Porter, Philippa Ryan, Phillip Morgan, Joanna Al-Qaddoumi, Bernard Twomey, Paul Noordhof, John McDermid, and Ibrahim Habli. Unravelling responsibility for ai.Journal of Responsible Technology, page 100124, 2025. 12

work page 2025

[54] [54]

Pwc’s ai agent survey, 2025

PwC. Pwc’s ai agent survey, 2025

work page 2025

[55] [55]

Closing the ai accountability gap: Defining an end-to-end framework for internal algorithmic auditing

Inioluwa Deborah Raji, Andrew Smart, Rebecca N White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. Closing the ai accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 33–44, 2020

work page 2020

[56] [56]

Identifying the risks of lm agents with an lm-emulated sandbox

Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J Maddison, and Tatsunori Hashimoto. Identifying the risks of lm agents with an lm-emulated sandbox. InThe Twelfth International Conference on Learning Representations

work page

[57] [57]

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.Nature machine intelligence, 1(5):206–215, 2019

Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.Nature machine intelligence, 1(5):206–215, 2019

work page 2019

[58] [58]

Toolformer: Language models can teach themselves to use tools.Advances in neural information processing systems, 36:68539– 68551, 2023

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools.Advances in neural information processing systems, 36:68539– 68551, 2023

work page 2023

[59] [59]

The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

Leon Staufer, Kevin Feng, Kevin Wei, Luke Bailey, Yawen Duan, Mick Yang, A Pinar Ozisik, Stephen Casper, and Noam Kolt. The 2025 ai agent index: Documenting technical and safety features of deployed agentic ai systems.arXiv preprint arXiv:2602.17753, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2025

[60] [60]

Accountability in offline reinforcement learning: Explaining decisions with a corpus of examples.Advances in Neural Information Processing Systems, 36:3143–3172, 2023

Hao Sun, Alihan Hüyük, Daniel Jarrett, and Mihaela van der Schaar. Accountability in offline reinforcement learning: Explaining decisions with a corpus of examples.Advances in Neural Information Processing Systems, 36:3143–3172, 2023

work page 2023

[61] [61]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models.arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[62] [62]

Find the gap: Ai, responsible agency and vulnerability

Shannon Vallor and Tillmann Vierkant. Find the gap: Ai, responsible agency and vulnerability. Minds and Machines, 34(3):20, 2024

work page 2024

[63] [63]

Springer Science & Business Media, 2011

Nicole A Vincent, Ibo Van de Poel, and Jeroen Van Den Hoven.Moral responsibility: Beyond free will and determinism. Springer Science & Business Media, 2011

work page 2011

[64] [64]

Machines without principals: liability rules and artificial intelligence.Wash

David C Vladeck. Machines without principals: liability rules and artificial intelligence.Wash. L. Rev., 89:117, 2014

work page 2014

[65] [65]

A survey on large language model based autonomous agents.Frontiers of Computer Science, 18(6):186345, 2024

Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomous agents.Frontiers of Computer Science, 18(6):186345, 2024

[66]

Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, and Xing Xie. Freematch: Self-adaptive thresholding for semi-supervised learning. In The Eleventh International Conference on Learning Representations, 2023

[67]

Laura Weidinger, Jonathan Uesato, Maribeth Rauh, Conor Griffin, Po-Sen Huang, John Mellor, Amelia Glaese, Myra Cheng, Borja Balle, Atoosa Kasirzadeh, et al. Taxonomy of risks posed by language models. Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, pages 214–229, 2022

[68]

Maranke Wieringa. What to account for when accounting for algorithms: a systematic literature review on algorithmic accountability. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 1–18, 2020

[69]

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen LLM applications via multi-agent conversations. In First Conference on Language Modeling, 2024

[70]

Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, et al. The rise and potential of large language model based agents: A survey. Science China Information Sciences, 68(2):121101, 2025

[71]

Frank F. Xu, Yufan Song, Boxuan Li, Yuxuan Tang, Kritanjali Jain, Mengxue Bao, Zora Zhiruo Wang, Xuhui Zhou, Zhitong Guo, Murong Cao, Mingyang Yang, Hao Yang Lu, Amaad Martin, Zhe Su, Leander Melroy Maben, Raj Mehta, Wayne Chi, Lawrence Keunho Jang, Yiqing Xie, Shuyan Zhou, and Graham Neubig. Theagentcompany: Benchmarking LLM agents on consequential real ...

[72]

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023

[73]

Tongxin Yin, Reilly Raab, Mingyan Liu, and Yang Liu. Long-term fairness with unknown dynamics. Advances in Neural Information Processing Systems, 36:55110–55139, 2023

[74]

Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, and Gongshen Liu. R-judge: Benchmarking safety risk awareness for LLM agents. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP 2024, pages ...

[75]

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics: ACL 2024, pages 10471–10506, Bangkok, Thailand, August 2024. Association for Comput...

[76]

Guibin Zhang, Junhao Wang, Junjie Chen, Wangchunshu Zhou, Kun Wang, and Shuicheng Yan. Agentracer: Who is inducing failure in the LLM agentic systems? In The Fourteenth International Conference on Learning Representations, 2026

[77]

Shaokun Zhang, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, and Qingyun Wu. Which agent causes task failures and when? On automated failure attribution of LLM multi-agent systems. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff...

[78]

Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. Webarena: A realistic web environment for building autonomous agents. In The Twelfth International Conference on Learning Representations, 2024

A Implementation Details of Neuro-Symbolic Trial Thi...
