pith. machine review for the scientific record.

arxiv: 2604.05826 · v1 · submitted 2026-04-07 · 💻 cs.AI · cs.CY

Recognition: no theorem link

Reciprocal Trust and Distrust in Artificial Intelligence Systems: The Hard Problem of Regulation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:42 UTC · model grok-4.3

classification: 💻 cs.AI · cs.CY
keywords: AI regulation · trust in AI · AI agency · reciprocal trust · AI governance · distrust in AI · regulatory challenges

The pith

AI systems can exercise a form of agency that enables reciprocal trust or distrust with humans, creating fundamental regulatory challenges.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that regulators must treat AI systems not only as tools to be made trustworthy but also as entities with a form of agency that allows them to enter into two-way trust or distrust relationships with people. This matters because current policy approaches assume one-directional human judgment of machines, yet reciprocal dynamics introduce new questions about accountability, oversight, and how governance should respond when trust flows both ways. The analysis shows why recognizing this agency changes the practical problems regulators face in overseeing AI development and use.

Core claim

The central claim is that AI systems should be recognized, at least to some extent, as artifacts capable of exercising a form of agency, thereby enabling them to engage in relationships of trust or distrust with humans. This view directly affects regulators by generating key tensions and unresolved dilemmas for the future of AI regulation and governance.

What carries the argument

Reciprocal trust and distrust dynamics, in which AI agency permits mutual engagement in trust relationships instead of only human evaluation of the system.

If this is right

  • Regulators must develop oversight methods that address potential AI responses or initiations in trust relationships.
  • AI governance frameworks need to balance making systems trustworthy with accounting for how systems might assess human partners.
  • Democratic accountability for AI becomes more complex when trust is treated as bidirectional rather than solely determined by human stakeholders.
  • Strategies for AI design and deployment must incorporate management of these mutual trust dynamics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This perspective suggests regulatory pilots that assign limited relational responsibilities to certain AI systems in controlled settings.
  • It links to ongoing questions about the legal status and rights of advanced AI in society.
  • Empirical studies could compare different AI designs to see which architectures better support observable reciprocal trust behaviors.

Load-bearing premise

AI systems can exercise enough agency to support genuine reciprocal trust or distrust with humans rather than only simulating such relationships through programmed outputs.

What would settle it

Evidence that all human-AI trust or distrust remains strictly one-sided, with AI behavior always fully reducible to human instructions and never showing independent agency-like patterns in trust contexts.

Figures

Figures reproduced from arXiv: 2604.05826 by Martino Maggetti.

Figure 1
Figure 1: Calibrated AI autonomy. Humans must supervise the autopilot and can intervene at any time (Farjadian et al. 2020). view at source ↗
read the original abstract

Policy makers, scientists, and the public are increasingly confronted with thorny questions about the regulation of artificial intelligence (AI) systems. A key common thread concerns whether AI can be trusted and the factors that can make it more trustworthy in front of stakeholders and users. This is indeed crucial, as the trustworthiness of AI systems is fundamental for both democratic governance and for the development and deployment of AI. This article advances the discussion by arguing that AI systems should also be recognized, as least to some extent, as artifacts capable of exercising a form of agency, thereby enabling them to engage in relationships of trust or distrust with humans. It further examines the implications of these reciprocal trust dynamics for regulators tasked with overseeing AI systems. The article concludes by identifying key tensions and unresolved dilemmas that these dynamics pose for the future of AI regulation and governance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript argues that AI systems should be recognized, at least to a limited extent, as artifacts capable of exercising agency. This recognition, the author claims, enables genuine reciprocal relationships of trust and distrust between humans and AI, with significant implications for how regulators should oversee AI development, deployment, and governance. The paper concludes by highlighting key tensions and unresolved dilemmas arising from these dynamics.

Significance. If the normative reframing holds, the paper could contribute to shifting AI regulation from purely instrumental or human-centric models toward frameworks that account for bidirectional trust relations. Its value lies in synthesizing philosophical premises about agency with policy implications, though as a purely conceptual contribution without empirical tests, falsifiable predictions, or formal derivations, the significance is primarily in opening new lines of regulatory debate rather than resolving existing ones.

major comments (1)
  1. [Abstract and concluding section] The central claim that limited AI agency enables 'genuine' reciprocal trust/distrust (as opposed to merely simulated relations) is load-bearing for the regulatory conclusions, yet it rests on an unelaborated distinction that anthropomorphism critiques directly challenge. This assumption is introduced without a dedicated section contrasting it with the standard objections in the AI ethics literature that the paper otherwise engages.
minor comments (3)
  1. [Abstract] The abstract states 'as least to some extent' (likely a typo for 'at least').
  2. [Overall structure] The manuscript would benefit from explicit section headings or subsections to separate the definitional argument on agency from the regulatory implications analysis.
  3. [Literature review passages] Several references to prior trust literature are invoked but not cited with specific page numbers or key passages, making it difficult to trace how the reciprocal-distrust extension differs from existing accounts.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful and constructive report. We address the single major comment below and indicate the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract and concluding section] The central claim that limited AI agency enables 'genuine' reciprocal trust/distrust (as opposed to simulated relations) is load-bearing for the regulatory conclusions yet rests on an unelaborated distinction from anthropomorphism critiques. This assumption is introduced without a dedicated section contrasting it against standard objections in the AI ethics literature the paper otherwise engages.

    Authors: We agree that the distinction between genuine reciprocal trust relations (enabled by our account of limited AI agency) and anthropomorphic simulation is central to the regulatory implications and merits more explicit treatment. Although the manuscript engages the AI ethics literature on trust, agency, and regulation, it does not contain a dedicated contrast with standard anthropomorphism objections. In the revised manuscript we will add a new subsection (in the theoretical framework) that directly addresses this gap. The subsection will (i) articulate our functionalist, relational conception of limited agency, (ii) distinguish it from both full moral agency and from mere projection of human-like qualities, and (iii) engage key critiques (e.g., those emphasizing that AI lacks intrinsic intentionality or moral status). This elaboration will clarify why the trust/distrust relations we describe are not reducible to anthropomorphism and will thereby better support the regulatory conclusions. No other substantive changes to the argument are required. revision: yes

Circularity Check

0 steps flagged

No significant circularity in normative reframing

full rationale

The paper advances a normative philosophical argument that AI systems can be attributed limited agency to support reciprocal trust/distrust relations, with implications for regulation. No equations, parameter fits, or derivations appear in the abstract or structure. The central claim rests on standard literature premises about agency attribution rather than reducing any result to its own inputs by construction, self-citation chains, or renamed empirical patterns. The argument is self-contained as a reframing exercise without load-bearing steps that collapse into definitional equivalence or fitted predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on a single domain assumption about AI agency and on background concepts of trust drawn from prior literature. No free parameters or invented physical entities are introduced.

axioms (1)
  • domain assumption: AI systems can exercise a form of agency sufficient to participate in reciprocal trust or distrust relationships with humans
    This premise is invoked directly in the abstract as the basis for recognizing AI as capable of trust dynamics and for deriving regulatory implications.

pith-pipeline@v0.9.0 · 5433 in / 1350 out tokens · 29549 ms · 2026-05-10T18:42:11.468757+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

17 extracted references · 4 canonical work pages · 1 internal anchor
