pith. machine review for the scientific record.

arxiv: 2604.22819 · v1 · submitted 2026-04-16 · 💻 cs.CY

Recognition: unknown

A pragmatic approach to regulating AI agents

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 09:57 UTC · model grok-4.3

classification 💻 cs.CY
keywords: AI agents · EU AI Act · regulation · contract law · orchestration layer · accountability · multi-agent systems · autonomy

The pith

AI agents require regulation as distinct AI systems under the EU AI Act due to their autonomous cross-system actions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that AI agents, which independently reason, plan, and carry out tasks across different external systems, create fresh risks especially when several agents interact with one another. This situation calls for moving regulatory focus to the orchestration layer where these interactions happen. Even when agents rest on general-purpose models, their structural complexity and ability to move between systems mean they should count as AI systems under the EU AI Act and carry separate duties. On the contract side, the authors propose a risk-based traffic light system for approving tasks and a fixed statutory list of actions that agents may never perform. These steps would keep growing agent autonomy inside existing legal rules while preserving human accountability.

Core claim

The unique capacity of agents to autonomously reason, plan, and execute tasks across disparate external systems necessitates a fundamental shift in oversight toward the orchestration layer, where multi-agent interactions introduce novel risks of misalignment. While agents generally utilise general-purpose AI models, their structural complexity and cross-system permeability require them to be regulated as AI systems with distinct obligations under the AI Act. Contract law should therefore adopt a traffic light system of staggered task authorization based on operational risk together with a statutory list of non-delegable legal acts.

What carries the argument

The orchestration layer as the primary site for overseeing multi-agent interactions, paired with a risk-tiered traffic light system for task authorization and a statutory list of non-delegable legal acts.
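The paper's two proposals can be read together as a single authorization gate at the orchestration layer. The sketch below is an editorial illustration, not the authors' specification: the task names, the risk threshold, and the contents of the non-delegable list are all invented for the example.

```python
# Hypothetical sketch of the "traffic light" authorization scheme:
# tasks are tiered by operational risk, and a statutory list of
# non-delegable legal acts can never be authorized for an agent.
from enum import Enum

class Tier(Enum):
    GREEN = "auto-approve"    # low risk: agent may act autonomously
    YELLOW = "human-confirm"  # medium risk: explicit human approval required
    RED = "non-delegable"     # statutory list: reserved to the human principal

# Illustrative stand-in for a statutory list of non-delegable legal acts.
NON_DELEGABLE = {"sign_contract_of_suretyship", "file_court_pleading"}

def classify(task: str, risk_score: float) -> Tier:
    """Map a task to an authorization tier based on operational risk."""
    if task in NON_DELEGABLE:
        return Tier.RED
    return Tier.GREEN if risk_score < 0.3 else Tier.YELLOW  # threshold is illustrative

def authorize(task: str, risk_score: float, human_approved: bool = False) -> bool:
    """Staggered authorization: approval requirements scale with tier."""
    tier = classify(task, risk_score)
    if tier is Tier.RED:
        return False           # never delegable, regardless of approval
    if tier is Tier.YELLOW:
        return human_approved  # execution gated on human confirmation
    return True                # green: autonomous execution permitted
```

The key design point the paper argues for is visible in the red tier: no amount of human approval converts a non-delegable act into a delegable one, which keeps accountability anchored to the human principal.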

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Regulators outside the EU might copy the orchestration-layer focus when writing rules for agentic systems.
  • Developers could add monitoring hooks at the point where agents coordinate to simplify compliance.
  • Controlled trials of multi-agent deployments could check whether orchestration-level rules actually lower misalignment events.
  • The approach might lead to standard contract templates that flag which actions remain human-only.
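The monitoring-hooks extension above can be made concrete with a minimal sketch: an orchestrator that passes every inter-agent message through registered compliance hooks and keeps an audit trail. The `Orchestrator` API, event fields, and agent names are hypothetical, invented for this illustration.

```python
# Hypothetical compliance hook at the orchestration layer: every
# inter-agent message passes through registered checks and is logged
# to an audit trail before delivery.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Orchestrator:
    audit_log: List[Dict] = field(default_factory=list)
    hooks: List[Callable[[Dict], None]] = field(default_factory=list)

    def register_hook(self, hook: Callable[[Dict], None]) -> None:
        """Attach a compliance check run on every routed message."""
        self.hooks.append(hook)

    def route(self, sender: str, receiver: str, payload: str) -> Dict:
        """Deliver a message between agents, running hooks and logging it."""
        event = {"from": sender, "to": receiver, "payload": payload}
        for hook in self.hooks:
            hook(event)                 # e.g. policy checks, alerting
        self.audit_log.append(event)    # audit trail for human oversight
        return event

orc = Orchestrator()
orc.register_hook(lambda e: None)  # placeholder for a real compliance check
orc.route("planner_agent", "booking_agent", "reserve 2 seats")
```

Because every cross-agent interaction funnels through `route`, oversight obligations can attach at one point rather than inside each agent, which is the compliance simplification the bullet above gestures at.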

Load-bearing premise

The premise that agents built on general-purpose models still create unique risks through their structural complexity and cross-system operations that cannot be managed by regulating the models alone, and that existing contract law can absorb a statutory list of non-delegable acts without major revision.

What would settle it

A documented case in which agents operating across multiple systems under only general-purpose model rules produce no misalignment, or a judicial ruling that EU contract law cannot accommodate a statutory list of non-delegable acts.

Figures

Figures reproduced from arXiv: 2604.22819 by Matthias Holweg, Philipp Hacker.

Figure 1. Iterative Reasoning-Acting (ReAct) Framework of an AI Agent. Based on Yao et al. (2022).
Figure 2. Composition of LLMs. Source: Sculley et al. (2025:4).
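The ReAct framework in Figure 1 is an iterative thought, action, observation cycle. A schematic of that loop, with stub reasoner and tool functions standing in for a real model and real external systems:

```python
# Schematic of the iterative Reasoning-Acting (ReAct) loop of Figure 1
# (Yao et al., 2022). The reasoner and the tool here are stand-in stubs.
def react_loop(goal, reason, act, max_steps=5):
    """Alternate thought -> action -> observation until the reasoner stops."""
    observation = None
    trace = []
    for _ in range(max_steps):
        thought, action = reason(goal, observation)
        trace.append(("thought", thought))
        if action is None:          # reasoner decides the goal is met
            break
        observation = act(action)   # execute against an external system
        trace.append(("observation", observation))
    return trace

# Stub reasoner: requests one tool call, then declares the goal met.
def reason(goal, obs):
    return ("need data", "lookup") if obs is None else ("done", None)

trace = react_loop("answer question", reason, act=lambda action: "result")
```

The point relevant to the paper is that the `act` step is where the agent reaches into disparate external systems, and the loop as a whole is what an orchestration layer would supervise.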
Original abstract

The current advancement in and deployment of agentic AI systems has created a set of key challenges for the legal frameworks that govern their use. We cover two central components: first, the regulatory classification of agents under the EU AI Act, and second, the legal status and validity of autonomous actions within the established framework of EU contract law. We argue that the unique capacity of agents to autonomously reason, plan, and execute tasks across disparate external systems necessitates a fundamental shift in oversight toward the orchestration layer, where multi-agent interactions introduce novel risks of misalignment. While agents generally utilise general-purpose AI models, we posit that their structural complexity and cross-system permeability require them to be regulated as "AI systems" with distinct obligations under the AI Act. Consequently, our proposals highlight the need for robust accountability mechanisms to manage this heightened autonomy. On the contractual side, we advocate for a "traffic light" system of staggered task authorization based on operational risk and the creation of a statutory list of non-delegable legal acts. By implementing these measures, we provide a pragmatic pathway to ensure that the increasing autonomy of AI agents remains firmly anchored in human accountability and existing legal standards.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper argues that AI agents' autonomous reasoning, planning, and cross-system execution create novel risks at the orchestration layer of multi-agent systems, requiring them to be classified and regulated as distinct 'AI systems' under the EU AI Act with tailored obligations, even when built on general-purpose models. It further proposes, within EU contract law, a traffic-light system of staggered task authorization keyed to operational risk together with a statutory list of non-delegable legal acts to preserve human accountability.

Significance. If the classification and compatibility arguments are substantiated, the manuscript offers a concrete policy bridge between technical agent architectures and the EU AI Act plus contract-law instruments, potentially informing how regulators allocate obligations between model providers and agent deployers.

major comments (2)
  1. [Regulatory classification section (around the paragraph beginning 'While agents generally utilise general-purpose AI...')] The central claim that agents' 'structural complexity and cross-system permeability' necessitates distinct 'AI system' obligations beyond GPAI rules is load-bearing yet unsupported by explicit mapping. The manuscript should cite the precise definitions in Article 3 and the obligations in Article 13 and Chapter V of the AI Act and demonstrate why orchestration-layer interactions fall outside those provisions.
  2. [Contract-law proposals section (the paragraphs advocating the 'traffic light' system and statutory list)] The proposal for a statutory list of non-delegable acts and a traffic-light authorization scheme assumes compatibility with existing principles of contractual autonomy and electronic agency under the e-Commerce Directive and national laws, but provides no analysis of potential conflicts with liability allocation or representation rules; this gap directly affects the feasibility of the contract-law recommendations.
minor comments (2)
  1. [Abstract and introduction] The abstract and introduction would benefit from a short table or bullet list contrasting the proposed obligations with current GPAI transparency and risk-management requirements to improve readability.
  2. [Throughout] Several sentences contain repetitive phrasing (e.g., repeated use of 'autonomously reason, plan, and execute'); tightening would strengthen the legal argumentation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and address each major point below, indicating the revisions we will incorporate.

Point-by-point responses
  1. Referee: The central claim that agents' 'structural complexity and cross-system permeability' necessitates distinct 'AI system' obligations beyond GPAI rules is load-bearing yet unsupported by explicit mapping. The manuscript should cite the precise definitions in Article 3 and the obligations in Article 13 and Chapter V of the AI Act and demonstrate why orchestration-layer interactions fall outside those provisions.

    Authors: We accept that an explicit mapping to the AI Act provisions would strengthen the argument. In the revised manuscript we will add a dedicated subsection quoting Article 3(1) (definition of 'AI system') and Article 3(63) (definition of 'general-purpose AI model'), then map the orchestration layer's autonomous planning and cross-system execution to the obligations in Article 13 and Chapter V. We will show that these features generate emergent misalignment risks and systemic interactions that fall outside the model-provider-focused transparency and risk-management duties applicable to GPAI, thereby justifying distinct classification and obligations at the orchestration level. revision: yes

  2. Referee: The proposal for a statutory list of non-delegable acts and a traffic-light authorization scheme assumes compatibility with existing principles of contractual autonomy and electronic agency under the e-Commerce Directive and national laws, but provides no analysis of potential conflicts with liability allocation or representation rules; this gap directly affects the feasibility of the contract-law recommendations.

    Authors: We agree that compatibility analysis is required. The revised version will include a new paragraph examining the e-Commerce Directive (particularly electronic contracting and agency provisions) and national representation rules. We will argue that the traffic-light scheme preserves contractual autonomy by conditioning high-risk authorizations on human approval, thereby maintaining existing liability allocation to the human principal rather than shifting it to the agent. The statutory list of non-delegable acts will be presented as consistent with capacity and ratification requirements, avoiding conflicts by treating agent outputs as conditionally authorized representations subject to human oversight. revision: yes

Circularity Check

0 steps flagged

No circularity: policy arguments grounded in external EU legal frameworks

full rationale

The paper is a normative legal analysis proposing regulatory classifications and contractual mechanisms under the EU AI Act and contract law. Its claims rest on interpretive application of external statutes (e.g., definitions of AI systems, obligations in Articles 3/13/Chapter V, e-Commerce Directive) rather than any self-referential equations, fitted parameters renamed as predictions, or self-citation chains that reduce the central thesis to its own inputs. No load-bearing steps exhibit self-definition, ansatz smuggling, or renaming of known results; the derivation is self-contained against independent legal benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper is a legal policy proposal; it rests on interpretive assumptions about EU law applicability rather than empirical or mathematical derivations. No free parameters or invented entities are introduced.

axioms (2)
  • domain assumption EU AI Act framework applies to agentic systems and can be extended with distinct obligations for orchestration layers
    Invoked in the abstract's discussion of regulatory classification and shift in oversight.
  • domain assumption Existing EU contract law can incorporate a traffic light authorization system and statutory list of non-delegable acts
    Central to the contractual proposals in the abstract.

pith-pipeline@v0.9.0 · 5491 in / 1315 out tokens · 46117 ms · 2026-05-10T09:57:31.305341+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

27 extracted references · 19 canonical work pages · 2 internal anchors

  1. Aranda, L. and Sugimoto, K. (2026). The agentic AI landscape and its conceptual foundations. OECD Artificial Intelligence Papers. Paris: OECD Publishing.
  2. In Findings of the Association for Computational Linguistics: ACL 2024, pages 13921–13937, Bangkok, Thailand.
     Belcak, P., Heinrich, G., Diao, S., Fu, Y., Dong, X., Muralidharan, S., ... & Molchanov, P. (2025). Small language models are the future of agentic AI. arXiv preprint arXiv:2506.02153.
     Bellia Jr, A. J. (2001). Contracting with electronic agents. Emory LJ, 50.
  3. Belova, M., Kansal, Y., Liang, Y., Xiao, J., & Jha, N. K. (2026). An Alternative Trajectory for Generative AI. arXiv preprint arXiv:2603.14147.
     Bengio, Y. et al. (2026). International AI Safety Report 2026: Navigating Rapid AI Advancement and Emerging Risks. [online] Department for Science, Innovation and Technology (UK). Available at: https://www.gov.uk/...
  4. Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., & Brunskill, E. (2021). On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
     Bundesamt für Sicherheit in der Informationstechnik. (2025). Generative KI-Modelle: Chancen und Risiken für Industrie und Behörd...
  5. Multi-agent collaboration via evolving orchestration. arXiv preprint arXiv:2505.19591.
     De Bruyne, J., Dheu, O., & Ducuing, C. (2023). The European Commission's approach to extra-contractual liability and AI: An evaluation of the AI liability directive and the revised product liability directive. Computer Law & Security Review, 51, 105894.
     Deepmind: The Et...
  6. TRAIL: Trace Reasoning and Agentic Issue Localization. arXiv preprint arXiv:2505.08638. Available at: https://arxiv.org/abs/2505.08638
     Ding, D., Mallick, A., Wang, C., Sim, R., Mukherjee, S., Ruhle, V., ... & Awadallah, A. H. (2024). Hybrid LLM: Cost-efficient and quality-aware query routing. ICLR.
  7. Dziemian, M., Lin, M., Fu, X., Nowak, M., Winter, N., Jones, E., ... & Kolter, Z. (2026). How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition. arXiv preprint arXiv:2603.15714.
     Ebers, M., & Penagos, E. V. (2026). Upstream, downstream, and in between: navigating the GPAI value chain under EU law. Inform...
  8. Explainable AI in Multi-Agent Systems: Advancing Transparency with Layered Prompting. February. DOI:10.13140/RG.2.2.11455.42400.
     Fei Yu, Hongbo Zhang, Prayag Tiwari, and Benyou Wang (2024). Natural language reasoning, a survey. Comput. Surveys 56, 12: 1–39.
     Felin, T. and Holweg, M. (2024). Theory Is All You Need: AI, Human Cognition, and Decision Making. ...
  9. Gartner (2025). Gartner Predicts 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026, Up from Less Than 5% in 2025. [online] Gartner Newsroom. Available at: https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025
  10. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
     Gössl, S. (2024). Art...
  11. In M. Martini and C. Wendehorst (Eds.), KI-VO: Verordnung über Künstliche Intelligenz: Kommentar. Beck.
     Grundmann, S., & Hacker, P. (2017). Digital technology as a challenge to European contract law: from the existing to the future architecture. European Review of Contract Law, 13(3), 255-293.
     Gutierrez, C. I., Aguirre, A., Uuk, R., Boine, C. C., & Frankl...
  12. Hacker, P. (2020). Datenprivatrecht. Mohr Siebeck.
     Hacker, P. (2023). The European AI liability directives: Critique of a half-hearted approach and lessons for the future. Computer Law & Security Review, 51, 105871.
     Hacker, P. (2024). Comments on the final trilogue version of the AI Act. Available at SSRN 4757603.
     Hacker, P., & Ebert, K. (2025). Attributin...
  13. Characterizing AI Agents for Alignment and Governance (2025). arXiv preprint arXiv:2504.21848.
     Kidd Jr, D. L., & Daughtrey Jr, W. H. (1999). Adapting contract law to accommodate electronic contracts: overview and suggestions. Rutgers Computer & Tech. LJ, 26.
  14. Kolt, N. (2025). Governing AI agents. arXiv preprint arXiv:2501.07913.
     Koorndijk, J. (2025). Empirical Evidence for Alignment Faking in a Small LLM and Prompt-Based Mitigation Techniques. arXiv. https://doi.org/10.48550/arxiv.2506.21584
     Kötz, H. (2017). Agency and representation. In European contract law (2nd ed., pp. 293–318). Oxford University Press.
     Lan...
  15. LogPrompt: Prompt engineering towards zero-shot and interpretable online log analysis using large language models with prompt strategies. arXiv preprint arXiv:2308.07610.
     Magesh, V., Surani, F., Dahl, M., Suzgun, M., Manning, C. D., & Ho, D. E. (2025). Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools. Journal of Empirical Legal Studies, 22(2), 216-242.
     Martini, M. (2026). Art...
  16. In M. Martini & C. Wendehorst (Eds.), KI-VO: Verordnung über Künstliche Intelligenz. Kommentar (2nd ed.). C. H. Beck.
     Migliorini, S. (2024). "More than words": A legal approach to the risks of commercial chatbots powered by generative artificial intelligence. European Journal of Risk Regulation, 15, 719–736.
     Mirzadeh, I., Alizadeh, K., Aleaziz, H., Davies... GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.
  17. Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems. arXiv preprint arXiv:2503.06745. Available at: https://arxiv.org/abs/2503.06745
     Nannini, L., Smith, A. L., Maggini, M. J., Panai, E., Feliciano, S., Tiulkanov, A., ... & Bisconti, P. (2026). AI Agents Under EU Law. arXiv preprint arXiv:2604.04604.
     Ngo-Ho, A. K. N., Chauvin, ...
  18. Novelli, C., Casolari, F., Hacker, P., Spedicato, G., & Floridi, L. (2024a). Generative AI in EU law: Liability, privacy, intellectual property, and cybersecurity. Computer Law & Security Review, 55, 106066.
     Novelli, C., Hacker, P., Morley, J., Trondal, J., & Floridi, L. (2024). A Robust Governance for the AI Act: AI Office, AI Board, Scientific Panel, an...
  19. E-Commerce. Max Planck Institute for Comparative and International Private Law. Retrieved from https://max-eup2012.mpipriv.de/index.php/E-Commerce
     Russell, S. J., & Norvig, P. (2022). Artificial Intelligence: A Modern Approach (4th Global ed.). Pearson Education.
     Sage, N. (2023). Reconciling contract law's objective and subjective standards. The Modern Law Review, 86...
  20. Schwartmann, R., & Zenner, K. (2025). GPAI Applications Under Scrutiny: The Regulation of the AI Regulation Along the Value Chain. Journal for European Data and Information Law, 1, 3-9.
     Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.-F., & Dennison, D. (2015). Hidden technical debt in machine l...
  21. Shnitzer, T., Ou, A., Silva, M., Soule, K., Sun, Y., Solomon, J., ... & Yurochkin, M. (2023). Large language model routing with benchmark datasets. arXiv preprint arXiv:2309.15789.
     Shojaee, M., Reddy, S. and Ghassemi, M. (2025). The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. Ap...
  22. Sorge, C. (2006). Softwareagenten: Vertragsschluss, Vertragsstrafe, Reugeld (1st ed.). KIT Scientific Publishing.
     St-Hilaire, I. (2025). Lying chatbot makes airline liable: Negligent misrepresentation in Moffatt v Air Canada. UBC Law Review, 58(2), 591-624.
     Sumers, T. R., and others (2024). Cognitive Architectures for Language Agents. Transactions on Mac...
  23. Veale, M., & Borgesius, F. Z. (2021). Demystifying the Draft EU Artificial Intelligence Act: Analysing the good, the bad, and the unclear elements of the proposed approach. Computer Law Review International, 22(4), 97-112.
     Veale, M., & Quintais, J. P. (2025). The Obligations of Providers of General-Purpose AI Models. Available at SSRN 5744602.
     Wachter, S. ...
  24. Wagner, G. (2023). Liability Rules for the Digital Age: Aiming for the Brussels Effect. Journal of European Tort Law, 13(3), 191-243.
     Wang, P., Li, X., Xiang, C., Zhang, J., Li, Y., Zhang, L., ... & Tian, Y. (2026). The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis. arXiv preprint arXiv:2602.10453.
     Webb, T., Mondal, S. S...
  25. Weitzenboeck, E. M. (2001). Electronic agents and the formation of contracts. International Journal of Law and Information Technology, 9(3), 204–234.
     Wendehorst, C. (2024). Principles of AI in Contracting. In T. Sagaert and W. Vananroye (Eds.), Privaatrecht plenis coloribus: Liber Amicorum Matthias Storme. Kluwer, 1097.
     Wolpert, D.H. and Macready, ...
  26. An introduction to multiagent systems. John Wiley & Sons.
     Xu, Z., Jain, S., & Kankanhalli, M. (2024). Hallucination is inevitable: An innate limitation of large language models. arXiv preprint arXiv:2401.11817.
     Yang, C., Zhao, R., Liu, Y., & Jiang, L. (2025). Survey of specialized large language models. arXiv preprint arXiv:2508.19667.
     Yao, S., Zhao, J., Y...
  27. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv preprint arXiv:2305.10601.
     Zaharia, M., Khattab, O., Chen, L., Davis, J. Q., Miller, H., Potts, C., Zou, J., Carbin, M., Frankle, J., Rao, N., & Ghodsi, A. (2024, February 18). The shift from models to compound AI systems. Berkeley Artificial Intelligence Research Blog. https:/...