Recognition: no theorem link
MATRA: Modeling the Attack Surface of Agentic AI Systems -- OpenClaw Case Study
Pith reviewed 2026-05-12 04:52 UTC · model grok-4.3
The pith
MATRA is a threat modeling framework that uses asset assessment and attack trees to quantify how controls like sandboxing reduce risks from LLM threats in agentic AI systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MATRA begins with an asset-based impact assessment and uses attack trees to estimate the likelihood of those impacts occurring within the system architecture, as demonstrated in a personal AI agent deployment using OpenClaw, where architectural controls reduce risk by limiting the blast radius of successful injections.
What carries the argument
Attack trees that map LLM threat classes to deployment-specific impacts and evaluate the mitigating effects of architectural controls.
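The paper's actual attack trees and likelihood values are not reproduced here, so the following is a minimal sketch with assumed placeholder numbers rather than the authors' model: it only shows the mechanics such a tree encodes, with leaf likelihoods rolling up through AND/OR gates, risk computed as impact times likelihood, and a control such as network sandboxing lowering one leaf's likelihood.

    # Illustrative sketch only: tree shape, node names, impact score, and
    # probabilities are assumptions, not values from the paper. It shows leaf
    # likelihoods rolling up through AND/OR gates, risk = impact x likelihood,
    # and a control lowering the likelihood of the leaf it blocks.
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        name: str
        gate: str = "LEAF"          # "LEAF", "OR", or "AND"
        p: float = 0.0              # likelihood, used only for leaves
        children: list = field(default_factory=list)

        def likelihood(self) -> float:
            if self.gate == "LEAF":
                return self.p
            child_ps = [c.likelihood() for c in self.children]
            if self.gate == "AND":                 # every sub-step must succeed
                out = 1.0
                for cp in child_ps:
                    out *= cp
                return out
            out = 1.0                              # OR: at least one path succeeds
            for cp in child_ps:                    # (independence assumed)
                out *= (1.0 - cp)
            return 1.0 - out

    # Hypothetical impact: exfiltration of secrets held by the agent (0-10 scale).
    IMPACT = 9.0

    def data_exfil_tree(p_outbound: float) -> Node:
        """Root impact: attacker exfiltrates agent-held secrets via prompt injection."""
        return Node("exfiltrate secrets", "AND", children=[
            Node("inject instructions via untrusted content", "OR", children=[
                Node("malicious web page read by agent", p=0.30),
                Node("malicious email processed by agent", p=0.20),
            ]),
            Node("reach attacker-controlled endpoint", p=p_outbound),
        ])

    # Before: unrestricted outbound network access from the agent sandbox.
    before = data_exfil_tree(p_outbound=0.90)
    # After: a network sandbox allows only allow-listed hosts (assumed to lower,
    # not eliminate, that leaf's likelihood).
    after = data_exfil_tree(p_outbound=0.05)

    for label, tree in [("no controls", before), ("sandboxed", after)]:
        print(f"{label:12s} likelihood={tree.likelihood():.3f} "
              f"risk={IMPACT * tree.likelihood():.2f}")

Under these assumed values the root likelihood falls from about 0.40 to 0.02 once outbound traffic is restricted, which is the blast-radius effect the core claim describes; the case study itself would substitute the paper's own tree structure and calibrated probabilities.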
If this is right
- Practitioners can systematically identify which assets in an agentic AI system are most vulnerable to known LLM threats.
- Network sandboxing and least-privilege access demonstrably limit the blast radius of injection attacks.
- The framework allows quantification of risk reduction from specific controls in concrete deployments.
- Agentic AI systems can be designed with security in mind from the start using this modeling approach.
Where Pith is reading between the lines
- The MATRA approach could be applied to other agent frameworks to compare their security profiles.
- It highlights the importance of architectural choices over solely relying on LLM alignment techniques.
- Future work might integrate MATRA with automated tools for dynamic risk assessment in evolving agent systems.
- Adoption could lead to industry standards for securing personal AI assistants.
Load-bearing premise
That known LLM threat classes can be directly translated into concrete, quantifiable risks for a given agentic architecture using attack trees without missing important interactions or overestimating control effectiveness.
What would settle it
Observing a successful attack in an OpenClaw-like deployment that the attack tree model predicted as low likelihood, or finding a threat interaction not captured by the modeled attack trees.
Original abstract
LLMs are increasingly deployed as autonomous agents with access to tools, databases, and external services, yet practitioners (across different sectors) lack systematic methods to assess how known threat classes translate into concrete risks within a specific agentic deployment. We present MATRA, a pragmatic threat modeling framework for agentic AI systems that adapts established risk assessment methodology to systematically assess how known LLM threats translate into deployment-specific risks. MATRA begins with an asset-based impact assessment and utilizes attack trees to determine the likelihood of these impacts occurring within the system architecture. We demonstrate MATRA on a personal AI agent deployment using OpenClaw, quantifying how architectural controls such as network sandboxing and least-privilege access reduce risk by limiting the blast radius of successful injections.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MATRA, a pragmatic threat modeling framework for agentic AI systems. It adapts established risk assessment methods by beginning with an asset-based impact assessment and then constructing attack trees to quantify the likelihood of impacts arising from known LLM threat classes (such as prompt injection) within a concrete system architecture. The framework is demonstrated via a case study on the OpenClaw personal AI agent deployment, with the central claim that architectural controls including network sandboxing and least-privilege access reduce overall risk by limiting the blast radius of successful attacks.
Significance. If the attack-tree likelihood assignments and risk calculations prove accurate and exhaustive, MATRA would supply practitioners with a systematic, architecture-specific method for translating general LLM threats into quantifiable deployment risks and for evaluating the effectiveness of controls. This could aid prioritization of security measures in agentic systems where tool access and autonomy amplify potential impacts.
major comments (3)
- [Case study] Case study section: The manuscript claims to quantify risk reductions from controls such as network sandboxing and least-privilege access, yet provides no attack tree structures, leaf-node likelihood values, or before/after risk metrics for the OpenClaw deployment. Without these concrete elements, the reported risk reductions cannot be verified or reproduced.
- [Framework] Framework description: Attack trees are applied to translate LLM threats into deployment risks, but the presentation does not address how stateful memory, iterative tool-calling loops, or adaptive multi-step behaviors in agentic systems could create unmodeled attack paths that standard static attack trees routinely omit; this directly affects the completeness of the likelihood estimates.
- [Case study] Validation of likelihoods: The assigned probabilities for leaf nodes appear to rest on qualitative expert judgment with no empirical calibration, red-team validation, or comparison to reported incidents; given the skeptic's concern about emergent interactions, this is load-bearing for the claim that controls reduce risk by a quantifiable factor.
minor comments (2)
- [Abstract] The abstract would benefit from stating the specific numerical risk reduction achieved in the OpenClaw example rather than describing the outcome only qualitatively.
- Consider adding a summary table of attack tree nodes, probabilities, and computed risks to improve clarity and allow readers to follow the quantification.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript introducing the MATRA framework. We have carefully considered each point and provide detailed responses below, along with plans for revisions to strengthen the paper.
Point-by-point responses
-
Referee: [Case study] Case study section: The manuscript claims to quantify risk reductions from controls such as network sandboxing and least-privilege access, yet provides no attack tree structures, leaf-node likelihood values, or before/after risk metrics for the OpenClaw deployment. Without these concrete elements, the reported risk reductions cannot be verified or reproduced.
Authors: We agree that the current manuscript presents the risk quantification at a summary level without the detailed attack tree structures, specific leaf-node likelihood values, or explicit before-and-after metrics. This limits verifiability. In the revised version, we will include these concrete elements: we will add figures showing the attack tree structures for the primary threats in the OpenClaw deployment, tables with the assigned likelihood values for leaf nodes, and calculated risk scores before and after applying controls such as network sandboxing and least-privilege access. This will enable verification and reproduction of the reported risk reductions. revision: yes
-
Referee: [Framework] Framework description: Attack trees are applied to translate LLM threats into deployment risks, but the presentation does not address how stateful memory, iterative tool-calling loops, or adaptive multi-step behaviors in agentic systems could create unmodeled attack paths that standard static attack trees routinely omit; this directly affects the completeness of the likelihood estimates.
Authors: This is a valid observation regarding the limitations of static attack trees when applied to agentic systems. The framework as presented focuses on translating known LLM threat classes into architecture-specific risks using standard attack tree methods. To address this, we will revise the framework description to explicitly discuss potential unmodeled attack paths arising from stateful memory, iterative tool-calling, and adaptive behaviors. We will also propose extensions to MATRA, such as incorporating dynamic attack tree variants or multi-stage scenario analysis, to better capture these aspects and improve the robustness of likelihood estimates. revision: partial
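A back-of-the-envelope illustration of why this matters, with an assumed per-document probability rather than anything from the manuscript: a static tree scored for a single interaction understates the likelihood of compromise for an agent that loops over many untrusted documents.

    # Assumed placeholder probability, not a value from the manuscript: a static
    # attack tree scored for one interaction understates risk for an agent that
    # repeatedly ingests untrusted content in a tool-calling loop.
    p_per_doc = 0.02                             # assumed per-document injection success
    for n_docs in (1, 10, 50, 200):
        p_any = 1 - (1 - p_per_doc) ** n_docs    # at least one injection succeeds
        print(f"{n_docs:4d} docs -> P(compromise) = {p_any:.3f}")
    # prints 0.020, 0.183, 0.636, 0.982 under an independence assumption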
-
Referee: [Case study] Validation of likelihoods: The assigned probabilities for leaf nodes appear to rest on qualitative expert judgment with no empirical calibration, red-team validation, or comparison to reported incidents; given the skeptic's concern about emergent interactions, this is load-bearing for the claim that controls reduce risk by a quantifiable factor.
Authors: We acknowledge that the leaf-node probabilities in the OpenClaw case study are derived from qualitative expert judgment rather than empirical data or red-team experiments, which is a common starting point in threat modeling for emerging technologies like agentic AI where comprehensive incident data is still developing. To mitigate concerns about this being load-bearing, we will update the manuscript to provide more transparency on how these judgments were formed, including cross-references to existing LLM threat literature and reported incidents. Additionally, we will include a sensitivity analysis showing the impact of varying these probabilities on the overall risk reduction and add a section on future work for empirical calibration and red-teaming validation. revision: yes
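A minimal sketch of what such a sensitivity analysis could look like, again with assumed placeholder values rather than the paper's figures: each expert-judged leaf probability is perturbed by plus or minus fifty percent and the before/after risk is recomputed.

    # Sketch of a sensitivity analysis over expert-judgment leaf probabilities.
    # All numbers are assumed placeholders, not values from the paper: each leaf
    # is perturbed by +/-50% and the risk with and without sandboxing recomputed.
    import itertools

    IMPACT = 9.0  # assumed impact score for credential exfiltration (0-10 scale)

    def risk(p_web: float, p_email: float, p_outbound: float) -> float:
        # OR over injection vectors (independence assumed), AND with the exfil channel.
        p_inject = 1 - (1 - p_web) * (1 - p_email)
        return IMPACT * p_inject * p_outbound

    nominal = {"web": 0.30, "email": 0.20, "out_before": 0.90, "out_after": 0.05}
    factors = (0.5, 1.0, 1.5)

    befores, afters, reductions = [], [], []
    for fw, fe, fo in itertools.product(factors, repeat=3):
        web = min(1.0, nominal["web"] * fw)
        email = min(1.0, nominal["email"] * fe)
        b = risk(web, email, min(1.0, nominal["out_before"] * fo))
        a = risk(web, email, min(1.0, nominal["out_after"] * fo))
        befores.append(b)
        afters.append(a)
        reductions.append(1 - a / b)

    print(f"risk without sandbox: {min(befores):.2f} to {max(befores):.2f}")
    print(f"risk with sandbox:    {min(afters):.2f} to {max(afters):.2f}")
    print(f"relative reduction:   {min(reductions):.0%} to {max(reductions):.0%}")

If the relative reduction stays high across the whole perturbation range, the qualitative claim that sandboxing limits blast radius survives even where the point estimates do not; if it collapses, the expert-judgment inputs are carrying exactly the load the referee identifies.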
Circularity Check
MATRA applies standard attack-tree methodology to agentic systems with no self-referential derivations or fitted predictions
full rationale
The paper presents MATRA as an adaptation of established risk assessment techniques (asset-based impact assessment plus attack trees) to translate known LLM threats into deployment-specific risks for the OpenClaw case study. No equations, parameters fitted to data subsets, or derivations appear in the provided text. Likelihood assignments to attack-tree leaves are described as arising from qualitative reasoning on known threat classes rather than from any self-citation chain, uniqueness theorem, or renaming of prior results. The central claim—that controls such as network sandboxing reduce risk by limiting blast radius—is framed as an application of the framework, not a result forced by construction from the paper's own inputs. Per the hard rules, absence of any quotable reduction (e.g., Eq. X defined in terms of Y or a prediction that is the fit itself) yields score 0. The skeptic concern about unvalidated expert judgments is a correctness/empirical-validation issue, not circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Known LLM threats translate into deployment-specific risks that can be systematically modeled with asset-based impact assessment and attack trees.
Reference graph
Works this paper leans on
-
[1]
Agents rule of two: A practical approach to AI agent security
Meta AI, “Agents rule of two: A practical approach to AI agent security.” Meta AI Blog, Oct
-
[2]
Accessed: 2026-03-16
work page 2026
-
[3]
Fine-tuned DeBERTa-v3-base for prompt injection detection
ProtectAI.com, “Fine-tuned DeBERTa-v3-base for prompt injection detection.” Hugging Face Model Hub, 2024
work page 2024
-
[4]
Google, “Model armor overview.” Google Cloud Documentation, 2025
work page 2025
-
[5]
LlamaFirewall: An open source guardrail system for building secure AI agents
S. Chennabasappa, C. Nikolaidis, D. Song, D. Molnar, S. Ding, S. Wan, S. Whitman, L. Deason, N. Doucette, A. Montilla, A. Gampa, B. dePaola, D. Gabi, J. Crnkovich, J.-C. Testud, K. He, R. Chaturvedi, W. Zhou, and J. Saxe, “LlamaFirewall: An open source guardrail system for building secure AI agents.” arXiv preprint, May 2025
work page 2025
-
[6]
Tree of attacks: Jailbreaking black-box LLMs automatically,
A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. S. Anderson, Y. Singer, and A. Karbasi, “Tree of attacks: Jailbreaking black-box LLMs automatically,” in The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
work page 2024
-
[7]
garak: A framework for security probing large language models,
L. Derczynski, E. Galinkin, J. Martin, S. Majumdar, and N. Inie, “garak: A framework for security probing large language models,” 2024
work page 2024
-
[8]
Universal and transferable adversarial attacks on aligned language models,
A. Zou, Z. Wang, J. Z. Kolter, and M. Fredrikson, “Universal and transferable adversarial attacks on aligned language models,” 2023
work page 2023
-
[9]
Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,
E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,” in The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024
work page 2024
-
[10]
Jailbreakbench: An open robustness benchmark for jailbreaking large language models,
P. Chao, E. Debenedetti, A. Robey, M. Andriushchenko, F. Croce, V. Sehwag, E. Dobriban, N. Flammarion, G. J. Pappas, F. Tramèr, H. Hassani, and E. Wong, “Jailbreakbench: An open robustness benchmark for jailbreaking large language models,” in NeurIPS Datasets and Benchmarks Track, 2024
work page 2024
-
[11]
Inside CVE-2025-32711 (EchoLeak): Prompt injection meets AI exfiltration
Hack The Box, “Inside CVE-2025-32711 (EchoLeak): Prompt injection meets AI exfiltration.” Hack The Box Blog, 2025. Accessed: 2026-03-16
work page 2025
-
[12]
The summer of johann: Prompt injections as far as the eye can see
S. Willison, “The summer of johann: Prompt injections as far as the eye can see.” Simon Willison’s Weblog, Aug. 2025. Part of the series Prompt Injection. Accessed: 2026-03-16
work page 2025
-
[13]
M. Nasr, N. Carlini, C. Sitawarin, S. V. Schulhoff, J. Hayes, M. Ilie, J. Pluto, S. Song, H. Chaudhari, I. Shumailov, A. Thakurta, K. Y. Xiao, A. Terzis, and F. Tramèr, “The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections.” https://arxiv.org/abs/2510.09023, 2025
-
[14]
K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection,” in Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, AISec ’23, (New York, NY, USA), p. 79–90, Association for Computing Machinery, 2023
work page 2023
-
[15]
Can llms separate instructions from data? and what do we even mean by that?,
E. Zverev, S. Abdelnabi, S. Tabesh, M. Fritz, and C. H. Lampert, “Can llms separate instructions from data? and what do we even mean by that?,” in Thirteenth International Conference on Learning Representations (ICLR), 2025
work page 2025
-
[16]
Prompt injection attack to tool selection in LLM agents,
J. Shi, Z. Yuan, G. Tie, P. Zhou, N. Z. Gong, and L. Sun, “Prompt injection attack to tool selection in LLM agents,” in NDSS, 2026
work page 2026
-
[17]
ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data,
R. Wang, Y. Jia, and N. Z. Gong, “ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data,” in NDSS, 2026
work page 2026
-
[18]
N. V. Pandya, A. Labunets, S. Gao, and E. Fernandes, “May I have your attention? breaking fine-tuning based prompt injection defenses using architecture-aware attacks,” arXiv preprint arXiv:2507.07417, 2025
-
[19]
Get my drift? catching llm task drift with activation deltas,
S. Abdelnabi, A. Fay, G. Cherubin, A. Salem, M. Fritz, and A. Paverd, “Get my drift? catching llm task drift with activation deltas,” in IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2025
work page 2025
-
[20]
New hack uses prompt injection to corrupt Gemini’s long-term memory
D. Goodin, “New hack uses prompt injection to corrupt Gemini’s long-term memory.” Ars Technica, Feb. 2025. Accessed: 2026-03-16
work page 2025
-
[21]
OWASP Top 10 for Large Language Model Applications,
OWASP Foundation, “OWASP Top 10 for Large Language Model Applications,” 2025. A community-driven standard awareness document identifying the top 10 most critical security risks to Large Language Model (LLM) applications, including prompt injection, sensitive information disclosure, supply chain vulnerabilities, data poisoning, improper output han-...
work page 2025
-
[22]
Agentic AI – Threats and Mitigations,
OWASP Agentic Security Initiative, “Agentic AI – Threats and Mitigations,” security guide, OWASP Foundation, February 2025. First guide from the OWASP Agentic Security Initiative providing a threat-model-based reference of emerging agentic threats and mitigations. Covers autonomous AI systems enabled by large language models, with structured threat ...
work page 2025
-
[23]
Artificial intelligence risk management framework: Generative artificial intelligence profile,
National Institute of Standards and Technology, “Artificial intelligence risk management framework: Generative artificial intelligence profile,” Tech. Rep. NIST AI 600-1, National Institute of Standards and Technology, July 2024
work page 2024
-
[24]
Guide for conducting risk assessments,
NIST, “Guide for conducting risk assessments,” Tech. Rep. SP 800-30 Rev. 1, National Institute of Standards and Technology, 2012
work page 2012
-
[25]
OpenClaw: Your own personal AI assistant. any OS. any platform
P. Steinberger and OpenClaw Contributors, “OpenClaw: Your own personal AI assistant. any OS. any platform.” https://github.com/openclaw/openclaw, 2025. Accessed: 2026-03-16
work page 2025
- [26]
-
[27]
OWASP top 10 for agentic applications for 2026
OWASP GenAI Security Project, “OWASP top 10 for agentic applications for 2026.” OWASP Foundation, https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/, Dec. 2025. Developed through collaboration with more than 100 industry experts. Accessed: 2026-03-16
work page 2026
-
[28]
MITRE ATLAS™: Adversarial threat landscape for artificial-intelligence systems
MITRE Corporation, “MITRE ATLAS™: Adversarial threat landscape for artificial-intelligence systems.” MITRE Corporation,
-
[29]
Living knowledge base; accessed: 2026-03-16
work page 2026
-
[30]
Introductory guidance to AICM: The AI controls matrix
Cloud Security Alliance, “Introductory guidance to AICM: The AI controls matrix.” Cloud Security Alliance, Nov. 2025. Accessed: 2026-03-16
work page 2025
-
[31]
Cisco AI defense: Integrated AI security and safety framework
Cisco, “Cisco AI defense: Integrated AI security and safety framework.” Cisco Systems, Inc., https://www.cisco.com/site/us/en/learn/topics/artificial-intelligence/ai-security-safety-framework.html, 2025. Accessed: 2026-03-16
work page 2025
-
[32]
Why OpenClaw poses new AI-era security risks
A. Zhao, “Why OpenClaw poses new AI-era security risks.” https://aaronzhao123.substack.com/p/why-openclaw-poses-new-ai-era-security, 2025. Substack
work page 2025
-
[33]
Defeating Prompt Injections by Design
E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tramèr, “Defeating prompt injections by design.”https://arxiv.org/abs/2503.18813, 2025
work page 2025
-
[34]
Securing ai agents with information-flow control,
M. Costa, B. Köpf, A. Kolluri, A. Paverd, M. Russinovich, A. Salem, S. Tople, L. Wutschitz, and S. Zanella-Béguelin, “Securing ai agents with information-flow control,” 2025
work page 2025
-
[35]
Robust and reusable linddun privacy threat knowledge,
L. Sion, D. Van Landuyt, K. Wuyts, and W. Joosen, “Robust and reusable linddun privacy threat knowledge,” Computers & Security, vol. 154, p. 104419, 2025
work page 2025