pith. machine review for the scientific record.

arxiv: 2604.05969 · v1 · submitted 2026-04-07 · 💻 cs.CR · cs.AI

Recognition: no theorem link

A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 19:34 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords MCP-based AI agents · threat taxonomy · formal verification · defense mechanisms · security framework · AI security · attack surfaces

The pith

MCPSHIELD integrates a threat taxonomy, formal verification, and layered defenses to cover 91 percent of risks to MCP-based AI agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a unified formal security framework for AI agents that connect to external tools through the Model Context Protocol. It organizes risks into a hierarchy of categories and vectors drawn from large-scale tool data, then supplies a verification method based on labeled transition systems to check interaction chains. Existing defenses are compared and shown to leave most threats unaddressed, while the new integrated architecture combines access control, attestation, flow tracking, and runtime enforcement. A sympathetic reader would see this as a way to move from fragmented point solutions to systematic protection for agent ecosystems.

Core claim

The paper establishes that a hierarchical threat taxonomy of seven categories and twenty-three attack vectors across four surfaces, paired with a labeled transition system model that annotates trust boundaries, enables both static and runtime analysis of MCP tool chains. When these elements are combined in a defense-in-depth reference architecture that adds capability-based access control, cryptographic attestation, information flow tracking, and policy enforcement, the result is a theoretical coverage of 91 percent of the threat landscape, compared with no more than 34 percent for any single prior mechanism.

What carries the argument

The labeled transition system model with trust boundary annotations, which performs static and runtime analysis of MCP tool interaction chains and supports the integration of multiple defense layers.
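The review only names the mechanism, so as a hedged illustration of what a trust-boundary-annotated transition system can check, here is a minimal sketch. The state names, tool labels, and the single-taint rule below are invented for illustration; they are not the paper's actual formalism, only an instance of the general idea (once untrusted data enters a chain, a later transition into a sensitive state is flagged).

```python
# Illustrative sketch, NOT the paper's formalism: states are agent/tool
# contexts, labels are tool calls, and each transition carries a
# trust-boundary annotation. A chain is flagged if untrusted data can
# later reach a sensitive sink.
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    src: str       # state before the tool call
    label: str     # tool invocation, e.g. "web.fetch" (hypothetical name)
    dst: str       # state after the call
    trusted: bool  # trust-boundary annotation on this edge

def violates_trust_boundary(chain):
    """Static check: once an untrusted transition taints the chain,
    any later transition into a sensitive state is a violation."""
    tainted = False
    for t in chain:
        if not t.trusted:
            tainted = True
        if tainted and t.dst == "sensitive_sink":
            return True
    return False

# A tool chain where an untrusted web fetch feeds a file-write tool.
chain = [
    Transition("start", "web.fetch", "has_untrusted_data", trusted=False),
    Transition("has_untrusted_data", "fs.write", "sensitive_sink", trusted=True),
]
print(violates_trust_boundary(chain))  # True: flagged before execution
```

The same walk over annotated transitions can run statically over a declared chain, as here, or at runtime against calls as they occur, which is the dual use the paper claims for its model.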

Load-bearing premise

The threat taxonomy derived from analysis of over 177,000 tools fully represents real-world risks and the formal model can be applied practically to verify actual agent behaviors.

What would settle it

Running the verification model on a real MCP agent implementation known to contain one of the twenty-three attack vectors and checking whether the model flags the violation before the attack succeeds.

read the original abstract

The Model Context Protocol (MCP), introduced by Anthropic in November 2024 and now governed by the Linux Foundation's Agentic AI Foundation, has rapidly become the de facto standard for connecting large language model (LLM)-based agents to external tools and data sources, with over 97 million monthly SDK downloads and more than 177,000 registered tools. However, this explosive adoption has exposed a critical gap: the absence of a unified, formal security framework capable of systematically characterizing, analyzing, and mitigating the diverse threats facing MCP-based agent ecosystems. Existing security research remains fragmented across individual attack papers, isolated benchmarks, and point defense mechanisms. This paper presents MCPSHIELD, a comprehensive formal security framework for MCP-based AI agents. We make four principal contributions: (1) a hierarchical threat taxonomy comprising 7 threat categories and 23 distinct attack vectors organized across four attack surfaces, grounded in the analysis of over 177,000 MCP tools; (2) a formal verification model based on labeled transition systems with trust boundary annotations that enables static and runtime analysis of MCP tool interaction chains; (3) a systematic comparative evaluation of 12 existing defense mechanisms, identifying coverage gaps across our threat taxonomy; and (4) a defense-in-depth reference architecture integrating capability-based access control, cryptographic tool attestation, information flow tracking, and runtime policy enforcement. Our analysis reveals that no existing single defense covers more than 34 percent of the identified threat landscape, whereas MCPSHIELD's integrated architecture achieves theoretical coverage of 91 percent. We further identify seven open research challenges that must be addressed to secure the next generation of agentic AI systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes MCPSHIELD as a unified formal security framework for MCP-based AI agents. It contributes (1) a hierarchical threat taxonomy of 7 categories and 23 attack vectors across four surfaces, derived from analysis of >177,000 MCP tools; (2) a labeled transition system (LTS) model with trust-boundary annotations for static and runtime verification of tool interaction chains; (3) a comparative evaluation of 12 existing defenses against the taxonomy; and (4) a defense-in-depth reference architecture combining capability-based access control, cryptographic attestation, information flow tracking, and runtime enforcement. The central empirical claim is that no single existing defense covers more than 34% of the threat landscape while the integrated MCPSHIELD architecture achieves 91% theoretical coverage; seven open research challenges are also identified.

Significance. If the taxonomy is comprehensive and the coverage figures are rigorously derived, the work could serve as a valuable reference point for securing the rapidly expanding MCP ecosystem. The scale of the tool analysis grounding the taxonomy and the attempt to combine formal modeling with defense integration are positive contributions that address fragmentation in current agent security research. The significance is reduced, however, by the unclear linkage between the LTS verification model and the headline quantitative results.

major comments (3)
  1. [Comparative evaluation of defenses] Evaluation section (comparative analysis of defenses): The headline result that no existing defense exceeds 34% coverage while MCPSHIELD reaches 91% theoretical coverage is presented as the outcome of systematic evaluation against the 7-category/23-vector taxonomy. These percentages are obtained via manual, unweighted checklist mapping of defenses to attack vectors rather than any quantitative aggregation, reachability analysis, or state-space coverage metric computed from the labeled transition system model with trust-boundary annotations. Consequently the 91% figure does not inherit soundness from the formal model and remains sensitive to arbitrary decisions about what counts as 'covering' a vector.
  2. [Formal verification model] Formal verification model section: The LTS model is introduced as enabling static and runtime analysis of MCP tool chains, yet the paper provides no concrete example of its application to any of the 23 attack vectors, no definition of coverage or reachability metrics within the LTS, and no demonstration that the 91% figure follows from the model. This leaves the formal contribution disconnected from the central comparative claim.
  3. [Threat taxonomy] Threat taxonomy section: The claim that the taxonomy is 'grounded in the analysis of over 177,000 MCP tools' is central to the validity of the coverage percentages, but the paper does not describe the derivation process (e.g., how vectors were extracted, whether coverage was validated against real tool logs, or any inter-rater reliability for the 7 categories). Without this, it is impossible to assess whether the taxonomy accurately reflects real-world risks or whether the 34%/91% comparison is robust.
minor comments (2)
  1. [Abstract and evaluation] The abstract and introduction use 'theoretical coverage' without defining the term or distinguishing it from empirical coverage; a precise definition should be added in the evaluation section.
  2. [Formal model] Notation for the LTS (states, labels, trust-boundary annotations) is introduced but not used in any subsequent figure or proof sketch; either remove unused formalism or provide at least one worked example.
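The unweighted checklist mapping criticized in major comment 1 reduces to simple arithmetic, which also makes the referee's sensitivity point concrete. The count of 21 addressed vectors below is an invented value consistent with the reported 91 percent; only the 23-vector total and the headline percentage come from the paper.

```python
# Reconstruction of an unweighted checklist coverage metric as the
# report describes it: coverage = (vectors addressed) / (23 vectors).
# The "21 addressed" count is assumed, chosen to reproduce the 91%.
N_VECTORS = 23

def coverage(addressed_vectors):
    """Fraction of the 23 attack vectors a defense is judged to address."""
    return len(set(addressed_vectors)) / N_VECTORS

print(round(coverage(range(21)) * 100))  # 91

# Sensitivity: each yes/no judgment about a single vector moves the
# figure by 1/23 of the total, i.e. about 4.3 percentage points.
print(round(100 / N_VECTORS, 1))  # 4.3
```

The second figure is the crux of the objection: with steps this coarse, a handful of contestable "covers / does not cover" calls could move the headline number by ten points or more.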

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the connections between our contributions. We address each major point below and will revise the manuscript to improve rigor, explicitness, and linkage between sections.

read point-by-point responses
  1. Referee: Evaluation section (comparative analysis of defenses): The headline result that no existing defense exceeds 34% coverage while MCPSHIELD reaches 91% theoretical coverage is presented as the outcome of systematic evaluation against the 7-category/23-vector taxonomy. These percentages are obtained via manual, unweighted checklist mapping of defenses to attack vectors rather than any quantitative aggregation, reachability analysis, or state-space coverage metric computed from the labeled transition system model with trust-boundary annotations. Consequently the 91% figure does not inherit soundness from the formal model and remains sensitive to arbitrary decisions about what counts as 'covering' a vector.

    Authors: We agree that the 34%/91% figures result from manual mapping rather than LTS-derived metrics. The LTS model supports verification of individual tool chains, while the comparative evaluation is a separate high-level assessment of defense coverage against the taxonomy. In revision we will add an explicit subsection clarifying this distinction, stating that the 91% is a theoretical bound based on the integrated architecture's ability to address each vector, and include a sensitivity discussion of mapping criteria. We will also outline how LTS reachability could be used in future work to compute quantitative coverage. revision: yes

  2. Referee: Formal verification model section: The LTS model is introduced as enabling static and runtime analysis of MCP tool chains, yet the paper provides no concrete example of its application to any of the 23 attack vectors, no definition of coverage or reachability metrics within the LTS, and no demonstration that the 91% figure follows from the model. This leaves the formal contribution disconnected from the central comparative claim.

    Authors: We acknowledge the absence of a concrete LTS application example and explicit metrics. The LTS is designed to model tool interaction chains with trust boundaries for detecting violations, but we did not demonstrate it on the taxonomy vectors. In the revised manuscript we will add an illustrative example applying the LTS to at least one attack vector (e.g., tool injection), define reachability and coverage metrics within the LTS, and explicitly state that the 91% coverage claim is independent of the LTS and arises from the defense-in-depth architecture. revision: yes

  3. Referee: Threat taxonomy section: The claim that the taxonomy is 'grounded in the analysis of over 177,000 MCP tools' is central to the validity of the coverage percentages, but the paper does not describe the derivation process (e.g., how vectors were extracted, whether coverage was validated against real tool logs, or any inter-rater reliability for the 7 categories). Without this, it is impossible to assess whether the taxonomy accurately reflects real-world risks or whether the 34%/91% comparison is robust.

    Authors: We agree that the derivation methodology must be described. The taxonomy was constructed by reviewing tool descriptions, API schemas, and vulnerability reports from the MCP registry; vectors were extracted by identifying misuse patterns across tool categories. In revision we will insert a dedicated subsection detailing the process: categorization criteria, examples of vector extraction, validation steps against sample tool logs, and the collaborative author review used in place of formal inter-rater statistics. This will allow readers to evaluate the taxonomy's grounding and the robustness of the coverage comparison. revision: yes
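The reachability-based coverage metric the rebuttal defers to future work can be sketched directly: a vector counts as covered when, after a defense removes the transitions it blocks, the attack state is no longer reachable from the start state. The graph, tool names, and the "flow-tracking" defense below are all invented examples, not anything specified in the paper.

```python
# Hedged sketch of LTS reachability as a coverage check. A defense
# "covers" a vector if it makes the vector's attack state unreachable.
from collections import deque

def reachable(transitions, start, target):
    """BFS over (src, label, dst) transitions."""
    adj = {}
    for src, _label, dst in transitions:
        adj.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

lts = [
    ("start", "web.fetch", "untrusted_data"),
    ("untrusted_data", "fs.write", "exfiltration"),  # the attack step
    ("start", "calc.add", "benign"),
]

# A hypothetical flow-tracking defense that blocks writes of untrusted data:
defended = [t for t in lts if t[1] != "fs.write"]

print(reachable(lts, "start", "exfiltration"))       # True: attack reachable
print(reachable(defended, "start", "exfiltration"))  # False: vector covered
```

A coverage percentage computed this way would inherit its meaning from the model, which is exactly the soundness link the referee found missing from the manual checklist.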

Circularity Check

0 steps flagged

No significant circularity; taxonomy and coverage claims rest on external tool analysis and manual evaluation.

full rationale

The paper derives its threat taxonomy from analysis of over 177,000 external MCP tools and presents the 91% coverage figure as the outcome of a systematic comparative evaluation of 12 existing defenses against that taxonomy. The formal LTS model is described only as enabling future static/runtime analysis rather than as the source of the reported percentages. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text; the central claims remain independent of any self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities are described in the provided text.

pith-pipeline@v0.9.0 · 5603 in / 1213 out tokens · 93419 ms · 2026-05-10T19:34:38.850716+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 14 canonical work pages · 1 internal anchor

  1. [1]

    ReAct: Synergizing reasoning and acting in language models,

    S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing reasoning and acting in language models,” in Proceedings of the International Conference on Learning Representations, 2023

  2. [2]

    Toolformer: Language models can teach themselves to use tools,

    T. Schick, J. Dwivedi-Yu, R. Dessi, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,” in Advances in Neural Information Processing Systems, vol. 36, 2023

  3. [3]

    Agentic AI: A comprehensive survey of architectures, applications, and future directions,

    M. Abou Ali and F. Dornaika, “Agentic AI: A comprehensive survey of architectures, applications, and future directions,” Artificial Intelligence Review, 2025

  4. [4]

    AI agents under threat: A survey of key security challenges and future pathways,

    Z. Deng, Y. Guo, C. Han, W. Ma, J. Xiong, S. Wen, and Y. Xiang, “AI agents under threat: A survey of key security challenges and future pathways,” ACM Computing Surveys, 2025

  5. [5]

    Introducing the Model Context Protocol,

    Anthropic, “Introducing the Model Context Protocol,” https://www.anthropic.com/news/model-context-protocol, 2024, accessed: 2026-04-01

  6. [6]

    Model Context Protocol specification (version 2025-11-25),

    ——, “Model Context Protocol specification (version 2025-11-25),” https://modelcontextprotocol.io/specification/2025-11-25, 2025, accessed: 2026-04-01

  7. [7]

    How are AI agents used? Evidence from 177,000 MCP tools,

    M. Stein, “How are AI agents used? Evidence from 177,000 MCP tools,” arXiv preprint arXiv:2603.23802, 2026

  8. [8]

    Linux Foundation announces the formation of the Agentic AI Foundation,

    Linux Foundation, “Linux Foundation announces the formation of the Agentic AI Foundation,” https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation, 2025, accessed: 2026-04-01

  9. [9]

    MCP security notification: Tool poisoning attacks,

    Invariant Labs, “MCP security notification: Tool poisoning attacks,” https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks, 2025, accessed: 2026-04-01

  10. [10]

    MCPTox: A benchmark for tool poisoning attack on real-world MCP servers,

    Z. Wang, Y. Gao, Y. Wang, S. Liu, H. Sun, H. Cheng, G. Shi, H. Du, and X. Li, “MCPTox: A benchmark for tool poisoning attack on real-world MCP servers,” arXiv preprint arXiv:2508.14925, 2025

  11. [11]

    ETDI: Mitigating tool squatting and rug pull attacks in Model Context Protocol (MCP) by using OAuth-enhanced tool definitions and policy-based access control,

    M. Bhatt, V. S. Narajala, and I. Habler, “ETDI: Mitigating tool squatting and rug pull attacks in Model Context Protocol (MCP) by using OAuth-enhanced tool definitions and policy-based access control,” arXiv preprint arXiv:2506.01333, 2025

  12. [12]

    Log-to-leak: Prompt injection attacks on tool-using LLM agents via Model Context Protocol,

    Anonymous, “Log-to-leak: Prompt injection attacks on tool-using LLM agents via Model Context Protocol,” OpenReview, 2025, available at https://openreview.net/forum?id=UVgbFuXPaO

  13. [13]

    MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols,

    Y. Yang, C. Gao, D. Wu, Y. Chen, Y. Li, and S. Wang, “MCPSecBench: A systematic security benchmark and playground for testing Model Context Protocols,” arXiv preprint arXiv:2508.13220, 2025

  14. [14]

    MCP-SafetyBench: A benchmark for safety evaluation of large language models with real-world MCP servers,

    X. Zong, Z. Shen, L. Wang, Y. Lan, and C. Yang, “MCP-SafetyBench: A benchmark for safety evaluation of large language models with real-world MCP servers,” arXiv preprint arXiv:2512.15163, 2025

  15. [15]

    MCP-Guard: A multi-stage defense-in-depth framework for securing Model Context Protocol in agentic AI,

    W. Xing, Z. Qi, Y. Qin, Y. Li, C. Chang, J. Yu, C. Lin, Z. Xie, and M. Han, “MCP-Guard: A multi-stage defense-in-depth framework for securing Model Context Protocol in agentic AI,” arXiv preprint arXiv:2508.10991, 2025

  16. [16]

    MCPGuard: Automatically detecting vulnerabilities in MCP servers,

    B. Wang, Z. Liu, H. Yu, A. Yang, Y. Huang, J. Guo, H. Cheng, H. Li, and H. Wu, “MCPGuard: Automatically detecting vulnerabilities in MCP servers,” arXiv preprint arXiv:2510.23673, 2025

  17. [17]

    Enterprise-grade security for the Model Context Protocol (MCP): Frameworks and mitigation strategies,

    V. S. Narajala and I. Habler, “Enterprise-grade security for the Model Context Protocol (MCP): Frameworks and mitigation strategies,” arXiv preprint arXiv:2504.08623, 2025

  18. [18]

    Model Context Protocol (MCP): Landscape, security threats, and future research directions,

    X. Hou, Y. Zhao, S. Wang, and H. Wang, “Model Context Protocol (MCP): Landscape, security threats, and future research directions,” ACM Transactions on Software Engineering and Methodology, 2026

  19. [19]

    A survey of agent interoperability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP),

    A. Ehtesham, A. Singh, G. K. Gupta, and S. Kumar, “A survey of agent interoperability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP),” arXiv preprint arXiv:2505.02279, 2025

  20. [20]

    Agent2Agent Protocol (A2A),

    Google, “Agent2Agent Protocol (A2A),” https://a2a-protocol.org/latest/, 2025, accessed: 2026-04-01

  21. [21]

    Securing the Model Context Protocol: Defending LLMs against tool poisoning and adversarial attacks,

    S. Jamshidi, K. W. Nafi, A. M. Dakhel, N. Shahabi, F. Khomh, and N. Ezzati-Jivan, “Securing the Model Context Protocol: Defending LLMs against tool poisoning and adversarial attacks,” arXiv preprint arXiv:2512.06556, 2025

  22. [22]

    Secure tool manifest and digital signing solution for verifiable MCP and LLM pipelines,

    S. Jamshidi, K. W. Nafi, A. M. Dakhel, F. Khomh, A. Nikanjam, and M. A. Hamdaqa, “Secure tool manifest and digital signing solution for verifiable MCP and LLM pipelines,” arXiv preprint arXiv:2601.23132, 2026

  23. [23]

    MCPS — MCP Secure: Cryptographic identity, message signing, and trust verification for the Model Context Protocol,

    R. Sharif, “MCPS — MCP Secure: Cryptographic identity, message signing, and trust verification for the Model Context Protocol,” https://mcp-secure.dev/, 2026, IETF Internet-Draft draft-sharif-mcps-secure-mcp-, accessed: 2026-04-01

  24. [24]

    Accessed: 2026-04-01 (orphaned fragment of entry [23])

  25. [25]

    The emerged security and privacy of LLM agent: A survey with case studies,

    F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu, “The emerged security and privacy of LLM agent: A survey with case studies,” ACM Computing Surveys, 2025

  26. [26]

    Security of LLM-based agents regarding attacks, defenses, and applications: A comprehensive survey,

    Y. Tang, Y. Liu, J. Lan, Z. Yan, and E. Gelenbe, “Security of LLM-based agents regarding attacks, defenses, and applications: A comprehensive survey,” Information Fusion, vol. 127, 2026

  27. [27]

    SoK: Attack Surface of Agentic AI,

    A. Dehghantanha and S. Homayoun, “SoK: The attack surface of agentic AI — tools, and autonomy,” arXiv preprint arXiv:2603.22928, 2026

  28. [28]

    OWASP top 10 for large language model applications (version 2025),

    S. Wilson and A. Dawson, “OWASP top 10 for large language model applications (version 2025),” https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/, 2025, accessed: 2026-04-01

  29. [29]

    Threat Modeling: Designing for Security,

    A. Shostack, Threat Modeling: Designing for Security. Wiley, 2014

  30. [30]

    SoK: Taxonomy of attacks on open-source software supply chains,

    P. Ladisa, H. Plate, M. Martinez, and O. Barais, “SoK: Taxonomy of attacks on open-source software supply chains,” in Proceedings of the IEEE Symposium on Security and Privacy. IEEE, 2023

  31. [31]

    Prompt injection attacks in large language models and AI agent systems: A comprehensive review of vulnerabilities, attack vectors, and defense mechanisms,

    S. Gulyamov, S. Gulyamov, A. Rodionov, R. Khursanov, K. Mekhmonov, D. Babaev, and A. Rakhimjonov, “Prompt injection attacks in large language models and AI agent systems: A comprehensive review of vulnerabilities, attack vectors, and defense mechanisms,” Information, vol. 17, no. 1, p. 54, 2026

  32. [32]

    From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows,

    M. A. Ferrag, N. Tihanyi, D. Hamouda, L. Maglaras, A. Lakas, and M. Debbah, “From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows,” ICT Express, 2025

  33. [33]

    A lattice model of secure information flow,

    D. E. Denning, “A lattice model of secure information flow,” Communications of the ACM, vol. 19, no. 5, pp. 236–243, 1976

  34. [34]

    Enforceable security policies,

    F. B. Schneider, “Enforceable security policies,” ACM Transactions on Information and System Security, vol. 3, no. 1, pp. 30–50, 2000

  35. [35]

    Edit automata: Enforcement mechanisms for run-time security policies,

    J. Ligatti, L. Bauer, and D. Walker, “Edit automata: Enforcement mechanisms for run-time security policies,” International Journal of Information Security, vol. 4, no. 1–2, pp. 2–16, 2005

  36. [36]

    Introducing the Agent Governance Toolkit: Open-source runtime security for AI agents,

    Microsoft, “Introducing the Agent Governance Toolkit: Open-source runtime security for AI agents,” https://opensource.microsoft.com/blog/2026/04/02/introducing-the-agent-governance-toolkit-open-source-runtime-security-for-ai-agents/, 2026, accessed: 2026-04-04

  37. [37]

    Zero trust architecture,

    S. Rose, O. Borchert, S. Mitchell, and S. Connelly, “Zero trust architecture,” National Institute of Standards and Technology, Tech. Rep. SP 800-207, 2020

  38. [38]

    Programming semantics for multiprogrammed computations,

    J. B. Dennis and E. C. Van Horn, “Programming semantics for multiprogrammed computations,” Communications of the ACM, vol. 9, no. 3, pp. 143–155, 1966

  39. [39]

    Security Considerations for Multi-agent Systems

    T. Nguyen, M. Ndebugre, and D. Arremsetty, “Security considerations for multi-agent systems,” arXiv preprint arXiv:2603.09002, 2026

  40. [40]

    International AI safety report 2026,

    Y. Bengio et al., “International AI safety report 2026,” arXiv preprint arXiv:2602.21012, 2026