pith. machine review for the scientific record. sign in

arxiv: 2604.17125 · v1 · submitted 2026-04-18 · 💻 cs.CR · cs.AI

Recognition: unknown

CASCADE: A Cascaded Hybrid Defense Architecture for Prompt Injection Detection in MCP-Based Systems

Authors on Pith no claims yet

Pith reviewed 2026-05-10 06:03 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords prompt injection detectionModel Context ProtocolMCPcascaded defensetool poisoninglocal LLM securitysemantic analysishybrid detection
0
0 comments X

The pith

A three-layer cascaded defense detects prompt injection attacks in MCP-based LLM applications with high precision while running entirely locally.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CASCADE as a hybrid defense architecture for prompt injection detection in Model Context Protocol based systems. It combines pre-filtering with regex and entropy measures, semantic analysis using embeddings and a local language model, and output filtering. This design aims to overcome limitations of existing defenses like high false positives or reliance on cloud services. Evaluation shows strong performance on many attack types while operating entirely locally, which matters for secure deployment of tool-using AI agents.

Core claim

CASCADE is a cascaded hybrid defense architecture consisting of three layers for MCP-based systems. The first layer performs fast pre-filtering using regex, phrase weighting, and entropy analysis. The second layer conducts semantic analysis with BGE embeddings and an Ollama Llama3 fallback. The third layer applies pattern-based output filtering. On a 5,000-sample dataset, it achieves 95.85% precision, 6.06% false positive rate, 61.05% recall, and 74.59% F1-score, with high detection for data exfiltration and prompt injection attacks.

What carries the argument

The three-tiered cascaded architecture that progressively applies statistical pre-filters, semantic embeddings, and pattern matching to identify attacks.

If this is right

  • Strong detection of data exfiltration attacks at 91.5% and prompt injection at 84.2% follows directly from the multi-layer checks.
  • Fully local operation without external API calls distinguishes it from prior solutions and supports privacy-preserving use.
  • Lower detection rates for semantic attacks at 52.5% and tool poisoning at 59.9% point to specific categories needing further development in the cascade.
  • The overall metrics indicate a viable balance for practical deployment in MCP environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adopting CASCADE could allow MCP tool developers to add defense as a standard component without depending on third-party services.
  • Enhancing the semantic layer with additional local models might address the weaker performance on certain attack types.
  • Applying the same cascaded approach to other LLM tool protocols could extend its utility beyond MCP.
  • Validating performance on live traffic from MCP implementations would test if the results hold in practice.

Load-bearing premise

The 5,000 samples and 31 attack types in the evaluation dataset represent the real-world attacks that occur in deployed MCP-based systems.

What would settle it

Substantially lower precision or higher false positive rates when evaluated on a fresh dataset of MCP interactions or different attack implementations would disprove the effectiveness claim.

Figures

Figures reproduced from arXiv: 2604.17125 by Edip G\"um\"u\c{s}, \.Ipek Abas{\i}kele\c{s} Turgut.

Figure 1
Figure 1. Figure 1: Traditional LLM Architecture most serious threats to MCP-based systems is tool poison￾ing, classified by OWASP as MCP03:2025 [4]. 1.2. Problem Statement [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: MCP-based System updates to trusted tools), schema poisoning (corruption of interface definitions), and tool shadowing (introduction of fake or duplicate tools) [4]. Studies have demonstrated attack success rates of up to 72.8% across 20 different LLM agents [6]. Additionally, indirect injection attacks, where malicious instructions are hidden within tool outputs, can be processed by the model and cause ha… view at source ↗
Figure 3
Figure 3. Figure 3: The CASCADE architecture etc.). To reduce false positives, weak signals are suppressed when benign workflow patterns (debug, fix, explain, etc.) are detected. The risk score is calculated on a 0–100 scale, with base scores defined for each category (e.g., prompt_injection: 90, tool_abuse: 85, data_exfiltration: 84). Inputs with a risk score exceeding 50 or with a detected injection mode are blocked with L1… view at source ↗
read the original abstract

Model Context Protocol (MCP) is a rapidly adopted standard for defining and invoking external tools in LLM applications. The multi-layered architecture of MCP introduces new attack surfaces such as tool poisoning, in addition to traditional prompt injection. Existing defense systems suffer from limitations including high false positive rates, API dependency, or white-box access requirements. In this study, we propose CASCADE, a three-tiered cascaded defense architecture for MCP-based systems: (i) Layer 1 performs fast pre-filtering using regex, phrase weighting, and entropy analysis; (ii) Layer 2 conducts semantic analysis via BGE embedding with an Ollama Llama3 fallback mechanism; (iii) Layer 3 applies pattern-based output filtering. Evaluation on a dataset of 5,000 samples yielded 95.85% precision, 6.06% false positive rate, 61.05% recall, and 74.59% F1-score. Analysis across 31 attack types categorized into 6 tiers revealed high detection rates for data exfiltration (91.5%) and prompt injection (84.2%), while semantic attack (52.5%) and tool poisoning (59.9%) categories showed potential for improvement. A key advantage of CASCADE over existing solutions is its fully local operation, requiring no external API calls

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes CASCADE, a cascaded hybrid defense architecture consisting of three layers for detecting prompt injection and related attacks in systems using the Model Context Protocol (MCP). The first layer uses regex, phrase weighting, and entropy analysis for pre-filtering; the second employs BGE embeddings for semantic analysis with an Ollama Llama3 fallback; and the third applies pattern-based output filtering. The authors evaluate this on a dataset of 5,000 samples covering 31 attack types in 6 tiers, reporting 95.85% precision, 6.06% false positive rate, 61.05% recall, and 74.59% F1-score, with stronger performance on data exfiltration and prompt injection but weaker on semantic attacks and tool poisoning. The system is highlighted for its fully local operation without external API calls.

Significance. Assuming the evaluation is sound, this work contributes a practical defense mechanism for emerging MCP-based LLM applications, addressing both traditional prompt injection and new threats like tool poisoning through a hybrid local approach. The cascaded design promotes efficiency by handling simple cases quickly and escalating complex ones, which could be valuable for real-time systems. The per-category analysis provides useful diagnostic information on where the defense succeeds or needs enhancement.

major comments (3)
  1. [Abstract] The central performance claims rely on a 5,000-sample dataset, yet the manuscript provides no information on dataset construction, including benign sample sources, class balance, how the 31 attack types were generated or instantiated (particularly for semantic and tool poisoning categories), or any validation of coverage. This directly affects the reliability of the reported metrics such as the 61.05% recall and the tier-specific rates (e.g., 52.5% for semantic attacks).
  2. [Abstract] No baseline comparisons or statistical significance tests are mentioned in the evaluation summary. Without these, it is not possible to determine if the 74.59% F1-score represents a meaningful improvement over existing defenses or if the results are robust.
  3. [Abstract] The low recall (61.05%) and F1-score, combined with notably lower detection rates for tool poisoning (59.9%) and semantic attacks (52.5%), suggest that the cascaded architecture may not adequately cover all MCP-introduced attack surfaces. This challenges the claim of a comprehensive defense and indicates potential load-bearing weaknesses in Layers 2 and 3 for certain attack types.
minor comments (2)
  1. The abstract references '6 tiers' for the 31 attack types but does not define or list them; this should be clarified in the main text for reproducibility.
  2. Consider adding details on the specific regex patterns, entropy thresholds, and embedding similarity thresholds used in each layer to allow replication.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the insightful comments that will help improve the clarity and rigor of our work on the CASCADE defense architecture. We address each major comment below and commit to making the necessary revisions to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The central performance claims rely on a 5,000-sample dataset, yet the manuscript provides no information on dataset construction, including benign sample sources, class balance, how the 31 attack types were generated or instantiated (particularly for semantic and tool poisoning categories), or any validation of coverage. This directly affects the reliability of the reported metrics such as the 61.05% recall and the tier-specific rates (e.g., 52.5% for semantic attacks).

    Authors: We agree that the manuscript would benefit from greater transparency on dataset construction. The current text references the 5,000 samples and 31 attack types across 6 tiers but omits the requested details. In the revision we will add an explicit 'Dataset Construction' subsection describing: benign samples drawn from public LLM interaction corpora; class balance (approximately 2,000 benign / 3,000 malicious); generation procedures (template-driven for prompt injection, LLM-assisted paraphrasing for semantic attacks, and manually crafted poisoned tool schemas for tool poisoning); and validation via expert review of a 500-sample subset. These additions will directly support interpretation of the 61.05% recall and tier-specific figures. revision: yes

  2. Referee: [Abstract] No baseline comparisons or statistical significance tests are mentioned in the evaluation summary. Without these, it is not possible to determine if the 74.59% F1-score represents a meaningful improvement over existing defenses or if the results are robust.

    Authors: We acknowledge the value of baselines and statistical tests. The revised manuscript will include side-by-side evaluation against two adapted baselines—a standalone BGE embedding detector and a regex-plus-entropy filter—run on the identical 5,000-sample set. We will also report McNemar’s test results for F1-score differences across tiers. These additions will demonstrate that the reported 74.59% F1 constitutes a meaningful gain, particularly under the constraint of fully local operation. We will note that MCP-specific attacks such as tool poisoning lack prior public baselines. revision: yes

  3. Referee: [Abstract] The low recall (61.05%) and F1-score, combined with notably lower detection rates for tool poisoning (59.9%) and semantic attacks (52.5%), suggest that the cascaded architecture may not adequately cover all MCP-introduced attack surfaces. This challenges the claim of a comprehensive defense and indicates potential load-bearing weaknesses in Layers 2 and 3 for certain attack types.

    Authors: The referee correctly highlights a genuine limitation. CASCADE is not presented as a fully comprehensive solution; the manuscript already notes weaker results on semantic and tool-poisoning categories and frames the work as a practical, low-FPR hybrid. The 6.06% false-positive rate is a deliberate design choice for real-time MCP usability. In revision we will temper abstract and conclusion language to avoid any implication of complete coverage, explicitly discuss the load on Layers 2 and 3 for these attack classes, and outline targeted future improvements such as embedding fine-tuning. This accurately reflects the contribution without overstating scope. revision: partial

Circularity Check

0 steps flagged

No significant circularity; purely empirical evaluation

full rationale

The paper describes a three-layer cascaded defense architecture (regex pre-filter, semantic embedding analysis, output filtering) and reports direct performance metrics from evaluating it on a fixed 5,000-sample dataset across 31 attack types. No equations, fitted parameters, predictions, or derivations are present that could reduce to the authors' own inputs or prior choices by construction. The reported precision, recall, and F1 values are presented as straightforward evaluation outcomes rather than self-referential results. Self-citations are absent from the provided text, and the central claims rest on external dataset testing rather than internal definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied engineering paper with no mathematical model, free parameters, axioms, or invented theoretical entities; the contribution rests on the empirical performance of the described detection pipeline.

pith-pipeline@v0.9.0 · 5550 in / 1408 out tokens · 45596 ms · 2026-05-10T06:03:27.554420+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

23 extracted references · 14 canonical work pages · 2 internal anchors

  1. [1]

    X. Hou, Y. Zhao, S. Wang, H. Wang, Model context protocol (mcp): Landscape, security threats, and future research directions, ACM Transactions on Software Engineering and Methodology (2025)

  2. [2]

    M. M. Hasan, H. Li, E. Fallahzadeh, G. K. Rajbahadur, B. Adams, A. E. Hassan, Model context protocol (mcp) at first glance: Study- ing the security and maintainability of mcp servers, arXiv preprint arXiv:2506.13538 (2025)

  3. [3]

    Gulyamov, S

    S. Gulyamov, S. Gulyamov, A. Rodionov, R. Khursanov, K. Mekhmonov, D. Babaev, A. Rakhimjonov, Prompt injection attacks in large language models and ai agent systems: A comprehensive review of vulnerabilities, attack vectors, and defense mechanisms, Information 17 (1) (2026) 54

  4. [4]

    OWASP,OWASPTop10forModelContextProtocol(MCP),https: //owasp.org/www-project-mcp-top-10/, accessed: 2026-04-18 (2025)

  5. [5]

    Huang, X

    C. Huang, X. Huang, N. P. Tran, A. M. Fard, Model context protocol threatmodelingandanalyzingvulnerabilitiestopromptinjectionwith tool poisoning, arXiv preprint arXiv:2603.22489 (2026)

  6. [6]

    Z. Wang, Y. Gao, Y. Wang, S. Liu, H. Sun, H. Cheng, G. Shi, H. Du, X. Li, Mcptox: A benchmark for tool poisoning attack on real-world mcp servers, arXiv preprint arXiv:2508.14925 (2025)

  7. [7]

    Y. Hu, C. Fan, S. Samyoun, J. Du, Log-to-leak: Prompt injection attacks on tool-using llm agents via model context protocol (2025)

  8. [8]

    Securing the Model Context Protocol: Defending LLMs against tool poisoning and adversarial attacks,

    S. Jamshidi, K. W. Nafi, A. M. Dakhel, N. Shahabi, F. Khomh, N. Ezzati-Jivan, Securing the model context protocol: Defending llms against tool poisoning and adversarial attacks, arXiv preprint arXiv:2512.06556 (2025)

  9. [9]

    Z. Wang, J. Zhang, G. Shi, H. Cheng, Y. Yao, K. Guo, H. Du, X.-Y. Li, Mindguard: Tracking, detecting, and attributing mcp tool poisoning attack via decision dependence graph, arXiv preprint arXiv:2508.20412 (2025)

  10. [10]

    Y. Liu, G. Deng, Y. Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y. Liu, H. Wang, Y. Zheng, et al., Prompt injection attack against llm-integratedapplications,arXivpreprintarXiv:2306.05499(2023)

  11. [11]

    OWASP, OWASP Top 10 for Large Lan- guage Model Applications,https://owasp.org/ www-project-top-10-for-large-language-model-applications/, accessed: 2026-04-18 (2025)

  12. [12]

    D. Lee, M. Tiwari, Prompt infection: Llm-to-llm prompt injection withinmulti-agentsystems,arXivpreprintarXiv:2410.07283(2024)

  13. [13]

    J. Shi, Z. Yuan, Y. Liu, Y. Huang, P. Zhou, L. Sun, N. Z. Gong, Optimization-based prompt injection attack to llm-as-a-judge, in: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 2024, pp. 660–674

  14. [14]

    Debenedetti, J

    E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fis- cher, F. Tramèr, Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents, Advances in Neural Information Processing Systems 37 (2024) 82895–82920

  15. [15]

    Suo, Signed-prompt: A new approach to prevent prompt injection attacks against llm-integrated applications, in: AIP Conference Pro- ceedings, Vol

    X. Suo, Signed-prompt: A new approach to prevent prompt injection attacks against llm-integrated applications, in: AIP Conference Pro- ceedings, Vol. 3194, AIP Publishing LLC, 2024, p. 040013

  16. [16]

    X. Zong, Z. Shen, L. Wang, Y. Lan, C. Yang, Mcp-safetybench: A benchmark for safety evaluation of large language models with real- world mcp servers, arXiv preprint arXiv:2512.15163 (2025)

  17. [17]

    Breaking the protocol: Security analysis of the model context protocol specification and prompt injection vulnerabilities in tool-integrated LLM agents,

    N. Maloyan, D. Namiot, Breaking the protocol: Security analy- sis of the model context protocol specification and prompt injec- tion vulnerabilities in tool-integrated llm agents, arXiv preprint arXiv:2601.17549 (2026)

  18. [18]

    Mcp safety audit: Llms with the model context protocol allow major security exploits

    B. Radosevich, J. Halloran, Mcp safety audit: Llms with the model context protocol allow major security exploits, 2025, URL https://arxiv. org/abs/2504.03767

  19. [19]

    R. Li, Z. Wang, Y. Yao, X.-Y. Li, Mcp-itp: An automated framework for implicit tool poisoning in mcp, arXiv preprint arXiv:2601.07395 (2026)

  20. [20]

    Siameh, A

    T. Siameh, A. A. Addobea, C.-H. Liu, Context injection vulnera- bilities and resource exploitation attacks in model context protocol, Authorea Preprints (2025)

  21. [21]

    Z. Zhou, Y. Zhang, H. Cai, M. Aloqaily, O. Bouachir, L. Pang, P.Mehrotra,K.Wang,Q.Wen,Mcpshield:Asecuritycognitionlayer for adaptive trust calibration in model context protocol agents, arXiv preprint arXiv:2602.14281 (2026)

  22. [22]

    S.Kumar,A.Girdhar,R.Patil,D.Tripathi,Mcpguardian:Asecurity- first layer for safeguarding mcp-based ai system, arXiv preprint arXiv:2504.12757 (2025)

  23. [23]

    W. Xing, Z. Qi, Y. Qin, Y. Li, C. Chang, J. Yu, C. Lin, Z. Xie, M. Han, Mcp-guard: A defense framework for model context pro- tocol integrity in large language model applications, arXiv preprint arXiv:2508.10991 (2025). İ. Abasıkeleş-Turgut and E. Gümüş:PreprintPage 8 of 8