pith. machine review for the scientific record.

arxiv: 2604.04426 · v1 · submitted 2026-04-06 · 💻 cs.AI

Recognition: no theorem link

ShieldNet: Network-Level Guardrails against Emerging Supply-Chain Injections in Agentic Systems

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 20:23 UTC · model grok-4.3

classification 💻 cs.AI
keywords: supply chain attacks · LLM agents · network guardrails · MCP tools · MITM proxy · attack detection · agent security · tool poisoning

The pith

ShieldNet detects supply-chain injections in LLM agent tools by monitoring network traffic instead of inspecting tool code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that supply-chain attacks embedded in third-party tools for LLM agents can be caught by observing the network calls those tools make during execution. Existing scanners and semantic guardrails miss most of these attacks because they rely on surface descriptions or code traces that attackers can hide. ShieldNet inserts a man-in-the-middle proxy to capture real network interactions, extracts distinguishing events from the traffic, and feeds them to a lightweight classifier. This yields up to 0.995 F1 detection with 0.8 percent false positives and negligible added latency on a benchmark of over 10,000 malicious tools covering 25 attack types. The approach matters because agents are adopting more external tools, creating new vectors for silent data leaks and unauthorized actions that code-level checks cannot reliably block.

Core claim

ShieldNet is a network-level guardrail that places a MITM proxy between an agent and external services to log interactions, extracts critical network behaviors via an event extractor, and classifies them with a lightweight model to identify supply-chain poisoning. On the new SC-Inject-Bench containing over 10,000 malicious MCP tools derived from a 25-plus attack taxonomy, it reaches 0.995 F1 with 0.8 percent false positives and low runtime overhead, clearly outperforming prior MCP scanners and LLM-based guardrails.

What carries the argument

A man-in-the-middle (MITM) proxy that intercepts and logs network traffic from agent-tool interactions, combined with an event extractor and lightweight classifier that turns observed behaviors into attack detections without access to tool source code or internal traces.
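The paper does not publish the extractor's schema, but the pipeline's shape — proxied traffic in, aggregate behavioral features out — can be sketched. Everything below (field names, the particular features) is an illustrative assumption, not the paper's actual event format:

```python
# Hypothetical sketch of the event-extractor idea: summarize intercepted
# network log entries as a fixed feature vector that a lightweight
# classifier could score. Field names and features are illustrative only.
from collections import Counter

def extract_events(log_entries):
    """Aggregate proxied traffic into simple behavioral features."""
    hosts = Counter(e["dst_host"] for e in log_entries)
    sizes = [e["payload_bytes"] for e in log_entries]
    protocols = Counter(e["protocol"] for e in log_entries)
    n = len(log_entries)
    return {
        "n_requests": n,
        "n_unique_hosts": len(hosts),           # fan-out to distinct endpoints
        "max_payload": max(sizes) if sizes else 0,  # large uploads -> exfil hint
        "non_http_fraction": 1 - protocols.get("HTTP", 0) / max(n, 1),
    }

# Toy trace: one expected API call plus one suspicious SNMP burst
trace = [
    {"dst_host": "api.example.com", "payload_bytes": 512, "protocol": "HTTP"},
    {"dst_host": "203.0.113.7", "payload_bytes": 40960, "protocol": "SNMP"},
]
features = extract_events(trace)
```

The point of the sketch is the design choice the paper argues for: these signals are observable at the proxy even when the tool's source code and internal traces are hidden.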

If this is right

  • Agent systems can safely incorporate third-party tools without needing to audit their internal code.
  • Security emphasis moves from static tool inspection to runtime observation of network effects.
  • Detection remains effective across dozens of attack types while adding almost no execution delay.
  • The method provides a practical fallback when prompt-based or semantic guardrails fail on hidden tool behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Tool marketplaces may eventually require network-behavior auditing as a condition for listing.
  • The same proxy approach could be extended to detect other runtime anomalies beyond supply-chain poisoning.
  • Production deployments would need to handle encrypted traffic and dynamic service endpoints.
  • Combining network signals with occasional semantic checks could raise detection further without much extra cost.

Load-bearing premise

Real-world supply-chain attacks will produce distinguishable network behaviors that a proxy can capture and a classifier can learn even when the tool's source code and execution traces remain hidden.

What would settle it

A controlled test on previously unseen real-world supply-chain attacks where ShieldNet's F1 score drops below 0.9 or its false-positive rate exceeds 5 percent would show the network-behavior signal does not generalize.
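The falsification criterion is mechanical once a confusion matrix exists. The sketch below, with made-up counts for an imagined unseen test set, shows how the two thresholds would be checked:

```python
# Illustrative check of the settling criterion: compute F1 and the
# false-positive rate from confusion-matrix counts, then test the
# thresholds stated above. Counts are hypothetical.
def f1_and_fpr(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return f1, fpr

# Hypothetical results on previously unseen real-world attacks
f1, fpr = f1_and_fpr(tp=880, fp=60, fn=120, tn=940)

# Paper's claim would fail to generalize if F1 < 0.9 or FPR > 5%
generalizes = f1 >= 0.9 and fpr <= 0.05
```

With these counts the F1 clears 0.9 but the 6% false-positive rate trips the second threshold, so the hypothetical run would count against generalization.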

Figures

Figures reproduced from arXiv: 2604.04426 by Bo Li, Chaowei Xiao, Nathaniel D. Bastian, Seyyed Hadi Hashemi, Wenbo Guo, Zhaorun Chen, Zhen Xiang, Zhuowen Yuan.

Figure 1. Comparison between semantic-level and network-level views of tool execution. Stealthy tool injection attacks …
Figure 2. Overview of our network-based MCP guardrail. Raw packet traces and decrypted application-layer traffic …
Figure 3. End-to-end data curation pipeline for SC-Inject-Bench. We construct a large-scale benchmark targeting diverse agent supply-chain threats, comprising over 10,000 malicious MCP tools grounded in 25+ attack types derived from MITRE ATT&CK. Specifically, we begin by collecting benign MCP servers, integrate verified malicious scripts into tool implementations, execute the tools under agent control, and verify a…
Figure 4. F-1 scores for multi-class detection. Our method achieves consistently strong performance across almost all …
Figure 5. Real-world streaming detection demo of ShieldNet. Claude Code interacts with injected MCP servers (left), while ShieldNet performs online network interception and sliding-window classification (right), visualizing structured network events and detection results in real time. Malicious tool executions are automatically identified during runtime based on network-level behavior.
Figure 6. Training distribution of tools across MCP server categories.
Figure 7. Training distribution of servers across MCP server categories.
Figure 8. Training distribution of tools across MITRE attack techniques.
Figure 9. Training distribution of servers across MITRE attack techniques.
Figure 10. Test distribution of tools across MCP server categories.
Figure 11. Test distribution of servers across MCP server categories.
Figure 12. Test distribution of tools across MITRE attack techniques.
Figure 13. Test distribution of servers across MITRE attack techniques.
Figure 14. Token-length distribution of serialized network event traces across attack techniques and benign executions.
Original abstract

Existing research on LLM agent security mainly focuses on prompt injection and unsafe input/output behaviors. However, as agents increasingly rely on third-party tools and MCP servers, a new class of supply-chain threats has emerged, where malicious behaviors are embedded in seemingly benign tools, silently hijacking agent execution, leaking sensitive data, or triggering unauthorized actions. Despite their growing impact, there is currently no comprehensive benchmark for evaluating such threats. To bridge this gap, we introduce SC-Inject-Bench, a large-scale benchmark comprising over 10,000 malicious MCP tools grounded in a taxonomy of 25+ attack types derived from MITRE ATT&CK targeting supply-chain threats. We observe that existing MCP scanners and semantic guardrails perform poorly on this benchmark. Motivated by this finding, we propose ShieldNet, a network-level guardrail framework that detects supply-chain poisoning by observing real network interactions rather than surface-level tool traces. ShieldNet integrates a man-in-the-middle (MITM) proxy and an event extractor to identify critical network behaviors, which are then processed by a lightweight classifier for attack detection. Extensive experiments show that ShieldNet achieves strong detection performance (up to 0.995 F-1 with only 0.8% false positives) while introducing little runtime overhead, substantially outperforming existing MCP scanners and LLM-based guardrails.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SC-Inject-Bench, a benchmark of over 10,000 author-constructed malicious MCP tools based on a 25+ type MITRE ATT&CK-derived taxonomy for supply-chain attacks against LLM agents. It proposes ShieldNet, a network-level guardrail that deploys a MITM proxy and event extractor to capture network behaviors, followed by a lightweight classifier for detection. The central claim is that ShieldNet achieves up to 0.995 F1 with 0.8% false positives, low runtime overhead, and substantially outperforms existing MCP scanners and LLM-based guardrails.

Significance. If the empirical results are robust, the work would fill a gap by providing the first large-scale benchmark and a practical network-observation defense for supply-chain threats in agentic systems, where current research focuses mainly on prompt injection. The benchmark construction from an established taxonomy is a strength, but its value hinges on whether the reported separability generalizes beyond the synthetic generation process.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Benchmark and Experiments): The headline performance numbers (0.995 F1, 0.8% FP) are reported exclusively on SC-Inject-Bench, whose malicious samples are generated by the authors to instantiate the taxonomy rather than drawn from independent real-world incidents. This raises the possibility that observed separability arises from implementation artifacts in the synthetic tools (e.g., distinctive API call patterns or data flows) rather than inherent, stealthy network signatures of actual supply-chain injections. A concrete test on externally sourced malicious MCP tools or an ablation showing that benign tools cannot replicate the extracted events is needed to support the general claim.
  2. [§3 and §5] §3 (ShieldNet Architecture) and §5 (Evaluation): No description is given of the precise features extracted by the event extractor, the classifier architecture or training procedure (including data splits, hyper-parameters, or loss), or how baselines were re-implemented and tuned. Without these details the reported superiority over MCP scanners and LLM guardrails cannot be verified or reproduced, undermining the central empirical claim.
minor comments (1)
  1. [Abstract] The abstract states 'extensive experiments' but supplies no table or figure reference for the exact F1/FP numbers or overhead measurements; adding a summary table in the abstract or §5 would improve clarity.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of generalizability and reproducibility that we address below. We have revised the paper accordingly where feasible.

Point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Benchmark and Experiments): The headline performance numbers (0.995 F1, 0.8% FP) are reported exclusively on SC-Inject-Bench, whose malicious samples are generated by the authors to instantiate the taxonomy rather than drawn from independent real-world incidents. This raises the possibility that observed separability arises from implementation artifacts in the synthetic tools (e.g., distinctive API call patterns or data flows) rather than inherent, stealthy network signatures of actual supply-chain injections. A concrete test on externally sourced malicious MCP tools or an ablation showing that benign tools cannot replicate the extracted events is needed to support the general claim.

    Authors: We acknowledge that SC-Inject-Bench relies on author-constructed samples derived from the MITRE ATT&CK taxonomy, as large-scale public datasets of real-world malicious MCP tools are not available for emerging supply-chain threats. To mitigate concerns about artifacts, we have added an ablation study in the revised §4.3 demonstrating that the network event features (e.g., specific flow statistics and timing patterns) cannot be easily replicated by benign tools with comparable functionality. We also expanded the limitations section to explicitly discuss the synthetic nature of the benchmark and call for future real-world validation. However, a direct test on externally sourced malicious tools is not possible at this time. revision: partial

  2. Referee: [§3 and §5] §3 (ShieldNet Architecture) and §5 (Evaluation): No description is given of the precise features extracted by the event extractor, the classifier architecture or training procedure (including data splits, hyper-parameters, or loss), or how baselines were re-implemented and tuned. Without these details the reported superiority over MCP scanners and LLM guardrails cannot be verified or reproduced, undermining the central empirical claim.

    Authors: We apologize for the lack of these implementation details in the original submission. In the revised manuscript, §3 now includes a complete specification of the event extractor features (network flow metadata, API endpoint patterns, and payload size distributions), the classifier (a 3-layer MLP with ReLU activations), training procedure (80/10/10 train/validation/test splits, Adam optimizer with learning rate 1e-3, binary cross-entropy loss), and hyper-parameters. Section 5 has been updated with full details on baseline re-implementations, including hyper-parameter tuning protocols and source code references. These changes enable full reproducibility. revision: yes
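The classifier specification given in this response (3-layer MLP, ReLU, binary cross-entropy) is concrete enough to sketch. The forward pass below is a minimal illustration in plain NumPy; the layer sizes and random weights are placeholders, not the authors' trained model:

```python
# Sketch of the classifier described in the rebuttal: a small MLP with
# ReLU hidden layers, sigmoid output, and binary cross-entropy loss.
# Dimensions and weights are illustrative placeholders.
import numpy as np

def mlp_forward(x, params):
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)   # ReLU hidden layers
    W, b = params[-1]
    logit = h @ W + b
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> attack probability

def bce_loss(p, y):
    """Binary cross-entropy for a single prediction."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
dims = [8, 16, 16, 1]                    # event features -> 2 hidden -> score
params = [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
          for a, b in zip(dims[:-1], dims[1:])]
p = mlp_forward(np.ones(8), params)      # probability for one event vector
```

In the described setup this score would be trained with Adam (lr 1e-3) on an 80/10/10 split of serialized network-event features; a model this small is consistent with the paper's low-runtime-overhead claim.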

standing simulated objections not resolved
  • We cannot conduct a concrete test on externally sourced real-world malicious MCP tools, as no large-scale public datasets of such incidents currently exist.

Circularity Check

0 steps flagged

No significant circularity in derivation chain.

full rationale

The paper introduces SC-Inject-Bench as an author-constructed benchmark and reports ShieldNet's detection performance (0.995 F1, 0.8% FP) as measured empirical results from training and evaluating a lightweight classifier on network traces extracted via MITM proxy. No equations, self-definitions, or fitted parameters are shown to reduce the performance metric to the benchmark inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and the evaluation pipeline does not rename known results or smuggle ansatzes. The central claim is therefore an independent empirical measurement rather than a tautological restatement of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unproven premise that network traffic patterns alone suffice to identify malicious tool behavior; no free parameters, axioms, or invented entities are explicitly stated in the abstract.

pith-pipeline@v0.9.0 · 5565 in / 1133 out tokens · 29796 ms · 2026-05-10T20:23:03.731556+00:00 · methodology

