pith. sign in

arxiv: 2510.16558 · v2 · submitted 2025-10-18 · 💻 cs.CR · cs.AI

A First Look at the Security Issues in the Model Context Protocol Ecosystem

Pith reviewed 2026-05-18 06:08 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords Model Context ProtocolLLM securitytool integrationserver hijackingregistry vettingmetadata manipulationadversarial serverspre-integration analysis
0
0 comments X

The pith

Weak vetting in MCP registries lets hijacked servers manipulate LLM tool calls that hosts then execute without checks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies security risks in the Model Context Protocol ecosystem that links large language models to external tools. It describes a two-stage attack surface where registries with weak ownership checks admit adversarial or hijacked servers, after which attacker-controlled tool metadata can steer LLM reasoning toward intended operations. Hosts perform these operations without independent verification or sandboxing. The authors scanned 67,057 servers across six registries and found widespread conditions for hijacking and metadata manipulation. They built MCPInspect to flag misleading descriptions and code vulnerabilities before integration, locating 833 vulnerable servers and 18 with suspicious descriptions.

Core claim

MCP registries allow adversarial or hijacked servers to integrate due to insufficient vetting and ownership verification; once integrated, attacker-controlled tool metadata shapes LLM reasoning to induce specific operations that hosts execute without independent checks, and code-level flaws can amplify the effect even if not required for the initial attack.

What carries the argument

The two-stage attack surface of registry-level weak vetting followed by post-integration metadata manipulation of LLM outputs without host verification.

If this is right

  • Adversarial servers can enter hosts through public registries without strong ownership checks.
  • Attacker metadata can induce LLMs to request specific tool calls that hosts then perform.
  • Widespread server conditions enable hijacking and invocation manipulation across analyzed registries.
  • Pre-integration scanning tools can identify misleading metadata and exploitable code before use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar trust gaps may appear in other protocols that let LLMs invoke external services based on third-party descriptions.
  • Requiring cryptographic signing of tool metadata or mandatory host-side policy checks could reduce the attack surface.
  • Registry operators might adopt automated ownership verification to limit hijacking before servers reach hosts.

Load-bearing premise

MCP hosts execute tool operations based on server-provided metadata without performing independent verification or sandboxing.

What would settle it

Finding that MCP hosts routinely perform their own verification of tool metadata and refuse to act on operations suggested only by untrusted server descriptions would contradict the attack surface.

Figures

Figures reproduced from arXiv: 2510.16558 by Xiaofan Li, Xing Gao.

Figure 1
Figure 1. Figure 1: The overview of the MCP ecosystem. 1 { 2 "mcpServers": { 3 # Local MCP server 4 "github": { 5 "command": "docker", 6 "args": ["run", "-i", "--rm", "-e", "GITHUB_PERSONAL_ACCESS_TOKEN", "ghcr.io/github/github-mcp-server"], 7 "env": { 8 "GITHUB_PERSONAL_ACCESS_TOKEN": "token" 9 } 10 } 11 # Remote MCP server 12 "semgrep": { 13 "url": "https://mcp.semgrep.ai/sse" 14 } 15 } 16 } Listing 1: MCP server configurat… view at source ↗
Figure 2
Figure 2. Figure 2: Number of MCP servers across registries. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Distribution of programming languages used in [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Context dangling tool issue success rate per [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Tool shadowing attack success rate per host–model [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: A leaked GitHub token discovered on the registry [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: MCP server maintainer comparison on npm registry. [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗
read the original abstract

The Model Context Protocol (MCP) has emerged as a standard for connecting large language models (LLMs) with external tools. However, this MCP ecosystem introduces new security risks across hosts, servers, and registries. In this paper, we present the first cross-entity security study of MCP under a two-stage attack surface. At the registry-level, weak vetting and ownership checks allow adversarial or hijacked servers to enter hosts. After integration, attacker-controlled tool metadata can shape LLM reasoning and induce attacker-intended operations, which hosts execute without independent verification. Code-level vulnerabilities (e.g., code injection) are not required but can amplify attacker-controlled parameters into exploitation. We analyze 67,057 servers across six public registries and identify widespread conditions enabling server hijacking and invocation manipulation. We further implement MCPInspect, a pre-integration analysis tool that detects misleading tool metadata and exploitable code vulnerabilities, identifying 833 vulnerable servers and 18 with suspicious descriptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents the first cross-entity security study of the Model Context Protocol (MCP), describing a two-stage attack surface involving registry-level hijacking risks and subsequent metadata-driven manipulation of LLM reasoning in hosts. Through analysis of 67,057 servers across six public registries, the authors identify conditions enabling server hijacking and implement MCPInspect to flag 833 servers with misleading metadata or code vulnerabilities and 18 with suspicious descriptions.

Significance. If the results hold, this work is significant for highlighting security risks in an emerging standard for LLM-external tool integration. The concrete empirical data from scanning over 67,000 servers and the implementation of the MCPInspect tool provide strong grounding for the claims about vulnerable servers. These contributions could help the community address weak vetting practices and improve pre-integration checks.

major comments (2)
  1. [Abstract (two-stage attack surface)] The description of the two-stage attack surface claims that hosts execute attacker-intended operations shaped by server-provided tool metadata 'without independent verification or sandboxing'. However, the manuscript does not include any analysis of MCP host implementations, runtime tests against hosts, or citations to protocol specifications requiring such verification. This leaves the execution-without-verification aspect as an assumption rather than an empirically demonstrated property.
  2. [Results (scanning and detection)] The paper reports detecting 833 vulnerable servers using MCPInspect but provides insufficient detail on the scanning methodology, criteria for 'misleading tool metadata', and false-positive rates of the detection. This affects the reliability of the count and the claim of 'widespread conditions'.
minor comments (2)
  1. [Abstract] The six public registries are not named; specifying them would aid reproducibility.
  2. [Abstract] Some terms like 'MCPInspect' could benefit from a brief definition on first use for readers unfamiliar with the tool.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have helped us identify areas for improvement in our manuscript on security issues in the Model Context Protocol ecosystem. We respond to each major comment below, indicating where revisions have been or will be made.

read point-by-point responses
  1. Referee: [Abstract (two-stage attack surface)] The description of the two-stage attack surface claims that hosts execute attacker-intended operations shaped by server-provided tool metadata 'without independent verification or sandboxing'. However, the manuscript does not include any analysis of MCP host implementations, runtime tests against hosts, or citations to protocol specifications requiring such verification. This leaves the execution-without-verification aspect as an assumption rather than an empirically demonstrated property.

    Authors: We acknowledge the referee's observation that the execution-without-verification aspect is presented without direct empirical analysis of host implementations. The two-stage attack surface is framed as a conceptual model grounded in the MCP protocol's design, where tool metadata from servers is passed to LLMs via hosts without protocol-mandated verification steps. To address this, we have added citations to the official MCP specification and a discussion of publicly documented host behaviors in the revised manuscript. We have also clarified the abstract language to indicate this follows from the protocol architecture rather than from runtime testing conducted in our study. Full runtime experiments against diverse hosts were outside the scope of this registry- and server-focused analysis. revision: partial

  2. Referee: [Results (scanning and detection)] The paper reports detecting 833 vulnerable servers using MCPInspect but provides insufficient detail on the scanning methodology, criteria for 'misleading tool metadata', and false-positive rates of the detection. This affects the reliability of the count and the claim of 'widespread conditions'.

    Authors: We agree that greater methodological transparency is needed to support the reported counts and the characterization of widespread conditions. In the revised manuscript, we have expanded the MCPInspect section with a detailed account of the data collection process across the six registries, the precise criteria and heuristics applied to detect misleading tool metadata (including name-description mismatches and suspicious phrasing), and the results of a manual validation on a sampled subset of detections to estimate false-positive rates. These additions improve reproducibility and allow readers to better assess the strength of the 833 vulnerable servers finding. revision: yes

Circularity Check

0 steps flagged

Empirical audit of MCP ecosystem with no circular derivation chain

full rationale

The manuscript is an empirical security analysis that scans public registries, identifies hijacking conditions in 67,057 entries, and implements MCPInspect to flag 833 servers with metadata or code issues. No equations, fitted parameters, predictions, or self-citations appear in the provided text that reduce any central claim to its own inputs by construction. The two-stage attack surface is presented as an observed property of the ecosystem rather than a derived result; the assumption that hosts execute metadata-induced operations without verification is stated directly from protocol description and is not obtained via self-definition, renaming, or load-bearing self-citation. The work therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on domain assumptions about current MCP host behavior rather than new mathematical axioms or fitted parameters. No free parameters or invented entities are introduced.

axioms (1)
  • domain assumption MCP hosts execute tool operations without independent verification of server-provided metadata or parameters.
    This premise underpins the second stage of the described attack surface and is invoked when stating that hosts run attacker-intended operations.

pith-pipeline@v0.9.0 · 5687 in / 1208 out tokens · 33343 ms · 2026-05-18T06:08:09.959877+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A First Measurement Study on Authentication Security in Real-World Remote MCP Servers

    cs.CR 2026-05 conditional novelty 8.0

    First measurement study of 7,973 remote MCP servers finds 40.55% lack authentication and all 119 tested OAuth servers have flaws that risk data leaks or account takeover.

  2. Free-Riding the Agentic Web: A Systematic Security Analysis of x402 Payments

    cs.CR 2026-05 unverdicted novelty 6.0

    Systematic analysis of x402 reveals four security flaw classes with up to 100% leakage, a proof that output-only pricing cannot be both fair and inflation-bounded, and mitigations reducing costs 47% while inverting at...

  3. Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP

    cs.CR 2026-02 unverdicted novelty 5.0

    The paper identifies twelve protocol-level security risks across MCP, A2A, Agora, and ANP and quantifies wrong-provider tool execution risk in MCP via a measurement-driven case study on multi-server composition.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · cited by 3 Pith papers · 3 internal anchors

  1. [1]

    Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices,

    Sara Abdali, Richard Anarfi, CJ Barberan, Jia He, and Er- fan Shayegani. Securing large language models: Threats, vulnerabilities and responsible practices.arXiv preprint arXiv:2403.12503, 2024

  2. [2]

    Tro- janpuzzle: Covertly poisoning code-suggestion models

    Hojjat Aghakhani, Wei Dai, Andre Manoel, Xavier Fer- nandes, Anant Kharkar, Christopher Kruegel, Giovanni Vigna, David Evans, Ben Zorn, and Robert Sim. Tro- janpuzzle: Covertly poisoning code-suggestion models. In2024 IEEE Symposium on Security and Privacy (SP), pages 1122–1140. IEEE, 2024

  3. [3]

    Seven months’ worth of mistakes: A lon- gitudinal study of typosquatting abuse

    Pieter Agten, Wouter Joosen, Frank Piessens, and Nick Nikiforakis. Seven months’ worth of mistakes: A lon- gitudinal study of typosquatting abuse. InProceedings of the 22nd Network and Distributed System Security Symposium (NDSS 2015). Internet Society, 2015

  4. [4]

    Introducing the model context pro- tocol

    Anthropic. Introducing the model context pro- tocol. https://www.anthropic.com/news/ model-context-protocol, 2025

  5. [5]

    Cursor security

    Cursor Security. Cursor security. https://cursor. com/security, 2025

  6. [6]

    arXiv preprint arXiv:2307.08715 , year=

    Gelei Deng, Yi Liu, Yuekang Li, Kailong Wang, Ying Zhang, Zefeng Li, Haoyu Wang, Tianwei Zhang, and Yang Liu. Jailbreaker: Automated jailbreak across mul- tiple large language model chatbots.arXiv preprint arXiv:2307.08715, 2023

  7. [7]

    Ai agents under threat: A survey of key secu- rity challenges and future pathways

    Z Deng, Y Guo, C Han, W Ma, J Xiong, S Wen, and Y Xiang. Ai agents under threat: A survey of key secu- rity challenges and future pathways. arxiv, 2024

  8. [8]

    Evaluating login challenges as adefense against account takeover

    Periwinkle Doerfler, Kurt Thomas, Maija Marincenko, Juri Ranieri, Yu Jiang, Angelika Moscicki, and Damon McCoy. Evaluating login challenges as adefense against account takeover. InThe World Wide Web Conference, pages 372–382, 2019

  9. [9]

    Towards measuring supply chain attacks on package managers for interpreted languages

    Ruian Duan, Omar Alrawi, Ranjita Pai Kasturi, Ryan Elder, Brendan Saltaformaggio, and Wenke Lee. To- wards measuring supply chain attacks on package managers for interpreted languages.arXiv preprint arXiv:2002.01139, 2020

  10. [10]

    Deleting your personal account

    GitHub Docs. Deleting your personal account. https: //docs.github.com/en/account-and-profile/ setting-up-and-managing-your-personal-account-on-github/ managing-your-personal-account/ deleting-your-personal-account, 2025

  11. [11]

    Managing your personal access tokens

    GitHub Docs. Managing your personal access tokens. https://docs.github.com/en/authentication/ keeping-your-account-and-data-secure/ managing-your-personal-access-tokens, 2025

  12. [12]

    Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion. InProceedings of the 16th ACM Workshop on Artificial Intelligence and Security, pages 79–90, 2023

  13. [13]

    Use-after-freemail: Generalizing the use-after-free problem and applying it to email services

    Daniel Gruss, Michael Schwarz, Matthias Wübbeling, Simon Guggi, Timo Malderle, Stefan More, and Moritz Lipp. Use-after-freemail: Generalizing the use-after-free problem and applying it to email services. InProceed- ings of the 2018 on Asia Conference on Computer and Communications Security, pages 297–311, 2018

  14. [14]

    Investigating package related security threats in soft- ware registries

    Yacong Gu, Lingyun Ying, Yingyuan Pu, Xiao Hu, Hua- jun Chai, Ruimin Wang, Xing Gao, and Haixin Duan. Investigating package related security threats in soft- ware registries. In2023 IEEE Symposium on Security and Privacy (SP), pages 1578–1595. IEEE, 2023

  15. [15]

    Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

    Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, and Danqi Chen. Catastrophic jailbreak of open- source llms via exploiting generation.arXiv preprint arXiv:2310.06987, 2023

  16. [16]

    Mcp security notification: Tool poi- soning attacks

    Invariantlabs. Mcp security notification: Tool poi- soning attacks. https://invariantlabs.ai/blog/ mcp-security-notification-tool-poisoning-attacks , 2025

  17. [17]

    Whatsapp mcp exploited: Exfiltrating your message history via mcp

    Invariantlabs. Whatsapp mcp exploited: Exfiltrating your message history via mcp. https://invariantlabs.ai/blog/ whatsapp-mcp-exploited, 2025

  18. [18]

    arXiv preprint arXiv:2307.14692 , year=

    Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, and Nicholas Carlini. Backdoor attacks for in-context learning with language models.arXiv preprint arXiv:2307.14692, 2023. 14

  19. [19]

    Every second counts: Quantifying the negative externalities of cybercrime via typosquatting

    Mohammad Taha Khan, Xiang Huo, Zhou Li, and Chris Kanich. Every second counts: Quantifying the negative externalities of cybercrime via typosquatting. In2015 IEEE Symposium on Security and Privacy, pages 135–

  20. [20]

    Hiding in plain sight: A longitudinal study of com- bosquatting abuse

    Panagiotis Kintis, Najmeh Miramirkhani, Charles Lever, Yizheng Chen, Rosa Romero-Gómez, Nikolaos Pitropakis, Nick Nikiforakis, and Manos Antonakakis. Hiding in plain sight: A longitudinal study of com- bosquatting abuse. InProceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 569–586, 2017

  21. [21]

    arXiv preprint arXiv:2304.05197 , year=

    Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, and Yangqiu Song. Multi-step jail- breaking privacy attacks on chatgpt.arXiv preprint arXiv:2304.05197, 2023

  22. [22]

    To- ward understanding the security of plugins in continuous integration services

    Xiaofan Li, Yacong Gu, Chu Qiao, Zhenkai Zhang, Daip- ing Liu, Lingyun Ying, Haixin Duan, and Xing Gao. To- ward understanding the security of plugins in continuous integration services. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 482–496, 2024

  23. [23]

    A first look at security and privacy risks in the rapidapi ecosys- tem

    Song Liao, Long Cheng, Xiapu Luo, Zheng Song, Haipeng Cai, Danfeng Yao, and Hongxin Hu. A first look at security and privacy risks in the rapidapi ecosys- tem. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 1626–1640, 2024

  24. [24]

    Exploring the unchartered space of container registry typosquatting

    Guannan Liu, Xing Gao, Haining Wang, and Kun Sun. Exploring the unchartered space of container registry typosquatting. In31st USENIX Security Symposium (USENIX Security 22), pages 35–51, 2022

  25. [25]

    malicious code

    Tong Liu, Zizhuang Deng, Guozhu Meng, Yuekang Li, and Kai Chen. Demystifying rce vulnerabilities in llm-integrated apps. arxiv 2023.arXiv preprint arXiv:2309.02926, 2023

  26. [26]

    Prompt Injection attack against LLM-integrated Applications

    Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, et al. Prompt injection at- tack against llm-integrated applications.arXiv preprint arXiv:2306.05499, 2023

  27. [27]

    Formalizing and benchmarking prompt injection attacks and defenses

    Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and benchmarking prompt injection attacks and defenses. In33rd USENIX Security Symposium (USENIX Security 24), pages 1831– 1847, 2024

  28. [28]

    The mcp registry registry

    mastra. The mcp registry registry. https://mastra. ai/mcp-registry-registry, 2025

  29. [29]

    Discover top mcp servers | mcp market

    MCP Market. Discover top mcp servers | mcp market. https://mcpmarket.com/, 2025

  30. [30]

    Mcp store - find and connect to 20,000+ mcp servers.https://mcpstore.co/, 2025

    MCP Store. Mcp store - find and connect to 20,000+ mcp servers.https://mcpstore.co/, 2025

  31. [31]

    Mcp servers.https://mcp.so/, 2025

    mcp.so. Mcp servers.https://mcp.so/, 2025

  32. [32]

    Model context protocol (mcp) is now generally available in microsoft copilot studio

    Microsoft. Model context protocol (mcp) is now generally available in microsoft copilot studio. https://www.microsoft.com/en-us/ microsoft-copilot/blog/copilot-studio/ model-context-protocol-mcp-is-now-generally-available-in-microsoft-copilot-studio/ , 2025

  33. [33]

    Measuring the perpetrators and funders of typosquatting

    Tyler Moore and Benjamin Edelman. Measuring the perpetrators and funders of typosquatting. InInterna- tional Conference on Financial Cryptography and Data Security, pages 175–191. Springer, 2010

  34. [34]

    Soundsquatting: Uncovering the use of homophones in domain squatting

    Nick Nikiforakis, Marco Balduzzi, Lieven Desmet, Frank Piessens, and Wouter Joosen. Soundsquatting: Uncovering the use of homophones in domain squatting. InInternational Conference on Information Security, pages 291–308. Springer, 2014

  35. [35]

    Bit- squatting: Exploiting bit-flips for fun, or profit? InPro- ceedings of the 22nd international conference on World Wide Web, pages 989–998, 2013

    Nick Nikiforakis, Steven Van Acker, Wannes Meert, Lieven Desmet, Frank Piessens, and Wouter Joosen. Bit- squatting: Exploiting bit-flips for fun, or profit? InPro- ceedings of the 22nd international conference on World Wide Web, pages 989–998, 2013

  36. [36]

    Cheatagent: Attack- ing llm-empowered recommender systems via llm agent

    Liang-bo Ning, Shijie Wang, Wenqi Fan, Qing Li, Xin Xu, Hao Chen, and Feiran Huang. Cheatagent: Attack- ing llm-empowered recommender systems via llm agent. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2284– 2295, 2024

  37. [37]

    Prompt-to-sql injections in llm-integrated web applications: Risks and defenses

    Rodrigo Pedro, Miguel E Coimbra, Daniel Castro, Paulo Carreira, and Nuno Santos. Prompt-to-sql injections in llm-integrated web applications: Risks and defenses. In 2025 IEEE/ACM 47th International Conference on Soft- ware Engineering (ICSE), pages 76–88. IEEE Computer Society, 2024

  38. [38]

    Pulsemcp | keep up-to-date with mcp

    Pulse MCP. Pulsemcp | keep up-to-date with mcp. https://www.pulsemcp.com/, 2025

  39. [39]

    You autocomplete me: Poisoning vul- nerabilities in neural code completion

    Roei Schuster, Congzheng Song, Eran Tromer, and Vi- taly Shmatikov. You autocomplete me: Poisoning vul- nerabilities in neural code completion. In30th USENIX Security Symposium (USENIX Security 21), pages 1559– 1575, 2021

  40. [40]

    arXiv preprint arXiv:2310.10844 (2023)

    Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pe- dram Zaree, Yue Dong, and Nael Abu-Ghazaleh. Survey of vulnerabilities in large language models revealed by 15 adversarial attacks.arXiv preprint arXiv:2310.10844, 2023

  41. [41]

    do anything now

    Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang. " do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 1671–1685, 2024

  42. [42]

    Smithery - model context protocol registry

    Smithery. Smithery - model context protocol registry. https://smithery.ai/, 2025

  43. [43]

    The long {“Taile”} of typosquatting domain names

    Janos Szurdi, Balazs Kocso, Gabor Cseh, Jonathan Spring, Mark Felegyhazi, and Chris Kanich. The long {“Taile”} of typosquatting domain names. In23rd USENIX Security Symposium (USENIX Security 14), pages 191–206, 2014

  44. [44]

    Google to embrace anthropic’s standard for connecting ai models to data

    TechCrunch. Google to embrace anthropic’s standard for connecting ai models to data. https://techcrunch.com/2025/04/09/ google-says-itll-embrace-anthropics-standard-for-connecting-ai-models-to-data/ , 2025

  45. [45]

    Openai adopts rival anthropic’s standard for connecting ai models to data

    TechCrunch. Openai adopts rival anthropic’s standard for connecting ai models to data. https://techcrunch.com/2025/03/26/ openai-adopts-rival-anthropics-standard-for-connecting-ai-models-to-data/ , 2025

  46. [46]

    PhD thesis, Univer- sität Hamburg, Fachbereich Informatik, 2016

    Nikolai Philipp Tschacher.Typosquatting in program- ming language package managers. PhD thesis, Univer- sität Hamburg, Fachbereich Informatik, 2016

  47. [47]

    Typosquatting and com- bosquatting attacks on the python ecosystem

    Duc-Ly Vu, Ivan Pashchenko, Fabio Massacci, Henrik Plate, and Antonino Sabetta. Typosquatting and com- bosquatting attacks on the python ecosystem. In2020 ieee european symposium on security and privacy work- shops (euros&pw), pages 509–514. IEEE, 2020

  48. [48]

    Adversarial demonstration attacks on large language models.arXiv preprint arXiv:2305.14950,

    Jiongxiao Wang, Zichen Liu, Keun Hee Park, Zhuo- jun Jiang, Zhaoheng Zheng, Zhuofeng Wu, Muhao Chen, and Chaowei Xiao. Adversarial demonstration attacks on large language models.arXiv preprint arXiv:2305.14950, 2023

  49. [49]

    Jailbroken: How does llm safety training fail?Advances in Neural Information Processing Systems, 36, 2024

    Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. Jailbroken: How does llm safety training fail?Advances in Neural Information Processing Systems, 36, 2024

  50. [50]

    Deceptprompt: Exploiting llm-driven code generation via adversarial natural language instructions, 2023

    Fangzhou Wu, Xiaogeng Liu, and Chaowei Xiao. De- ceptprompt: Exploiting llm-driven code generation via adversarial natural language instructions.arXiv preprint arXiv:2312.04730, 2023

  51. [51]

    I know what you asked: Prompt leakage via kv-cache sharing in multi- tenant llm serving

    Guanlong Wu, Zheng Zhang, Yao Zhang, Weili Wang, Jianyu Niu, Ye Wu, and Yinqian Zhang. I know what you asked: Prompt leakage via kv-cache sharing in multi- tenant llm serving. InProceedings of the 2025 Network and Distributed System Security (NDSS) Symposium. San Diego, CA, USA, 2025

  52. [52]

    Backdooring instruction-tuned large lan- guage models with virtual prompt injection.arXiv preprint arXiv:2307.16888, 2023

    Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, and Hongxia Jin. Backdooring instruction-tuned large lan- guage models with virtual prompt injection.arXiv preprint arXiv:2307.16888, 2023

  53. [53]

    An llm-assisted easy-to-trigger backdoor attack on code completion models: Injecting disguised vulnerabilities against strong detection, 2024

    Shenao Yan, Shen Wang, Yue Duan, Hanbin Hong, Kiho Lee, Doowon Kim, and Yuan Hong. An llm- assisted easy-to-trigger backdoor attack on code comple- tion models: Injecting disguised vulnerabilities against strong detection.arXiv preprint arXiv:2406.06822, 2024

  54. [54]

    What are weak links in the npm supply chain? InProceedings of the 44th International Conference on Software Engineering: Software Engineering in Prac- tice, pages 331–340, 2022

    Nusrat Zahan, Thomas Zimmermann, Patrice Gode- froid, Brendan Murphy, Chandra Maddila, and Laurie Williams. What are weak links in the npm supply chain? InProceedings of the 44th International Conference on Software Engineering: Software Engineering in Prac- tice, pages 331–340, 2022

  55. [55]

    Advdoor: adversarial backdoor attack of deep learning system

    Quan Zhang, Yifeng Ding, Yongqiang Tian, Jianmin Guo, Min Yuan, and Yu Jiang. Advdoor: adversarial backdoor attack of deep learning system. InProceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 127–138, 2021

  56. [56]

    Human-imperceptible re- trieval poisoning attacks in llm-powered applications

    Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, and Yu Jiang. Human-imperceptible re- trieval poisoning attacks in llm-powered applications. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pages 502–506, 2024

  57. [57]

    Imperceptible content poisoning in llm-powered applications

    Quan Zhang, Chijin Zhou, Gwihwan Go, Binqi Zeng, Heyuan Shi, Zichen Xu, and Yu Jiang. Imperceptible content poisoning in llm-powered applications. InPro- ceedings of the 39th IEEE/ACM International Confer- ence on Automated Software Engineering, pages 242– 254, 2024

  58. [58]

    Priva- cyasst: Safeguarding user privacy in tool-using large language model agents.IEEE Transactions on Depend- able and Secure Computing, 21(6):5242–5258, 2024

    Xinyu Zhang, Huiyu Xu, Zhongjie Ba, Zhibo Wang, Yuan Hong, Jian Liu, Zhan Qin, and Kui Ren. Priva- cyasst: Safeguarding user privacy in tool-using large language model agents.IEEE Transactions on Depend- able and Secure Computing, 21(6):5242–5258, 2024

  59. [59]

    Small world with high risks: A study of security threats in the npm ecosystem

    Markus Zimmermann, Cristian-Alexandru Staicu, Cam Tenny, and Michael Pradel. Small world with high risks: A study of security threats in the npm ecosystem. In 28th USENIX Security symposium (USENIX security 19), pages 995–1010, 2019. 16

  60. [60]

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. Universal and trans- ferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023. 17