A First Look at the Security Issues in the Model Context Protocol Ecosystem

Xiaofan Li; Xing Gao

arxiv: 2510.16558 · v2 · submitted 2025-10-18 · 💻 cs.CR · cs.AI

A First Look at the Security Issues in the Model Context Protocol Ecosystem

Xiaofan Li , Xing Gao This is my paper

Pith reviewed 2026-05-18 06:08 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords Model Context ProtocolLLM securitytool integrationserver hijackingregistry vettingmetadata manipulationadversarial serverspre-integration analysis

0 comments

The pith

Weak vetting in MCP registries lets hijacked servers manipulate LLM tool calls that hosts then execute without checks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies security risks in the Model Context Protocol ecosystem that links large language models to external tools. It describes a two-stage attack surface where registries with weak ownership checks admit adversarial or hijacked servers, after which attacker-controlled tool metadata can steer LLM reasoning toward intended operations. Hosts perform these operations without independent verification or sandboxing. The authors scanned 67,057 servers across six registries and found widespread conditions for hijacking and metadata manipulation. They built MCPInspect to flag misleading descriptions and code vulnerabilities before integration, locating 833 vulnerable servers and 18 with suspicious descriptions.

Core claim

MCP registries allow adversarial or hijacked servers to integrate due to insufficient vetting and ownership verification; once integrated, attacker-controlled tool metadata shapes LLM reasoning to induce specific operations that hosts execute without independent checks, and code-level flaws can amplify the effect even if not required for the initial attack.

What carries the argument

The two-stage attack surface of registry-level weak vetting followed by post-integration metadata manipulation of LLM outputs without host verification.

If this is right

Adversarial servers can enter hosts through public registries without strong ownership checks.
Attacker metadata can induce LLMs to request specific tool calls that hosts then perform.
Widespread server conditions enable hijacking and invocation manipulation across analyzed registries.
Pre-integration scanning tools can identify misleading metadata and exploitable code before use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar trust gaps may appear in other protocols that let LLMs invoke external services based on third-party descriptions.
Requiring cryptographic signing of tool metadata or mandatory host-side policy checks could reduce the attack surface.
Registry operators might adopt automated ownership verification to limit hijacking before servers reach hosts.

Load-bearing premise

MCP hosts execute tool operations based on server-provided metadata without performing independent verification or sandboxing.

What would settle it

Finding that MCP hosts routinely perform their own verification of tool metadata and refuse to act on operations suggested only by untrusted server descriptions would contradict the attack surface.

Figures

Figures reproduced from arXiv: 2510.16558 by Xiaofan Li, Xing Gao.

**Figure 1.** Figure 1: The overview of the MCP ecosystem. 1 { 2 "mcpServers": { 3 # Local MCP server 4 "github": { 5 "command": "docker", 6 "args": ["run", "-i", "--rm", "-e", "GITHUB_PERSONAL_ACCESS_TOKEN", "ghcr.io/github/github-mcp-server"], 7 "env": { 8 "GITHUB_PERSONAL_ACCESS_TOKEN": "token" 9 } 10 } 11 # Remote MCP server 12 "semgrep": { 13 "url": "https://mcp.semgrep.ai/sse" 14 } 15 } 16 } Listing 1: MCP server configurat… view at source ↗

**Figure 2.** Figure 2: Number of MCP servers across registries. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Distribution of programming languages used in [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 5.** Figure 5: Context dangling tool issue success rate per [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Tool shadowing attack success rate per host–model [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: A leaked GitHub token discovered on the registry [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

**Figure 8.** Figure 8: MCP server maintainer comparison on npm registry. [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗

read the original abstract

The Model Context Protocol (MCP) has emerged as a standard for connecting large language models (LLMs) with external tools. However, this MCP ecosystem introduces new security risks across hosts, servers, and registries. In this paper, we present the first cross-entity security study of MCP under a two-stage attack surface. At the registry-level, weak vetting and ownership checks allow adversarial or hijacked servers to enter hosts. After integration, attacker-controlled tool metadata can shape LLM reasoning and induce attacker-intended operations, which hosts execute without independent verification. Code-level vulnerabilities (e.g., code injection) are not required but can amplify attacker-controlled parameters into exploitation. We analyze 67,057 servers across six public registries and identify widespread conditions enabling server hijacking and invocation manipulation. We further implement MCPInspect, a pre-integration analysis tool that detects misleading tool metadata and exploitable code vulnerabilities, identifying 833 vulnerable servers and 18 with suspicious descriptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper scans 67k MCP servers and ships a detection tool for bad metadata, but the host execution claim stays untested.

read the letter

Hi, the main thing to know is that they ran a large registry scan across 67,057 MCP servers and built MCPInspect to flag 833 cases of misleading metadata or code problems. That gives the first concrete numbers on how weak the entry points are for this protocol. The work is new in combining registry vetting gaps with the idea that server metadata could steer LLM decisions toward attacker goals. They show real conditions for hijacking and invocation issues without needing code injection. The scan itself is the strongest part and supplies data others can check or extend. The softer spot is the two-stage attack surface. The paper states that hosts run the induced operations without independent verification, yet it contains no host implementation review, no runtime tests against actual MCP hosts, and no protocol citations that would require such checks. The evidence stays on the server and registry side, so the full chain from metadata to executed action remains an assumption rather than a measured outcome. This is useful for people tracking security in LLM-tool setups or building early defenses for agent ecosystems. A reader who wants initial measurements and a starting detection tool will find value here. It deserves a serious referee because the empirical counts are grounded and point to a real surface that needs attention, even if the host assumptions need more work. I would send it for peer review with a request to add host-side checks or clarify the execution model.

Referee Report

2 major / 2 minor

Summary. The manuscript presents the first cross-entity security study of the Model Context Protocol (MCP), describing a two-stage attack surface involving registry-level hijacking risks and subsequent metadata-driven manipulation of LLM reasoning in hosts. Through analysis of 67,057 servers across six public registries, the authors identify conditions enabling server hijacking and implement MCPInspect to flag 833 servers with misleading metadata or code vulnerabilities and 18 with suspicious descriptions.

Significance. If the results hold, this work is significant for highlighting security risks in an emerging standard for LLM-external tool integration. The concrete empirical data from scanning over 67,000 servers and the implementation of the MCPInspect tool provide strong grounding for the claims about vulnerable servers. These contributions could help the community address weak vetting practices and improve pre-integration checks.

major comments (2)

[Abstract (two-stage attack surface)] The description of the two-stage attack surface claims that hosts execute attacker-intended operations shaped by server-provided tool metadata 'without independent verification or sandboxing'. However, the manuscript does not include any analysis of MCP host implementations, runtime tests against hosts, or citations to protocol specifications requiring such verification. This leaves the execution-without-verification aspect as an assumption rather than an empirically demonstrated property.
[Results (scanning and detection)] The paper reports detecting 833 vulnerable servers using MCPInspect but provides insufficient detail on the scanning methodology, criteria for 'misleading tool metadata', and false-positive rates of the detection. This affects the reliability of the count and the claim of 'widespread conditions'.

minor comments (2)

[Abstract] The six public registries are not named; specifying them would aid reproducibility.
[Abstract] Some terms like 'MCPInspect' could benefit from a brief definition on first use for readers unfamiliar with the tool.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have helped us identify areas for improvement in our manuscript on security issues in the Model Context Protocol ecosystem. We respond to each major comment below, indicating where revisions have been or will be made.

read point-by-point responses

Referee: [Abstract (two-stage attack surface)] The description of the two-stage attack surface claims that hosts execute attacker-intended operations shaped by server-provided tool metadata 'without independent verification or sandboxing'. However, the manuscript does not include any analysis of MCP host implementations, runtime tests against hosts, or citations to protocol specifications requiring such verification. This leaves the execution-without-verification aspect as an assumption rather than an empirically demonstrated property.

Authors: We acknowledge the referee's observation that the execution-without-verification aspect is presented without direct empirical analysis of host implementations. The two-stage attack surface is framed as a conceptual model grounded in the MCP protocol's design, where tool metadata from servers is passed to LLMs via hosts without protocol-mandated verification steps. To address this, we have added citations to the official MCP specification and a discussion of publicly documented host behaviors in the revised manuscript. We have also clarified the abstract language to indicate this follows from the protocol architecture rather than from runtime testing conducted in our study. Full runtime experiments against diverse hosts were outside the scope of this registry- and server-focused analysis. revision: partial
Referee: [Results (scanning and detection)] The paper reports detecting 833 vulnerable servers using MCPInspect but provides insufficient detail on the scanning methodology, criteria for 'misleading tool metadata', and false-positive rates of the detection. This affects the reliability of the count and the claim of 'widespread conditions'.

Authors: We agree that greater methodological transparency is needed to support the reported counts and the characterization of widespread conditions. In the revised manuscript, we have expanded the MCPInspect section with a detailed account of the data collection process across the six registries, the precise criteria and heuristics applied to detect misleading tool metadata (including name-description mismatches and suspicious phrasing), and the results of a manual validation on a sampled subset of detections to estimate false-positive rates. These additions improve reproducibility and allow readers to better assess the strength of the 833 vulnerable servers finding. revision: yes

Circularity Check

0 steps flagged

Empirical audit of MCP ecosystem with no circular derivation chain

full rationale

The manuscript is an empirical security analysis that scans public registries, identifies hijacking conditions in 67,057 entries, and implements MCPInspect to flag 833 servers with metadata or code issues. No equations, fitted parameters, predictions, or self-citations appear in the provided text that reduce any central claim to its own inputs by construction. The two-stage attack surface is presented as an observed property of the ecosystem rather than a derived result; the assumption that hosts execute metadata-induced operations without verification is stated directly from protocol description and is not obtained via self-definition, renaming, or load-bearing self-citation. The work therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on domain assumptions about current MCP host behavior rather than new mathematical axioms or fitted parameters. No free parameters or invented entities are introduced.

axioms (1)

domain assumption MCP hosts execute tool operations without independent verification of server-provided metadata or parameters.
This premise underpins the second stage of the described attack surface and is invoked when stating that hosts run attacker-intended operations.

pith-pipeline@v0.9.0 · 5687 in / 1208 out tokens · 33343 ms · 2026-05-18T06:08:09.959877+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

hosts execute tool operations based on server-provided metadata without performing independent verification or sandboxing
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

weak vetting and ownership checks allow adversarial or hijacked servers

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A First Measurement Study on Authentication Security in Real-World Remote MCP Servers
cs.CR 2026-05 conditional novelty 8.0

First measurement study of 7,973 remote MCP servers finds 40.55% lack authentication and all 119 tested OAuth servers have flaws that risk data leaks or account takeover.
Free-Riding the Agentic Web: A Systematic Security Analysis of x402 Payments
cs.CR 2026-05 unverdicted novelty 6.0

Systematic analysis of x402 reveals four security flaw classes with up to 100% leakage, a proof that output-only pricing cannot be both fair and inflation-bounded, and mitigations reducing costs 47% while inverting at...
Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP
cs.CR 2026-02 unverdicted novelty 5.0

The paper identifies twelve protocol-level security risks across MCP, A2A, Agora, and ANP and quantifies wrong-provider tool execution risk in MCP via a measurement-driven case study on multi-server composition.

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · cited by 3 Pith papers · 3 internal anchors

[1]

Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices,

Sara Abdali, Richard Anarfi, CJ Barberan, Jia He, and Er- fan Shayegani. Securing large language models: Threats, vulnerabilities and responsible practices.arXiv preprint arXiv:2403.12503, 2024

work page arXiv 2024
[2]

Tro- janpuzzle: Covertly poisoning code-suggestion models

Hojjat Aghakhani, Wei Dai, Andre Manoel, Xavier Fer- nandes, Anant Kharkar, Christopher Kruegel, Giovanni Vigna, David Evans, Ben Zorn, and Robert Sim. Tro- janpuzzle: Covertly poisoning code-suggestion models. In2024 IEEE Symposium on Security and Privacy (SP), pages 1122–1140. IEEE, 2024

work page 2024
[3]

Seven months’ worth of mistakes: A lon- gitudinal study of typosquatting abuse

Pieter Agten, Wouter Joosen, Frank Piessens, and Nick Nikiforakis. Seven months’ worth of mistakes: A lon- gitudinal study of typosquatting abuse. InProceedings of the 22nd Network and Distributed System Security Symposium (NDSS 2015). Internet Society, 2015

work page 2015
[4]

Introducing the model context pro- tocol

Anthropic. Introducing the model context pro- tocol. https://www.anthropic.com/news/ model-context-protocol, 2025

work page 2025
[5]

Cursor security

Cursor Security. Cursor security. https://cursor. com/security, 2025

work page 2025
[6]

arXiv preprint arXiv:2307.08715 , year=

Gelei Deng, Yi Liu, Yuekang Li, Kailong Wang, Ying Zhang, Zefeng Li, Haoyu Wang, Tianwei Zhang, and Yang Liu. Jailbreaker: Automated jailbreak across mul- tiple large language model chatbots.arXiv preprint arXiv:2307.08715, 2023

work page arXiv 2023
[7]

Ai agents under threat: A survey of key secu- rity challenges and future pathways

Z Deng, Y Guo, C Han, W Ma, J Xiong, S Wen, and Y Xiang. Ai agents under threat: A survey of key secu- rity challenges and future pathways. arxiv, 2024

work page 2024
[8]

Evaluating login challenges as adefense against account takeover

Periwinkle Doerfler, Kurt Thomas, Maija Marincenko, Juri Ranieri, Yu Jiang, Angelika Moscicki, and Damon McCoy. Evaluating login challenges as adefense against account takeover. InThe World Wide Web Conference, pages 372–382, 2019

work page 2019
[9]

Towards measuring supply chain attacks on package managers for interpreted languages

Ruian Duan, Omar Alrawi, Ranjita Pai Kasturi, Ryan Elder, Brendan Saltaformaggio, and Wenke Lee. To- wards measuring supply chain attacks on package managers for interpreted languages.arXiv preprint arXiv:2002.01139, 2020

work page arXiv 2002
[10]

Deleting your personal account

GitHub Docs. Deleting your personal account. https: //docs.github.com/en/account-and-profile/ setting-up-and-managing-your-personal-account-on-github/ managing-your-personal-account/ deleting-your-personal-account, 2025

work page 2025
[11]

Managing your personal access tokens

GitHub Docs. Managing your personal access tokens. https://docs.github.com/en/authentication/ keeping-your-account-and-data-secure/ managing-your-personal-access-tokens, 2025

work page 2025
[12]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion. InProceedings of the 16th ACM Workshop on Artificial Intelligence and Security, pages 79–90, 2023

work page 2023
[13]

Use-after-freemail: Generalizing the use-after-free problem and applying it to email services

Daniel Gruss, Michael Schwarz, Matthias Wübbeling, Simon Guggi, Timo Malderle, Stefan More, and Moritz Lipp. Use-after-freemail: Generalizing the use-after-free problem and applying it to email services. InProceed- ings of the 2018 on Asia Conference on Computer and Communications Security, pages 297–311, 2018

work page 2018
[14]

Investigating package related security threats in soft- ware registries

Yacong Gu, Lingyun Ying, Yingyuan Pu, Xiao Hu, Hua- jun Chai, Ruimin Wang, Xing Gao, and Haixin Duan. Investigating package related security threats in soft- ware registries. In2023 IEEE Symposium on Security and Privacy (SP), pages 1578–1595. IEEE, 2023

work page 2023
[15]

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, and Danqi Chen. Catastrophic jailbreak of open- source llms via exploiting generation.arXiv preprint arXiv:2310.06987, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[16]

Mcp security notification: Tool poi- soning attacks

Invariantlabs. Mcp security notification: Tool poi- soning attacks. https://invariantlabs.ai/blog/ mcp-security-notification-tool-poisoning-attacks , 2025

work page 2025
[17]

Whatsapp mcp exploited: Exfiltrating your message history via mcp

Invariantlabs. Whatsapp mcp exploited: Exfiltrating your message history via mcp. https://invariantlabs.ai/blog/ whatsapp-mcp-exploited, 2025

work page 2025
[18]

arXiv preprint arXiv:2307.14692 , year=

Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, and Nicholas Carlini. Backdoor attacks for in-context learning with language models.arXiv preprint arXiv:2307.14692, 2023. 14

work page arXiv 2023
[19]

Every second counts: Quantifying the negative externalities of cybercrime via typosquatting

Mohammad Taha Khan, Xiang Huo, Zhou Li, and Chris Kanich. Every second counts: Quantifying the negative externalities of cybercrime via typosquatting. In2015 IEEE Symposium on Security and Privacy, pages 135–

work page
[20]

Hiding in plain sight: A longitudinal study of com- bosquatting abuse

Panagiotis Kintis, Najmeh Miramirkhani, Charles Lever, Yizheng Chen, Rosa Romero-Gómez, Nikolaos Pitropakis, Nick Nikiforakis, and Manos Antonakakis. Hiding in plain sight: A longitudinal study of com- bosquatting abuse. InProceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 569–586, 2017

work page 2017
[21]

arXiv preprint arXiv:2304.05197 , year=

Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, and Yangqiu Song. Multi-step jail- breaking privacy attacks on chatgpt.arXiv preprint arXiv:2304.05197, 2023

work page arXiv 2023
[22]

To- ward understanding the security of plugins in continuous integration services

Xiaofan Li, Yacong Gu, Chu Qiao, Zhenkai Zhang, Daip- ing Liu, Lingyun Ying, Haixin Duan, and Xing Gao. To- ward understanding the security of plugins in continuous integration services. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 482–496, 2024

work page 2024
[23]

A first look at security and privacy risks in the rapidapi ecosys- tem

Song Liao, Long Cheng, Xiapu Luo, Zheng Song, Haipeng Cai, Danfeng Yao, and Hongxin Hu. A first look at security and privacy risks in the rapidapi ecosys- tem. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 1626–1640, 2024

work page 2024
[24]

Exploring the unchartered space of container registry typosquatting

Guannan Liu, Xing Gao, Haining Wang, and Kun Sun. Exploring the unchartered space of container registry typosquatting. In31st USENIX Security Symposium (USENIX Security 22), pages 35–51, 2022

work page 2022
[25]

malicious code

Tong Liu, Zizhuang Deng, Guozhu Meng, Yuekang Li, and Kai Chen. Demystifying rce vulnerabilities in llm-integrated apps. arxiv 2023.arXiv preprint arXiv:2309.02926, 2023

work page arXiv 2023
[26]

Prompt Injection attack against LLM-integrated Applications

Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, et al. Prompt injection at- tack against llm-integrated applications.arXiv preprint arXiv:2306.05499, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[27]

Formalizing and benchmarking prompt injection attacks and defenses

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and benchmarking prompt injection attacks and defenses. In33rd USENIX Security Symposium (USENIX Security 24), pages 1831– 1847, 2024

work page 2024
[28]

The mcp registry registry

mastra. The mcp registry registry. https://mastra. ai/mcp-registry-registry, 2025

work page 2025
[29]

Discover top mcp servers | mcp market

MCP Market. Discover top mcp servers | mcp market. https://mcpmarket.com/, 2025

work page 2025
[30]

Mcp store - find and connect to 20,000+ mcp servers.https://mcpstore.co/, 2025

MCP Store. Mcp store - find and connect to 20,000+ mcp servers.https://mcpstore.co/, 2025

work page 2025
[31]

Mcp servers.https://mcp.so/, 2025

mcp.so. Mcp servers.https://mcp.so/, 2025

work page 2025
[32]

Model context protocol (mcp) is now generally available in microsoft copilot studio

Microsoft. Model context protocol (mcp) is now generally available in microsoft copilot studio. https://www.microsoft.com/en-us/ microsoft-copilot/blog/copilot-studio/ model-context-protocol-mcp-is-now-generally-available-in-microsoft-copilot-studio/ , 2025

work page 2025
[33]

Measuring the perpetrators and funders of typosquatting

Tyler Moore and Benjamin Edelman. Measuring the perpetrators and funders of typosquatting. InInterna- tional Conference on Financial Cryptography and Data Security, pages 175–191. Springer, 2010

work page 2010
[34]

Soundsquatting: Uncovering the use of homophones in domain squatting

Nick Nikiforakis, Marco Balduzzi, Lieven Desmet, Frank Piessens, and Wouter Joosen. Soundsquatting: Uncovering the use of homophones in domain squatting. InInternational Conference on Information Security, pages 291–308. Springer, 2014

work page 2014
[35]

Bit- squatting: Exploiting bit-flips for fun, or profit? InPro- ceedings of the 22nd international conference on World Wide Web, pages 989–998, 2013

Nick Nikiforakis, Steven Van Acker, Wannes Meert, Lieven Desmet, Frank Piessens, and Wouter Joosen. Bit- squatting: Exploiting bit-flips for fun, or profit? InPro- ceedings of the 22nd international conference on World Wide Web, pages 989–998, 2013

work page 2013
[36]

Cheatagent: Attack- ing llm-empowered recommender systems via llm agent

Liang-bo Ning, Shijie Wang, Wenqi Fan, Qing Li, Xin Xu, Hao Chen, and Feiran Huang. Cheatagent: Attack- ing llm-empowered recommender systems via llm agent. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2284– 2295, 2024

work page 2024
[37]

Prompt-to-sql injections in llm-integrated web applications: Risks and defenses

Rodrigo Pedro, Miguel E Coimbra, Daniel Castro, Paulo Carreira, and Nuno Santos. Prompt-to-sql injections in llm-integrated web applications: Risks and defenses. In 2025 IEEE/ACM 47th International Conference on Soft- ware Engineering (ICSE), pages 76–88. IEEE Computer Society, 2024

work page 2025
[38]

Pulsemcp | keep up-to-date with mcp

Pulse MCP. Pulsemcp | keep up-to-date with mcp. https://www.pulsemcp.com/, 2025

work page 2025
[39]

You autocomplete me: Poisoning vul- nerabilities in neural code completion

Roei Schuster, Congzheng Song, Eran Tromer, and Vi- taly Shmatikov. You autocomplete me: Poisoning vul- nerabilities in neural code completion. In30th USENIX Security Symposium (USENIX Security 21), pages 1559– 1575, 2021

work page 2021
[40]

arXiv preprint arXiv:2310.10844 (2023)

Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pe- dram Zaree, Yue Dong, and Nael Abu-Ghazaleh. Survey of vulnerabilities in large language models revealed by 15 adversarial attacks.arXiv preprint arXiv:2310.10844, 2023

work page arXiv 2023
[41]

do anything now

Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang. " do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 1671–1685, 2024

work page 2024
[42]

Smithery - model context protocol registry

Smithery. Smithery - model context protocol registry. https://smithery.ai/, 2025

work page 2025
[43]

The long {“Taile”} of typosquatting domain names

Janos Szurdi, Balazs Kocso, Gabor Cseh, Jonathan Spring, Mark Felegyhazi, and Chris Kanich. The long {“Taile”} of typosquatting domain names. In23rd USENIX Security Symposium (USENIX Security 14), pages 191–206, 2014

work page 2014
[44]

Google to embrace anthropic’s standard for connecting ai models to data

TechCrunch. Google to embrace anthropic’s standard for connecting ai models to data. https://techcrunch.com/2025/04/09/ google-says-itll-embrace-anthropics-standard-for-connecting-ai-models-to-data/ , 2025

work page 2025
[45]

Openai adopts rival anthropic’s standard for connecting ai models to data

TechCrunch. Openai adopts rival anthropic’s standard for connecting ai models to data. https://techcrunch.com/2025/03/26/ openai-adopts-rival-anthropics-standard-for-connecting-ai-models-to-data/ , 2025

work page 2025
[46]

PhD thesis, Univer- sität Hamburg, Fachbereich Informatik, 2016

Nikolai Philipp Tschacher.Typosquatting in program- ming language package managers. PhD thesis, Univer- sität Hamburg, Fachbereich Informatik, 2016

work page 2016
[47]

Typosquatting and com- bosquatting attacks on the python ecosystem

Duc-Ly Vu, Ivan Pashchenko, Fabio Massacci, Henrik Plate, and Antonino Sabetta. Typosquatting and com- bosquatting attacks on the python ecosystem. In2020 ieee european symposium on security and privacy work- shops (euros&pw), pages 509–514. IEEE, 2020

work page 2020
[48]

Adversarial demonstration attacks on large language models.arXiv preprint arXiv:2305.14950,

Jiongxiao Wang, Zichen Liu, Keun Hee Park, Zhuo- jun Jiang, Zhaoheng Zheng, Zhuofeng Wu, Muhao Chen, and Chaowei Xiao. Adversarial demonstration attacks on large language models.arXiv preprint arXiv:2305.14950, 2023

work page arXiv 2023
[49]

Jailbroken: How does llm safety training fail?Advances in Neural Information Processing Systems, 36, 2024

Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. Jailbroken: How does llm safety training fail?Advances in Neural Information Processing Systems, 36, 2024

work page 2024
[50]

Deceptprompt: Exploiting llm-driven code generation via adversarial natural language instructions, 2023

Fangzhou Wu, Xiaogeng Liu, and Chaowei Xiao. De- ceptprompt: Exploiting llm-driven code generation via adversarial natural language instructions.arXiv preprint arXiv:2312.04730, 2023

work page arXiv 2023
[51]

I know what you asked: Prompt leakage via kv-cache sharing in multi- tenant llm serving

Guanlong Wu, Zheng Zhang, Yao Zhang, Weili Wang, Jianyu Niu, Ye Wu, and Yinqian Zhang. I know what you asked: Prompt leakage via kv-cache sharing in multi- tenant llm serving. InProceedings of the 2025 Network and Distributed System Security (NDSS) Symposium. San Diego, CA, USA, 2025

work page 2025
[52]

Backdooring instruction-tuned large lan- guage models with virtual prompt injection.arXiv preprint arXiv:2307.16888, 2023

Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, and Hongxia Jin. Backdooring instruction-tuned large lan- guage models with virtual prompt injection.arXiv preprint arXiv:2307.16888, 2023

work page arXiv 2023
[53]

An llm-assisted easy-to-trigger backdoor attack on code completion models: Injecting disguised vulnerabilities against strong detection, 2024

Shenao Yan, Shen Wang, Yue Duan, Hanbin Hong, Kiho Lee, Doowon Kim, and Yuan Hong. An llm- assisted easy-to-trigger backdoor attack on code comple- tion models: Injecting disguised vulnerabilities against strong detection.arXiv preprint arXiv:2406.06822, 2024

work page arXiv 2024
[54]

What are weak links in the npm supply chain? InProceedings of the 44th International Conference on Software Engineering: Software Engineering in Prac- tice, pages 331–340, 2022

Nusrat Zahan, Thomas Zimmermann, Patrice Gode- froid, Brendan Murphy, Chandra Maddila, and Laurie Williams. What are weak links in the npm supply chain? InProceedings of the 44th International Conference on Software Engineering: Software Engineering in Prac- tice, pages 331–340, 2022

work page 2022
[55]

Advdoor: adversarial backdoor attack of deep learning system

Quan Zhang, Yifeng Ding, Yongqiang Tian, Jianmin Guo, Min Yuan, and Yu Jiang. Advdoor: adversarial backdoor attack of deep learning system. InProceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 127–138, 2021

work page 2021
[56]

Human-imperceptible re- trieval poisoning attacks in llm-powered applications

Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, and Yu Jiang. Human-imperceptible re- trieval poisoning attacks in llm-powered applications. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pages 502–506, 2024

work page 2024
[57]

Imperceptible content poisoning in llm-powered applications

Quan Zhang, Chijin Zhou, Gwihwan Go, Binqi Zeng, Heyuan Shi, Zichen Xu, and Yu Jiang. Imperceptible content poisoning in llm-powered applications. InPro- ceedings of the 39th IEEE/ACM International Confer- ence on Automated Software Engineering, pages 242– 254, 2024

work page 2024
[58]

Priva- cyasst: Safeguarding user privacy in tool-using large language model agents.IEEE Transactions on Depend- able and Secure Computing, 21(6):5242–5258, 2024

Xinyu Zhang, Huiyu Xu, Zhongjie Ba, Zhibo Wang, Yuan Hong, Jian Liu, Zhan Qin, and Kui Ren. Priva- cyasst: Safeguarding user privacy in tool-using large language model agents.IEEE Transactions on Depend- able and Secure Computing, 21(6):5242–5258, 2024

work page 2024
[59]

Small world with high risks: A study of security threats in the npm ecosystem

Markus Zimmermann, Cristian-Alexandru Staicu, Cam Tenny, and Michael Pradel. Small world with high risks: A study of security threats in the npm ecosystem. In 28th USENIX Security symposium (USENIX security 19), pages 995–1010, 2019. 16

work page 2019
[60]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. Universal and trans- ferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023. 17

work page internal anchor Pith review Pith/arXiv arXiv 2023

[1] [1]

Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices,

Sara Abdali, Richard Anarfi, CJ Barberan, Jia He, and Er- fan Shayegani. Securing large language models: Threats, vulnerabilities and responsible practices.arXiv preprint arXiv:2403.12503, 2024

work page arXiv 2024

[2] [2]

Tro- janpuzzle: Covertly poisoning code-suggestion models

Hojjat Aghakhani, Wei Dai, Andre Manoel, Xavier Fer- nandes, Anant Kharkar, Christopher Kruegel, Giovanni Vigna, David Evans, Ben Zorn, and Robert Sim. Tro- janpuzzle: Covertly poisoning code-suggestion models. In2024 IEEE Symposium on Security and Privacy (SP), pages 1122–1140. IEEE, 2024

work page 2024

[3] [3]

Seven months’ worth of mistakes: A lon- gitudinal study of typosquatting abuse

Pieter Agten, Wouter Joosen, Frank Piessens, and Nick Nikiforakis. Seven months’ worth of mistakes: A lon- gitudinal study of typosquatting abuse. InProceedings of the 22nd Network and Distributed System Security Symposium (NDSS 2015). Internet Society, 2015

work page 2015

[4] [4]

Introducing the model context pro- tocol

Anthropic. Introducing the model context pro- tocol. https://www.anthropic.com/news/ model-context-protocol, 2025

work page 2025

[5] [5]

Cursor security

Cursor Security. Cursor security. https://cursor. com/security, 2025

work page 2025

[6] [6]

arXiv preprint arXiv:2307.08715 , year=

Gelei Deng, Yi Liu, Yuekang Li, Kailong Wang, Ying Zhang, Zefeng Li, Haoyu Wang, Tianwei Zhang, and Yang Liu. Jailbreaker: Automated jailbreak across mul- tiple large language model chatbots.arXiv preprint arXiv:2307.08715, 2023

work page arXiv 2023

[7] [7]

Ai agents under threat: A survey of key secu- rity challenges and future pathways

Z Deng, Y Guo, C Han, W Ma, J Xiong, S Wen, and Y Xiang. Ai agents under threat: A survey of key secu- rity challenges and future pathways. arxiv, 2024

work page 2024

[8] [8]

Evaluating login challenges as adefense against account takeover

Periwinkle Doerfler, Kurt Thomas, Maija Marincenko, Juri Ranieri, Yu Jiang, Angelika Moscicki, and Damon McCoy. Evaluating login challenges as adefense against account takeover. InThe World Wide Web Conference, pages 372–382, 2019

work page 2019

[9] [9]

Towards measuring supply chain attacks on package managers for interpreted languages

Ruian Duan, Omar Alrawi, Ranjita Pai Kasturi, Ryan Elder, Brendan Saltaformaggio, and Wenke Lee. To- wards measuring supply chain attacks on package managers for interpreted languages.arXiv preprint arXiv:2002.01139, 2020

work page arXiv 2002

[10] [10]

Deleting your personal account

GitHub Docs. Deleting your personal account. https: //docs.github.com/en/account-and-profile/ setting-up-and-managing-your-personal-account-on-github/ managing-your-personal-account/ deleting-your-personal-account, 2025

work page 2025

[11] [11]

Managing your personal access tokens

GitHub Docs. Managing your personal access tokens. https://docs.github.com/en/authentication/ keeping-your-account-and-data-secure/ managing-your-personal-access-tokens, 2025

work page 2025

[12] [12]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion. InProceedings of the 16th ACM Workshop on Artificial Intelligence and Security, pages 79–90, 2023

work page 2023

[13] [13]

Use-after-freemail: Generalizing the use-after-free problem and applying it to email services

Daniel Gruss, Michael Schwarz, Matthias Wübbeling, Simon Guggi, Timo Malderle, Stefan More, and Moritz Lipp. Use-after-freemail: Generalizing the use-after-free problem and applying it to email services. InProceed- ings of the 2018 on Asia Conference on Computer and Communications Security, pages 297–311, 2018

work page 2018

[14] [14]

Investigating package related security threats in soft- ware registries

Yacong Gu, Lingyun Ying, Yingyuan Pu, Xiao Hu, Hua- jun Chai, Ruimin Wang, Xing Gao, and Haixin Duan. Investigating package related security threats in soft- ware registries. In2023 IEEE Symposium on Security and Privacy (SP), pages 1578–1595. IEEE, 2023

work page 2023

[15] [15]

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, and Danqi Chen. Catastrophic jailbreak of open- source llms via exploiting generation.arXiv preprint arXiv:2310.06987, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[16] [16]

Mcp security notification: Tool poi- soning attacks

Invariantlabs. Mcp security notification: Tool poi- soning attacks. https://invariantlabs.ai/blog/ mcp-security-notification-tool-poisoning-attacks , 2025

work page 2025

[17] [17]

Whatsapp mcp exploited: Exfiltrating your message history via mcp

Invariantlabs. Whatsapp mcp exploited: Exfiltrating your message history via mcp. https://invariantlabs.ai/blog/ whatsapp-mcp-exploited, 2025

work page 2025

[18] [18]

arXiv preprint arXiv:2307.14692 , year=

Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, and Nicholas Carlini. Backdoor attacks for in-context learning with language models.arXiv preprint arXiv:2307.14692, 2023. 14

work page arXiv 2023

[19] [19]

Every second counts: Quantifying the negative externalities of cybercrime via typosquatting

Mohammad Taha Khan, Xiang Huo, Zhou Li, and Chris Kanich. Every second counts: Quantifying the negative externalities of cybercrime via typosquatting. In2015 IEEE Symposium on Security and Privacy, pages 135–

work page

[20] [20]

Hiding in plain sight: A longitudinal study of com- bosquatting abuse

Panagiotis Kintis, Najmeh Miramirkhani, Charles Lever, Yizheng Chen, Rosa Romero-Gómez, Nikolaos Pitropakis, Nick Nikiforakis, and Manos Antonakakis. Hiding in plain sight: A longitudinal study of com- bosquatting abuse. InProceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 569–586, 2017

work page 2017

[21] [21]

arXiv preprint arXiv:2304.05197 , year=

Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, and Yangqiu Song. Multi-step jail- breaking privacy attacks on chatgpt.arXiv preprint arXiv:2304.05197, 2023

work page arXiv 2023

[22] [22]

To- ward understanding the security of plugins in continuous integration services

Xiaofan Li, Yacong Gu, Chu Qiao, Zhenkai Zhang, Daip- ing Liu, Lingyun Ying, Haixin Duan, and Xing Gao. To- ward understanding the security of plugins in continuous integration services. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 482–496, 2024

work page 2024

[23] [23]

A first look at security and privacy risks in the rapidapi ecosys- tem

Song Liao, Long Cheng, Xiapu Luo, Zheng Song, Haipeng Cai, Danfeng Yao, and Hongxin Hu. A first look at security and privacy risks in the rapidapi ecosys- tem. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 1626–1640, 2024

work page 2024

[24] [24]

Exploring the unchartered space of container registry typosquatting

Guannan Liu, Xing Gao, Haining Wang, and Kun Sun. Exploring the unchartered space of container registry typosquatting. In31st USENIX Security Symposium (USENIX Security 22), pages 35–51, 2022

work page 2022

[25] [25]

malicious code

Tong Liu, Zizhuang Deng, Guozhu Meng, Yuekang Li, and Kai Chen. Demystifying rce vulnerabilities in llm-integrated apps. arxiv 2023.arXiv preprint arXiv:2309.02926, 2023

work page arXiv 2023

[26] [26]

Prompt Injection attack against LLM-integrated Applications

Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, et al. Prompt injection at- tack against llm-integrated applications.arXiv preprint arXiv:2306.05499, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[27] [27]

Formalizing and benchmarking prompt injection attacks and defenses

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and benchmarking prompt injection attacks and defenses. In33rd USENIX Security Symposium (USENIX Security 24), pages 1831– 1847, 2024

work page 2024

[28] [28]

The mcp registry registry

mastra. The mcp registry registry. https://mastra. ai/mcp-registry-registry, 2025

work page 2025

[29] [29]

Discover top mcp servers | mcp market

MCP Market. Discover top mcp servers | mcp market. https://mcpmarket.com/, 2025

work page 2025

[30] [30]

Mcp store - find and connect to 20,000+ mcp servers.https://mcpstore.co/, 2025

MCP Store. Mcp store - find and connect to 20,000+ mcp servers.https://mcpstore.co/, 2025

work page 2025

[31] [31]

Mcp servers.https://mcp.so/, 2025

mcp.so. Mcp servers.https://mcp.so/, 2025

work page 2025

[32] [32]

Model context protocol (mcp) is now generally available in microsoft copilot studio

Microsoft. Model context protocol (mcp) is now generally available in microsoft copilot studio. https://www.microsoft.com/en-us/ microsoft-copilot/blog/copilot-studio/ model-context-protocol-mcp-is-now-generally-available-in-microsoft-copilot-studio/ , 2025

work page 2025

[33] [33]

Measuring the perpetrators and funders of typosquatting

Tyler Moore and Benjamin Edelman. Measuring the perpetrators and funders of typosquatting. InInterna- tional Conference on Financial Cryptography and Data Security, pages 175–191. Springer, 2010

work page 2010

[34] [34]

Soundsquatting: Uncovering the use of homophones in domain squatting

Nick Nikiforakis, Marco Balduzzi, Lieven Desmet, Frank Piessens, and Wouter Joosen. Soundsquatting: Uncovering the use of homophones in domain squatting. InInternational Conference on Information Security, pages 291–308. Springer, 2014

work page 2014

[35] [35]

Bit- squatting: Exploiting bit-flips for fun, or profit? InPro- ceedings of the 22nd international conference on World Wide Web, pages 989–998, 2013

Nick Nikiforakis, Steven Van Acker, Wannes Meert, Lieven Desmet, Frank Piessens, and Wouter Joosen. Bit- squatting: Exploiting bit-flips for fun, or profit? InPro- ceedings of the 22nd international conference on World Wide Web, pages 989–998, 2013

work page 2013

[36] [36]

Cheatagent: Attack- ing llm-empowered recommender systems via llm agent

Liang-bo Ning, Shijie Wang, Wenqi Fan, Qing Li, Xin Xu, Hao Chen, and Feiran Huang. Cheatagent: Attack- ing llm-empowered recommender systems via llm agent. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2284– 2295, 2024

work page 2024

[37] [37]

Prompt-to-sql injections in llm-integrated web applications: Risks and defenses

Rodrigo Pedro, Miguel E Coimbra, Daniel Castro, Paulo Carreira, and Nuno Santos. Prompt-to-sql injections in llm-integrated web applications: Risks and defenses. In 2025 IEEE/ACM 47th International Conference on Soft- ware Engineering (ICSE), pages 76–88. IEEE Computer Society, 2024

work page 2025

[38] [38]

Pulsemcp | keep up-to-date with mcp

Pulse MCP. Pulsemcp | keep up-to-date with mcp. https://www.pulsemcp.com/, 2025

work page 2025

[39] [39]

You autocomplete me: Poisoning vul- nerabilities in neural code completion

Roei Schuster, Congzheng Song, Eran Tromer, and Vi- taly Shmatikov. You autocomplete me: Poisoning vul- nerabilities in neural code completion. In30th USENIX Security Symposium (USENIX Security 21), pages 1559– 1575, 2021

work page 2021

[40] [40]

arXiv preprint arXiv:2310.10844 (2023)

Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pe- dram Zaree, Yue Dong, and Nael Abu-Ghazaleh. Survey of vulnerabilities in large language models revealed by 15 adversarial attacks.arXiv preprint arXiv:2310.10844, 2023

work page arXiv 2023

[41] [41]

do anything now

Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang. " do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 1671–1685, 2024

work page 2024

[42] [42]

Smithery - model context protocol registry

Smithery. Smithery - model context protocol registry. https://smithery.ai/, 2025

work page 2025

[43] [43]

The long {“Taile”} of typosquatting domain names

Janos Szurdi, Balazs Kocso, Gabor Cseh, Jonathan Spring, Mark Felegyhazi, and Chris Kanich. The long {“Taile”} of typosquatting domain names. In23rd USENIX Security Symposium (USENIX Security 14), pages 191–206, 2014

work page 2014

[44] [44]

Google to embrace anthropic’s standard for connecting ai models to data

TechCrunch. Google to embrace anthropic’s standard for connecting ai models to data. https://techcrunch.com/2025/04/09/ google-says-itll-embrace-anthropics-standard-for-connecting-ai-models-to-data/ , 2025

work page 2025

[45] [45]

Openai adopts rival anthropic’s standard for connecting ai models to data

TechCrunch. Openai adopts rival anthropic’s standard for connecting ai models to data. https://techcrunch.com/2025/03/26/ openai-adopts-rival-anthropics-standard-for-connecting-ai-models-to-data/ , 2025

work page 2025

[46] [46]

PhD thesis, Univer- sität Hamburg, Fachbereich Informatik, 2016

Nikolai Philipp Tschacher.Typosquatting in program- ming language package managers. PhD thesis, Univer- sität Hamburg, Fachbereich Informatik, 2016

work page 2016

[47] [47]

Typosquatting and com- bosquatting attacks on the python ecosystem

Duc-Ly Vu, Ivan Pashchenko, Fabio Massacci, Henrik Plate, and Antonino Sabetta. Typosquatting and com- bosquatting attacks on the python ecosystem. In2020 ieee european symposium on security and privacy work- shops (euros&pw), pages 509–514. IEEE, 2020

work page 2020

[48] [48]

Adversarial demonstration attacks on large language models.arXiv preprint arXiv:2305.14950,

Jiongxiao Wang, Zichen Liu, Keun Hee Park, Zhuo- jun Jiang, Zhaoheng Zheng, Zhuofeng Wu, Muhao Chen, and Chaowei Xiao. Adversarial demonstration attacks on large language models.arXiv preprint arXiv:2305.14950, 2023

work page arXiv 2023

[49] [49]

Jailbroken: How does llm safety training fail?Advances in Neural Information Processing Systems, 36, 2024

Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. Jailbroken: How does llm safety training fail?Advances in Neural Information Processing Systems, 36, 2024

work page 2024

[50] [50]

Deceptprompt: Exploiting llm-driven code generation via adversarial natural language instructions, 2023

Fangzhou Wu, Xiaogeng Liu, and Chaowei Xiao. De- ceptprompt: Exploiting llm-driven code generation via adversarial natural language instructions.arXiv preprint arXiv:2312.04730, 2023

work page arXiv 2023

[51] [51]

I know what you asked: Prompt leakage via kv-cache sharing in multi- tenant llm serving

Guanlong Wu, Zheng Zhang, Yao Zhang, Weili Wang, Jianyu Niu, Ye Wu, and Yinqian Zhang. I know what you asked: Prompt leakage via kv-cache sharing in multi- tenant llm serving. InProceedings of the 2025 Network and Distributed System Security (NDSS) Symposium. San Diego, CA, USA, 2025

work page 2025

[52] [52]

Backdooring instruction-tuned large lan- guage models with virtual prompt injection.arXiv preprint arXiv:2307.16888, 2023

Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, and Hongxia Jin. Backdooring instruction-tuned large lan- guage models with virtual prompt injection.arXiv preprint arXiv:2307.16888, 2023

work page arXiv 2023

[53] [53]

An llm-assisted easy-to-trigger backdoor attack on code completion models: Injecting disguised vulnerabilities against strong detection, 2024

Shenao Yan, Shen Wang, Yue Duan, Hanbin Hong, Kiho Lee, Doowon Kim, and Yuan Hong. An llm- assisted easy-to-trigger backdoor attack on code comple- tion models: Injecting disguised vulnerabilities against strong detection.arXiv preprint arXiv:2406.06822, 2024

work page arXiv 2024

[54] [54]

What are weak links in the npm supply chain? InProceedings of the 44th International Conference on Software Engineering: Software Engineering in Prac- tice, pages 331–340, 2022

Nusrat Zahan, Thomas Zimmermann, Patrice Gode- froid, Brendan Murphy, Chandra Maddila, and Laurie Williams. What are weak links in the npm supply chain? InProceedings of the 44th International Conference on Software Engineering: Software Engineering in Prac- tice, pages 331–340, 2022

work page 2022

[55] [55]

Advdoor: adversarial backdoor attack of deep learning system

Quan Zhang, Yifeng Ding, Yongqiang Tian, Jianmin Guo, Min Yuan, and Yu Jiang. Advdoor: adversarial backdoor attack of deep learning system. InProceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 127–138, 2021

work page 2021

[56] [56]

Human-imperceptible re- trieval poisoning attacks in llm-powered applications

Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, and Yu Jiang. Human-imperceptible re- trieval poisoning attacks in llm-powered applications. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pages 502–506, 2024

work page 2024

[57] [57]

Imperceptible content poisoning in llm-powered applications

Quan Zhang, Chijin Zhou, Gwihwan Go, Binqi Zeng, Heyuan Shi, Zichen Xu, and Yu Jiang. Imperceptible content poisoning in llm-powered applications. InPro- ceedings of the 39th IEEE/ACM International Confer- ence on Automated Software Engineering, pages 242– 254, 2024

work page 2024

[58] [58]

Priva- cyasst: Safeguarding user privacy in tool-using large language model agents.IEEE Transactions on Depend- able and Secure Computing, 21(6):5242–5258, 2024

Xinyu Zhang, Huiyu Xu, Zhongjie Ba, Zhibo Wang, Yuan Hong, Jian Liu, Zhan Qin, and Kui Ren. Priva- cyasst: Safeguarding user privacy in tool-using large language model agents.IEEE Transactions on Depend- able and Secure Computing, 21(6):5242–5258, 2024

work page 2024

[59] [59]

Small world with high risks: A study of security threats in the npm ecosystem

Markus Zimmermann, Cristian-Alexandru Staicu, Cam Tenny, and Michael Pradel. Small world with high risks: A study of security threats in the npm ecosystem. In 28th USENIX Security symposium (USENIX security 19), pages 995–1010, 2019. 16

work page 2019

[60] [60]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. Universal and trans- ferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023. 17

work page internal anchor Pith review Pith/arXiv arXiv 2023