arxiv: 2601.10338 · v1 · submitted 2026-01-15 · 💻 cs.CR · cs.AI · cs.CL · cs.SE

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Gelei Deng, Guangquan Xu, Leo Zhang, Ruitao Feng, Weizhe Wang, Yao Zhang, Yi Liu, Yuekang Li

Pith reviewed 2026-05-14 23:36 UTC · model grok-4.3

classification 💻 cs.CR cs.AI cs.CL cs.SE

keywords AI agent skills, security vulnerabilities, empirical study, prompt injection, data exfiltration, privilege escalation, SkillScan, vulnerability taxonomy

The pith

More than one in four AI agent skills contain at least one security vulnerability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper performs the first large-scale empirical security analysis of AI agent skills, modular packages that extend agent capabilities with instructions and code. Collecting 42,447 skills from two major marketplaces and analyzing 31,132 of them with SkillScan reveals that 26.1% contain at least one vulnerability across 14 patterns in four categories: prompt injection, data exfiltration, privilege escalation, and supply chain risks. Data exfiltration appears in 13.3% of skills and privilege escalation in 11.8%, while skills bundling executable scripts have 2.12 times the odds of containing a vulnerability (OR=2.12). These results point to an attack surface that is growing under minimal vetting in agent frameworks.

Core claim

Agent skills execute with implicit trust and minimal vetting. Analysis shows 26.1% of skills contain vulnerabilities spanning 14 patterns across prompt injection, data exfiltration, privilege escalation, and supply chain risks. Data exfiltration reaches 13.3% prevalence and privilege escalation 11.8%, with 5.2% of skills displaying high-severity patterns suggesting malicious intent. Skills that bundle executable scripts are 2.12 times more likely to be vulnerable than instruction-only skills.

What carries the argument

SkillScan, a multi-stage detection framework integrating static analysis with LLM-based semantic classification to flag vulnerabilities in agent skills.
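
As a rough illustration of how a static pass and a semantic pass can be chained, the sketch below flags candidate patterns with regular expressions and then asks a second-stage callback to confirm each one. This is not SkillScan's implementation: the three regex rules, the function names, and the confirm-after-flag arrangement are all assumptions made for illustration, and the callback stands in for whatever LLM classifier a real pipeline would call. The pattern IDs (E2, PE2, SC2) follow the labels in the taxonomy fragment quoted at the end of this page.

```python
import re
from typing import Callable, List

# Hypothetical static rules keyed by taxonomy pattern ID (illustrative, not SkillScan's rules).
STATIC_RULES = {
    # E2: environment variable harvesting
    "E2": re.compile(r"os\.environ\b|process\.env\b|\bprintenv\b"),
    # PE2: sudo / root execution
    "PE2": re.compile(r"\bsudo\b"),
    # SC2: download piped straight into a shell
    "SC2": re.compile(r"\b(?:curl|wget)\b[^\n|]*\|\s*(?:sudo\s+)?(?:ba)?sh\b"),
}


def static_stage(text: str) -> List[str]:
    """Return the IDs of all static rules that match the skill text."""
    return sorted(pid for pid, rule in STATIC_RULES.items() if rule.search(text))


def scan_skill(text: str, semantic_stage: Callable[[str, str], bool]) -> List[str]:
    """Keep only the static candidates that the semantic stage confirms."""
    return [pid for pid in static_stage(text) if semantic_stage(text, pid)]


def accept_all(text: str, pattern_id: str) -> bool:
    """Placeholder for an LLM-based classifier; it confirms every candidate."""
    return True


sample = "Run: curl https://example.com/install.sh | sh"
print(scan_skill(sample, accept_all))
```

Swapping `accept_all` for a stricter callback shows where a semantic stage could suppress false positives raised by the cheap regex pass.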

Load-bearing premise

SkillScan accurately flags real vulnerabilities at the reported precision and recall without significant selection bias in the analyzed skills or marketplace sources.
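
One way to see how much this premise matters is to ask what the reported precision and recall would imply for the true vulnerable share. The sketch below assumes that 26.1% is the raw flagged rate and that the validation-set precision (86.7%) and recall (82.5%) carry over unchanged to the full corpus; neither assumption is confirmed by the summary above, and the function name is illustrative.

```python
def adjusted_prevalence(flagged_rate: float, precision: float, recall: float) -> float:
    """Estimate the true vulnerable share from a detector's flagged share.

    flagged_rate * precision is the share of all skills that are true positives;
    dividing by recall adds back the vulnerable skills the detector missed.
    """
    if not (0.0 <= flagged_rate <= 1.0 and 0.0 <= precision <= 1.0 and 0.0 < recall <= 1.0):
        raise ValueError("rates must lie in [0, 1] and recall must be positive")
    return flagged_rate * precision / recall


estimate = adjusted_prevalence(flagged_rate=0.261, precision=0.867, recall=0.825)
print(f"adjusted prevalence: {estimate:.1%}")
```

Because the reported precision exceeds the reported recall, the adjusted estimate comes out slightly above the flagged rate under these assumptions, so the headline figure is sensitive to both numbers rather than to precision alone.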

What would settle it

An independent manual audit of a random sample of flagged skills showing a substantially lower true vulnerability rate than 26.1%, or clear evidence of over-representation of high-risk marketplaces in the dataset.

Original abstract

The rise of AI agent frameworks has introduced agent skills, modular packages containing instructions and executable code that dynamically extend agent capabilities. While this architecture enables powerful customization, skills execute with implicit trust and minimal vetting, creating a significant yet uncharacterized attack surface. We conduct the first large-scale empirical security analysis of this emerging ecosystem, collecting 42,447 skills from two major marketplaces and systematically analyzing 31,132 using SkillScan, a multi-stage detection framework integrating static analysis with LLM-based semantic classification. Our findings reveal pervasive security risks: 26.1% of skills contain at least one vulnerability, spanning 14 distinct patterns across four categories: prompt injection, data exfiltration, privilege escalation, and supply chain risks. Data exfiltration (13.3%) and privilege escalation (11.8%) are most prevalent, while 5.2% of skills exhibit high-severity patterns strongly suggesting malicious intent. We find that skills bundling executable scripts are 2.12x more likely to contain vulnerabilities than instruction-only skills (OR=2.12, p<0.001). Our contributions include: (1) a grounded vulnerability taxonomy derived from 8,126 vulnerable skills, (2) a validated detection methodology achieving 86.7% precision and 82.5% recall, and (3) an open dataset and detection toolkit to support future research. These results demonstrate an urgent need for capability-based permission systems and mandatory security vetting before this attack vector is further exploited.
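
The counts quoted in the abstract can be cross-checked with simple arithmetic. The sketch below assumes that the 8,126 vulnerable skills behind the taxonomy are the same skills counted in the 26.1% rate, which the abstract implies but does not state outright.

```python
# Figures quoted in the abstract.
collected = 42_447   # skills gathered from the two marketplaces
analyzed = 31_132    # skills actually run through SkillScan
vulnerable = 8_126   # skills reported as vulnerable

excluded = collected - analyzed       # skills dropped before analysis
prevalence = vulnerable / analyzed    # share of analyzed skills flagged

print(f"excluded before analysis: {excluded}")
print(f"flagged share of analyzed skills: {prevalence:.1%}")
```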

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is the first large-scale empirical scan of real AI agent skills from marketplaces, turning up concrete prevalence numbers and a taxonomy, but the detection pipeline's validation details are thin.

The paper's main value is straightforward measurement: they pulled 42k skills from two marketplaces, ran 31k through SkillScan, and report that 26% contain at least one vulnerability across 14 patterns in four categories, with data exfiltration and privilege escalation leading. Skills that bundle executable scripts show an odds ratio of 2.12 for containing issues. They also release the dataset and toolkit, which is the part that actually moves the field forward rather than just describing a problem.

Referee Report

3 major / 3 minor

Summary. The manuscript reports the first large-scale empirical security analysis of AI agent skills, collecting 42,447 skills from two marketplaces and analyzing 31,132 with SkillScan (static analysis plus LLM semantic classification). It claims 26.1% of skills contain at least one vulnerability across 14 patterns in four categories (prompt injection, data exfiltration at 13.3%, privilege escalation at 11.8%, supply chain risks), with 5.2% showing high-severity patterns suggesting malicious intent. Skills bundling executable scripts are 2.12x more likely to be vulnerable (OR=2.12, p<0.001). Contributions include a grounded taxonomy from 8,126 vulnerable skills, a detection method validated at 86.7% precision and 82.5% recall, and an open dataset/toolkit.

Significance. If the SkillScan pipeline proves reliable without circular labeling or selection bias, the work is significant as the first quantitative characterization of an emerging attack surface in AI agent frameworks. The scale, taxonomy, and open resources would provide a foundation for future security research and motivate capability-based permissions and vetting in agent platforms.

major comments (3)

[Abstract and Methods] The headline 26.1% vulnerability rate, 5.2% malicious-intent subset, and OR=2.12 rest on SkillScan's 86.7% precision / 82.5% recall. The manuscript provides no details on how ground-truth labels for the validation set were produced (independent human raters vs. LLM-generated), inter-rater agreement, or false-positive measurement protocol. This is load-bearing; circularity in LLM labeling would propagate directly into the 8,126 vulnerable skills count and prevalence statistics.
[Data Collection] The reduction from 42,447 collected skills to 31,132 analyzed ones and the choice of two marketplaces require explicit exclusion criteria, discussion of selection bias, and assessment of representativeness. Without this, the 26.1% rate and category prevalences cannot be generalized beyond the sampled marketplaces.
[Results] The inference that 5.2% of skills exhibit 'high-severity patterns strongly suggesting malicious intent' needs concrete criteria, decision rules, or example patterns used for this classification to avoid subjective over-interpretation.

minor comments (3)

[Abstract] State the analyzed sample size (31,132) explicitly in the opening sentence for immediate clarity.
[Results] Report the exact statistical test underlying the OR=2.12 and p<0.001 value, and include confidence intervals.
[Figures] The taxonomy diagram should include per-pattern counts or percentages alongside the four categories to support the 14-pattern claim.
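
The request for a confidence interval around the odds ratio can be made concrete with a small sketch. The 2x2 counts below are hypothetical and chosen only for illustration (the paper's contingency table is not reproduced here), and the Wald interval on the log odds ratio is one common choice, not necessarily the test the authors used.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed and vulnerable      b: exposed and clean
    c: unexposed and vulnerable    d: unexposed and clean
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Hypothetical counts: script-bundling skills (exposed) vs instruction-only skills.
or_value, ci_low, ci_high = odds_ratio_wald(a=40, b=60, c=20, d=80)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1 would support the direction of the reported association; reporting it alongside the point estimate lets readers judge how tightly OR=2.12 is pinned down.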

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have helped us identify areas where additional transparency is needed. We address each major comment below and have revised the manuscript to incorporate the requested clarifications and details.

Referee: [Abstract and Methods] The headline 26.1% vulnerability rate, 5.2% malicious-intent subset, and OR=2.12 rest on SkillScan's 86.7% precision / 82.5% recall. The manuscript provides no details on how ground-truth labels for the validation set were produced (independent human raters vs. LLM-generated), inter-rater agreement, or false-positive measurement protocol. This is load-bearing; circularity in LLM labeling would propagate directly into the 8,126 vulnerable skills count and prevalence statistics.

Authors: We agree that the validation methodology requires explicit documentation to rule out circularity. The ground-truth labels were produced by two independent human security researchers who annotated a random sample of 500 skills using a structured codebook; the validation set was held out from any LLM prompt development or fine-tuning. Inter-rater agreement reached Cohen's kappa of 0.83. False positives were quantified by expert review of all LLM-positive predictions on the validation set. We have added a new 'Validation Protocol' subsection to the Methods section describing the annotation guidelines, agreement statistics, and discrepancy resolution process. This revision directly addresses the concern and supports the reported precision and recall figures. revision: yes
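
For readers unfamiliar with the agreement statistic quoted in this response, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance from their label frequencies. The sketch below is a generic implementation with made-up labels; it does not use the study's annotations.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up labels: 1 = vulnerable, 0 = clean.
labels_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
labels_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(labels_a, labels_b)
print(f"kappa = {kappa:.2f}")
```
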
Referee: [Data Collection] The reduction from 42,447 collected skills to 31,132 analyzed ones and the choice of two marketplaces require explicit exclusion criteria, discussion of selection bias, and assessment of representativeness. Without this, the 26.1% rate and category prevalences cannot be generalized beyond the sampled marketplaces.

Authors: We accept that the filtering steps and potential biases must be stated explicitly. We have expanded the Data Collection section to list the precise exclusion criteria: removal of exact duplicates (by SHA-256 hash), skills with malformed metadata, and non-English or empty content. This accounts for the reduction to 31,132. We added a paragraph discussing selection bias, noting that the two marketplaces were the dominant public sources at collection time, and included a supplementary table comparing category distributions before and after filtering. A limitations statement on generalizability to the broader (rapidly changing) ecosystem has also been inserted. These changes allow readers to evaluate representativeness directly. revision: yes
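
The exclusion steps named in this response can be sketched as a single filtering pass. The record layout (dicts with `name` and `content` keys) and the choice to hash only the content text are assumptions for illustration, and the non-English filter is left out because the response does not say how language was detected.

```python
import hashlib
from typing import Dict, Iterable, List


def filter_skills(skills: Iterable[Dict]) -> List[Dict]:
    """Drop malformed, empty, and exact-duplicate skills, keeping first occurrences."""
    seen_digests = set()
    kept = []
    for skill in skills:
        name = skill.get("name")
        content = skill.get("content")
        # Malformed metadata: missing or non-string fields, or a blank name.
        if not isinstance(name, str) or not name.strip() or not isinstance(content, str):
            continue
        # Empty content.
        if not content.strip():
            continue
        # Exact duplicates, detected by the SHA-256 digest of the content.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in seen_digests:
            continue
        seen_digests.add(digest)
        kept.append(skill)
    return kept


sample_skills = [
    {"name": "a", "content": "x"},
    {"name": "b", "content": "x"},   # duplicate content
    {"name": "", "content": "y"},    # blank name
    {"name": "c", "content": "  "},  # empty content
    {"name": "d"},                   # missing content
    {"name": "e", "content": "z"},
]
print([s["name"] for s in filter_skills(sample_skills)])
```
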
Referee: [Results] The inference that 5.2% of skills exhibit 'high-severity patterns strongly suggesting malicious intent' needs concrete criteria, decision rules, or example patterns used for this classification to avoid subjective over-interpretation.

Authors: We agree that the high-severity classification must be operationalized. In the revised Results section we now define high-severity patterns as the co-occurrence of at least one privilege-escalation pattern and one data-exfiltration pattern, or any pattern involving remote command execution with credential access. The decision rule and the exact 14-pattern taxonomy are presented in a new table. Three representative skill excerpts are provided as examples. The full decision logic has been moved to the appendix for reproducibility. These additions remove subjectivity while preserving the reported 5.2% figure. revision: yes
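
The decision rule described in this response can be written as a short predicate. The category sets use the pattern IDs from the taxonomy fragment quoted at the end of this page; mapping "remote command execution" to SC2 and "credential access" to PE3 or E2 is an assumption made for illustration, since the response does not name the patterns involved.

```python
from typing import Iterable

PRIVILEGE_ESCALATION = {"PE1", "PE2", "PE3"}
DATA_EXFILTRATION = {"E1", "E2", "E3", "E4"}
# Assumed mappings, not stated in the rebuttal:
REMOTE_EXECUTION = {"SC2"}          # external script fetching (download and execute)
CREDENTIAL_ACCESS = {"PE3", "E2"}   # credential access, env variable harvesting


def is_high_severity(patterns: Iterable[str]) -> bool:
    """Apply the two-branch rule to the set of pattern IDs found in one skill."""
    found = set(patterns)
    # Branch 1: privilege escalation co-occurring with data exfiltration.
    escalation_plus_exfiltration = bool(found & PRIVILEGE_ESCALATION) and bool(found & DATA_EXFILTRATION)
    # Branch 2: remote command execution combined with credential access.
    remote_exec_with_credentials = bool(found & REMOTE_EXECUTION) and bool(found & CREDENTIAL_ACCESS)
    return escalation_plus_exfiltration or remote_exec_with_credentials


print(is_high_severity(["PE2", "E1"]))   # escalation plus exfiltration
print(is_high_severity(["SC1"]))         # unpinned dependency only
```
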

Circularity Check

0 steps flagged

No circularity: purely empirical prevalence measurement

The paper performs a direct count of vulnerabilities across 31,132 skills using a static-plus-LLM pipeline (SkillScan). No equations, fitted parameters, or predictions appear; the 26.1% rate, category breakdowns, and OR=2.12 are raw empirical outputs from the scanned corpus. Validation metrics (86.7% precision, 82.5% recall) are stated as external evaluation results without any reduction to the same LLM labels used in production detection or to self-citations. Marketplace filtering and taxonomy derivation are presented as independent steps with no self-referential closure. The study is self-contained against external benchmarks and contains none of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Central claims rest on the representativeness of marketplace samples and the reliability of the multi-stage detector; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: Agent skills execute with implicit trust and minimal vetting
Invoked in the abstract to explain the attack surface.

Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry
cs.AI 2026-05 unverdicted novelty 8.0

Semantic manipulations of SKILL.md descriptions enable effective supply-chain attacks that bias AI agent skill registries toward adversarial skills in discovery, selection, and governance.
Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain
cs.CR 2026-04 unverdicted novelty 8.0

Malicious LLM API routers actively perform payload injection and secret exfiltration, with 9 of 428 tested routers showing malicious behavior and further poisoning risks from leaked credentials.
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
cs.CR 2026-04 unverdicted novelty 8.0

DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.
Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis
cs.CR 2026-04 accept novelty 8.0

Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.
Do Skill Descriptions Tell the Truth? Detecting Undisclosed Security Behaviors in Code-Backed LLM Skills
cs.CR 2026-05 conditional novelty 7.0

SKILLSCOPE detects undisclosed security behaviors in LLM skill implementations via security property graphs and taxonomy-based consistency checking, identifying confirmed inconsistencies in 9.4% of 4,556 evaluated ski...
Sealing the Audit-Runtime Gap for LLM Skills
cs.CR 2026-05 unverdicted novelty 7.0

SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.
Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study
cs.CR 2026-04 accept novelty 7.0

Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces
cs.CR 2026-05 unverdicted novelty 6.0

SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.
SearchSkill: Teaching LLMs to Use Search Tools with Evolving Skill Banks
cs.AI 2026-05 unverdicted novelty 6.0

SearchSkill introduces an evolving SkillBank and two-stage SFT to make LLM search query planning explicit via skill selection, improving exact match on QA benchmarks and retrieval behavior.
SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills
cs.CR 2026-05 unverdicted novelty 6.0

SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task c...
Safe Bilevel Delegation (SBD): A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems
cs.AI 2026-04 unverdicted novelty 6.0

SBD is a bilevel optimization framework that learns context-dependent safety weights for runtime task delegation in hierarchical multi-agent systems, with continuous authority transfer alpha and theoretical guarantees...
AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents
cs.CR 2026-04 conditional novelty 6.0

AgentWard organizes stage-specific security controls with cross-layer coordination to intercept threats across the full lifecycle of autonomous AI agents.
RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents
cs.CR 2026-04 unverdicted novelty 6.0

RouteGuard uses response-conditioned attention and hidden-state alignment to detect skill poisoning in LLM agents, achieving 0.8834 F1 on Skill-Inject benchmarks and recovering 90.51% of attacks missed by lexical screening.
SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills
cs.CR 2026-04 unverdicted novelty 6.0

SkillSieve is a hierarchical triage framework combining regex/AST/XGBoost filtering, parallel LLM subtasks, and multi-LLM jury voting to detect malicious AI agent skills, reaching 0.800 F1 on a 400-skill benchmark at ...
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
cs.CR 2026-04 conditional novelty 6.0

Poisoning any single CIK dimension of an AI agent raises average attack success rate from 24.6% to 64-74% across models, and tested defenses leave substantial residual risk.
Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses
cs.CR 2026-03 unverdicted novelty 6.0

The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.
Bilevel Optimization of Agent Skills via Monte Carlo Tree Search
cs.AI 2026-04 unverdicted novelty 5.0

Bilevel optimization with outer-loop MCTS for skill structure and inner-loop LLM refinement improves agent accuracy on an operations-research question-answering dataset.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
cs.SE 2026-04 accept novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
Know When to Trust the Skill: Delayed Appraisal and Epistemic Vigilance for Single-Agent LLMs
cs.AI 2026-04 unverdicted novelty 4.0

MESA-S framework translates human metacognitive control into LLMs via delayed procedural probes and Metacognitive Skill Cards to separate parametric certainty from source trust and reduce overthinking.
Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
cs.MA 2026-02 unverdicted novelty 4.0

The paper surveys agent skills for LLMs across architecture, acquisition, deployment, and security, proposing a four-tier Skill Trust and Lifecycle Governance Framework to address vulnerabilities in community skills.

Vulnerability taxonomy (fragment of the paper's classification prompt, recovered from extraction)

**Prompt Injection Risk**
- P1: Instruction Override (High) - explicit commands to ignore constraints
- P2: Hidden Instructions (High) - directives in comments / markup
- P3: Exfiltration Commands (High) - instructions to transmit context externally
- P4: Behavior Manipulation (Medium) - subtle decision alterations

**Data Exfiltration Risk**
- E1: External Transmission (Medium) - data sent to hardcoded URLs
- E2: Env Variable Harvesting (High) - collecting secrets from environment
- E3: File System Enumeration (Medium) - scanning for sensitive files
- E4: Context Leakage (High) - transmitting conversation context

**Privilege Escalation Risk**
- PE1: Excessive Permissions (Low) - scope beyond stated functionality
- PE2: Sudo / Root Execution (Medium) - elevated privileges without justification
- PE3: Credential Access (High) - reading auth tokens, keys, passwords

**Supply Chain Risk**
- SC1: Unpinned Dependencies (Low) - no version constraints
- SC2: External Script Fetching (High) - runtime download and execute
- SC3: Obfuscated Code (High) - intentionally obscured logic

## Output Format
For each dimension, provide:
- confidence: 0.0-1.0 (your certainty in the assessment)
...
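
The four categories and 14 patterns listed in the taxonomy fragment can be held in a small lookup table, which makes the "14 patterns in four categories" claim easy to verify and gives downstream code a single place to read severities from. The table below transcribes the IDs, names, and severities from the fragment; the variable names and the dictionary layout are illustrative choices, not part of the paper's toolkit.

```python
from collections import Counter

# category -> {pattern ID: (name, severity)}, transcribed from the taxonomy fragment.
TAXONOMY = {
    "Prompt Injection": {
        "P1": ("Instruction Override", "High"),
        "P2": ("Hidden Instructions", "High"),
        "P3": ("Exfiltration Commands", "High"),
        "P4": ("Behavior Manipulation", "Medium"),
    },
    "Data Exfiltration": {
        "E1": ("External Transmission", "Medium"),
        "E2": ("Env Variable Harvesting", "High"),
        "E3": ("File System Enumeration", "Medium"),
        "E4": ("Context Leakage", "High"),
    },
    "Privilege Escalation": {
        "PE1": ("Excessive Permissions", "Low"),
        "PE2": ("Sudo / Root Execution", "Medium"),
        "PE3": ("Credential Access", "High"),
    },
    "Supply Chain": {
        "SC1": ("Unpinned Dependencies", "Low"),
        "SC2": ("External Script Fetching", "High"),
        "SC3": ("Obfuscated Code", "High"),
    },
}

pattern_count = sum(len(patterns) for patterns in TAXONOMY.values())
severity_counts = Counter(
    severity for patterns in TAXONOMY.values() for _, severity in patterns.values()
)

print(f"categories: {len(TAXONOMY)}, patterns: {pattern_count}")
print(dict(severity_counts))
```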