SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents
Pith reviewed 2026-05-21 12:41 UTC · model grok-4.3
The pith
SkillJect automates the generation of poisoned skills that inject hidden commands into LLM agents by hiding payloads in helper scripts and front-loading instructions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SkillJect decomposes the attack into an artifact channel that conceals the payload in a helper script and an instruction channel that rewrites SKILL.md with front-loaded inducement, explicitly referencing the script path and framing it as a mandatory prerequisite. It then applies a closed-loop multi-agent process in which an Attack Agent produces the poisoned skill, a Victim Agent runs downstream tasks, and an Evaluate Agent inspects execution traces to confirm payload execution, allowing the Attack Agent to diagnose failures and iteratively improve SKILL.md while leaving the payload unchanged.
What carries the argument
Dual-channel attack that hides the payload in an auxiliary script while using front-loaded prerequisite framing in SKILL.md, coordinated through a closed-loop multi-agent feedback process that refines instructions based on execution traces.
If this is right
- Poisoned skills can be produced automatically at scale rather than through brittle manual crafting.
- The same attack succeeds across different agent platforms and underlying LLMs.
- Reusable skill libraries create a persistent attack surface that direct or manual injections cannot exploit as effectively.
- Front-loaded instructions that present a helper script as a required initialization step bypass agent safeguards more reliably than explicit malicious prompts.
Where Pith is reading between the lines
- Skill marketplaces would need verification steps or sandboxing for uploaded files to limit this vector.
- The dual-channel approach could apply to other modular AI components such as plugins or tool extensions.
- Defensive testing frameworks might adopt similar multi-agent loops to probe and strengthen skills before release.
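The verification step suggested in the first bullet can be sketched as a pre-publication lint. This is a minimal illustration, not anything described in the paper: the regular expressions, the 15-line window, and the function name `flag_front_loaded_scripts` are all assumptions chosen for the example. A real marketplace check would need a much broader rule set plus sandboxed execution.

```python
import re

# Hypothetical heuristic (not from the paper): an instruction near the top of
# SKILL.md that tells the agent to run a bundled script with a shell or interpreter.
RUN_SCRIPT = re.compile(r"\b(?:bash|sh|python3?|node)\s+(\S+\.(?:sh|py|js))\b")
# Hypothetical list of words that frame the step as compulsory.
URGENCY = re.compile(r"\b(?:must|mandatory|required|prerequisite)\b", re.I)


def flag_front_loaded_scripts(skill_md: str, head_lines: int = 15) -> list[dict]:
    """Return one finding per script invocation in the first `head_lines` lines."""
    findings = []
    for number, line in enumerate(skill_md.splitlines()[:head_lines], start=1):
        match = RUN_SCRIPT.search(line)
        if match:
            findings.append({
                "line": number,
                "script": match.group(1),
                "urgent_framing": bool(URGENCY.search(line)),
            })
    return findings
```

A finding is only a prompt for human review. Legitimate skills also ship setup scripts, so the heuristic is expected to produce false positives and to miss paraphrased instructions.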
Load-bearing premise
The Evaluate Agent can reliably inspect execution traces to determine whether the hidden payload executed, enabling the Attack Agent to successfully rewrite SKILL.md while keeping the payload fixed.
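One way to probe this premise is to score the Evaluate Agent's verdicts against human labels. The sketch below assumes a hypothetical three-way verdict vocabulary ("executed", "not_executed", "ambiguous") and counts anything other than an explicit "executed" as a negative. The paper does not specify its verdict format, so these names are illustrative only.

```python
from collections import Counter


def evaluator_agreement(verdicts: list[str], truth: list[bool]) -> dict:
    """Compare evaluator verdicts with human labels.

    verdicts: "executed", "not_executed", or "ambiguous" (hypothetical labels).
    truth:    True where a human annotator confirmed the payload ran.
    """
    if not verdicts or len(verdicts) != len(truth):
        raise ValueError("verdicts and truth must be non-empty and equal in length")
    # Conservative rule: only an explicit "executed" verdict counts as positive.
    predicted = [v == "executed" for v in verdicts]
    counts = Counter(zip(predicted, truth))
    tp, fp = counts[(True, True)], counts[(True, False)]
    fn, tn = counts[(False, True)], counts[(False, False)]
    n = tp + fp + fn + tn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    observed = (tp + tn) / n
    # Chance agreement for Cohen's kappa, computed from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 1.0
    return {"precision": precision, "recall": recall, "kappa": kappa}
```

Low precision would mean the loop is rewarded for runs in which nothing executed. Low recall would mean successful rewrites are discarded. Either one weakens the feedback signal the premise depends on.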
What would settle it
If the generated poisoned skills fail to produce execution of the hidden payload in the majority of test runs across multiple platforms and backend LLMs, the claim of substantially improved attack effectiveness would not hold.
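The test stated above can be made concrete by tallying payload execution per platform and backend model. The sketch assumes run records shaped as `(platform, model, payload_executed)` tuples and reads "majority" as a rate strictly above 0.5 in every configuration. Both the record shape and that strict reading are assumptions made for illustration, not definitions from the paper.

```python
from collections import defaultdict


def success_by_config(runs):
    """runs: iterable of (platform, model, payload_executed) tuples."""
    totals = defaultdict(lambda: [0, 0])  # config -> [successes, trials]
    for platform, model, executed in runs:
        cell = totals[(platform, model)]
        cell[0] += int(executed)
        cell[1] += 1
    return {config: s / t for config, (s, t) in totals.items()}


def claim_survives(rates, threshold=0.5):
    """True only if there is data and every configuration clears the threshold."""
    return bool(rates) and all(rate > threshold for rate in rates.values())
```

Requiring every configuration to pass is the strictest reading. An aggregate rate over all runs would be a looser alternative, but it can hide a platform on which the attack rarely works.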
Original abstract
Agent skills are increasingly used to extend LLM agents with task-specific instructions, executable scripts, and auxiliary resources. While improving reusability, this modular design also introduces a new supply-chain attack surface: a malicious or compromised skill may be repeatedly loaded as trusted guidance and steer an agent's tool use during downstream execution. Existing skill-based prompt-injection attacks are mostly manual and brittle, as explicit malicious instructions are often rejected or ignored when poorly aligned with the original skill workflow. We propose SkillJect, the first automated framework for generating effective poisoned skills against skill-enabled agent systems. SkillJect decomposes the attack into two coordinated channels. In the artifact channel, it hides the malicious payload in an auxiliary helper script. In the instruction channel, it rewrites SKILL.md using a front-loaded inducement strategy, placing injected content at the beginning and framing the helper script as a mandatory prerequisite or first step. The instruction explicitly references the helper-script path and provides an executable command, making the helper appear to be a legitimate initialization step before normal operations. SkillJect further adopts a closed-loop multi-agent process to improve attack performance. An Attack Agent generates poisoned skills, a Victim Agent executes downstream tasks with them, and an Evaluate Agent inspects execution traces to determine whether the hidden payload is executed. The Attack Agent then uses this feedback to diagnose failures and rewrite SKILL.md, while keeping the payload fixed. Experiments across platforms, backend LLMs, and attack categories show that SkillJect substantially outperforms naive direct injection and prior manual attacks, revealing poisoned skills as a persistent attack vector in reusable skill ecosystems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SkillJect, the first automated framework for generating poisoned skills to perform prompt injection on skill-enabled LLM agents. It decomposes attacks into an artifact channel (hiding payloads in helper scripts) and an instruction channel (rewriting SKILL.md with front-loaded inducements that frame the helper as a mandatory initialization step). A closed-loop multi-agent process is used: an Attack Agent generates candidates, a Victim Agent executes downstream tasks, and an Evaluate Agent inspects traces to determine payload execution, feeding back to the Attack Agent for iterative SKILL.md rewrites while keeping the payload fixed. Experiments across platforms, backend LLMs, and attack categories claim that SkillJect substantially outperforms naive direct injection and prior manual attacks.
Significance. If the empirical results hold, this work is significant because it automates and demonstrates the persistence of a supply-chain attack vector in reusable agent skill ecosystems, moving beyond brittle manual attacks. The closed-loop multi-agent feedback mechanism for attack refinement is a methodological contribution that could generalize to other agent security problems. Cross-platform and cross-LLM validation adds practical relevance for the security community.
major comments (2)
- [Section 3] Section 3 (Closed-loop Multi-agent Process): The headline outperformance claim depends on the Evaluate Agent reliably determining from execution traces whether the hidden payload executed. The manuscript provides no details on the inspection procedure, decision criteria, handling of ambiguous logs, or validation against ground-truth cases, so it is unclear whether the feedback signal is accurate or whether the loop is optimizing against noise.
- [Section 4] Section 4 (Experiments): The central claim that SkillJect 'substantially outperforms' baselines is load-bearing for the paper's contribution, yet the text supplies no information on the precise success metric, the concrete implementation of the 'naive direct injection' and 'prior manual attacks' baselines, the number of trials, variance, or any statistical tests. Without these, the reported gains cannot be assessed.
minor comments (2)
- [Introduction] The distinction between the artifact channel and instruction channel should be defined explicitly with a short table or diagram in the introduction for clarity.
- [Section 4] Ensure all experimental figures include error bars or confidence intervals and label the y-axis with the exact success metric used.
Simulated Author's Rebuttal
We appreciate the referee's detailed feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below. Where the comments highlight areas needing additional clarification or detail, we have revised the manuscript accordingly.
-
Referee: [Section 3] Section 3 (Closed-loop Multi-agent Process): The headline outperformance claim depends on the Evaluate Agent reliably determining from execution traces whether the hidden payload executed. The manuscript provides no details on the inspection procedure, decision criteria, handling of ambiguous logs, or validation against ground-truth cases, so it is unclear whether the feedback signal is accurate or whether the loop is optimizing against noise.
Authors: We agree that additional details are required. The revised manuscript will describe the Evaluate Agent's inspection procedure in full. This covers the specific decision criteria used to determine payload execution from traces, the protocols for handling ambiguous or incomplete logs (such as treating them as non-execution to avoid false positives), and the results of a validation study comparing the agent's assessments with human-annotated ground truth on a subset of cases. These additions will clarify the accuracy of the feedback signal and show that the optimization loop is not driven by noise. Revision: yes
-
Referee: [Section 4] Section 4 (Experiments): The central claim that SkillJect 'substantially outperforms' baselines is load-bearing for the paper's contribution, yet the text supplies no information on the precise success metric, the concrete implementation of the 'naive direct injection' and 'prior manual attacks' baselines, the number of trials, variance, or any statistical tests. Without these, the reported gains cannot be assessed.
Authors: We recognize that the experimental section lacks critical details for reproducibility and assessment. We will revise Section 4 to explicitly define the success metric as the proportion of executions where the payload is triggered and completes its intended action. We will describe the implementation of the naive direct injection baseline as direct embedding of malicious instructions in the skill description without auxiliary scripts or framing. For prior manual attacks, we will detail how we replicated the approaches from the cited literature within our experimental framework. Furthermore, we will specify the number of trials conducted, report measures of variance such as standard deviation, and include appropriate statistical tests to support the significance of the observed performance differences. revision: yes
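The statistical reporting promised in this response could take roughly the following form. This is a sketch using standard textbook formulas, not the authors' analysis: it computes a 95% Wilson score interval for a success proportion and a pooled two-sided z-test for the difference between two success rates. The function names are illustrative, and the z-test assumes independent trials with reasonably large counts.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a success proportion (z = 1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def two_proportion_z(s1: int, n1: int, s2: int, n2: int):
    """Two-sided pooled z-test for a difference between two success rates."""
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # both rates are 0 or 1, so there is no detectable difference
    z = (s1 / n1 - s2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value
```

With small trial counts an exact test, such as Fisher's, would be the safer choice. The normal approximation is used here only to keep the example dependency-free.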
Circularity Check
No circularity: empirical attack-generation method with independent experimental validation
The paper proposes SkillJect as a practical framework decomposing attacks into artifact and instruction channels, augmented by a closed-loop multi-agent feedback process (Attack Agent, Victim Agent, Evaluate Agent). Central claims rest on empirical outperformance across platforms, LLMs, and attack categories versus naive injection and manual baselines. No mathematical derivations, equations, fitted parameters, or self-citations appear in the provided text that would reduce any result to its inputs by construction. The evaluation is externally falsifiable via replication, making the work self-contained against benchmarks rather than circular.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Skill-enabled agents load SKILL.md and auxiliary scripts as trusted guidance and execute referenced commands without additional verification or sandboxing.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
SkillJect decomposes the attack into two coordinated channels... closed-loop multi-agent process... Evaluate Agent inspects execution traces
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Experiments across platforms, backend LLMs, and attack categories show that SkillJect substantially outperforms naive direct injection
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 9 Pith papers
-
Exploiting LLM Agent Supply Chains via Payload-less Skills
Semantic Compliance Hijacking lets attackers hijack LLM agents by disguising malicious instructions as compliance rules in skills, reaching up to 77.67% success on confidentiality breaches and 67.33% on RCE while evad...
-
Behavioral Integrity Verification for AI Agent Skills
BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.
-
Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw
DeepTrap automates discovery of contextual vulnerabilities in OpenClaw agents via trajectory optimization, showing that unsafe behavior can be induced while preserving task completion and that final-response checks ar...
-
Defense effectiveness across architectural layers: a mechanistic evaluation of persistent memory attacks on stateful LLM agents
A memory-layer defense called Memory Sandbox stops persistent memory attacks on most LLM agents while other layer defenses fail.
-
SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills
SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task c...
-
SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills
SkillSieve is a hierarchical triage framework combining regex/AST/XGBoost filtering, parallel LLM subtasks, and multi-LLM jury voting to detect malicious AI agent skills, reaching 0.800 F1 on a 400-skill benchmark at ...
-
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
Poisoning any single CIK dimension of an AI agent raises average attack success rate from 24.6% to 64-74% across models, and tested defenses leave substantial residual risk.
-
Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses
The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.
-
Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills
SkillGuard-Robust formulates pre-load auditing of untrusted Agent Skills as a three-way classification task and achieves 97.30% exact match and 98.33% malicious-risk recall on held-out benchmarks.