pith. sign in

arxiv: 2604.03070 · v2 · pith:UV5HTIH5new · submitted 2026-04-03 · 💻 cs.CR · cs.AI

How Your Credentials Are Leaked by LLM Agent Skills: An Empirical Study

Pith reviewed 2026-05-13 19:48 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords credential leakageLLM agentsthird-party skillsvulnerability analysisprompt injectiondebug loggingempirical studysecurity taxonomy
0
0 comments X

The pith

Third-party LLM agent skills leak credentials in over 500 cases through debug logs and prompt injections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper conducts the first large-scale empirical study of credential leakage in third-party skills that extend LLM agents. By sampling and analyzing 17,022 skills from a platform containing 170,226 total entries, using static analysis, sandbox testing, and manual review, the work identifies 520 vulnerable skills containing 1,708 issues. These issues fall into a taxonomy of 10 leakage patterns, split between accidental and adversarial types. The results matter because the leaked credentials are frequently exploitable without extra privileges, remain available in skill forks, and arise mostly from everyday coding practices like debug logging that expose data to the LLM.

Core claim

The authors establish that 520 skills leak credentials, distributed across 10 patterns where debug logging via print and console.log causes 73.5 percent of cases, 76.3 percent of leaks require combined analysis of code and natural language, 3.1 percent arise purely from prompt injection, and 89.6 percent of leaked credentials are exploitable without privileges while persisting in forks even after upstream fixes.

What carries the argument

A taxonomy of 10 leakage patterns (4 accidental and 6 adversarial) derived from static analysis, sandbox testing, and manual inspection of skills.

If this is right

  • Debug logging statements cause 73.5 percent of leaks because their output reaches the LLM through stdout.
  • 76.3 percent of leaks are cross-modal and cannot be found without examining both code and natural language together.
  • 89.6 percent of leaked credentials can be used directly without any special privileges.
  • Credentials remain present in forks even after the original skill is updated or fixed.
  • Public disclosure resulted in removal of all malicious skills and fixes to 91.6 percent of hardcoded credential cases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Skill platforms could integrate automated scanning for these 10 patterns before allowing public distribution.
  • Developers should receive explicit guidance to avoid any logging that writes credentials to outputs accessible by the LLM.
  • Persistence across forks indicates that public repositories for skills need stronger secret-masking practices.
  • Applying the same taxonomy to skills on other agent platforms would test whether the patterns are platform-specific.

Load-bearing premise

The sampled skills accurately represent the full population on the platform and the chosen detection methods reliably identify every leakage pattern present.

What would settle it

Repeating the full analysis on the unsampled skills or applying runtime execution monitoring to check whether additional leaks appear beyond those found by static and sandbox methods.

Figures

Figures reproduced from arXiv: 2604.03070 by Gelei Deng, Jianting Ning, Lei Ma, Leo Yu Zhang, Yanjun Zhang, Yi Liu, Ying Zhang, Yuekang Li, Zhihao Chen, Zhiqiang Li.

Figure 1
Figure 1. Figure 1: A real-world credential leakage case discovered in [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The Overview of the Methodology. The study proceeds through four phases: (1) dataset collection of 17,022 skills [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Distribution of Hardcoded Credential types. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Exploitation Type Distribution Exploitation Lifecycle. We traced all 520 skills across the five lifecycle phases: install, load, configure, execute, and per￾sist [7]. 92.5% (481/520) become exploitable during the execute phase, when credentials are instantiated, sent to external APIs, and written to output streams. Once leaked, credentials persist beyond upstream fixes: of the 107 repositories that removed… view at source ↗
read the original abstract

Large Language Model (LLM) agents increasingly rely on third-party skills that operate within privileged execution environments and routinely handle sensitive credentials, yet how these credentials are leaked remains largely unexplored. To fill this gap, we present the first large-scale empirical study on credential leakage in agent skills. From 170,226 artifacts on SkillsMP, the largest open-source skill marketplace, we sampled 17,022 skills via stratified random sampling and analyzed each through static secret extraction (regex and AST parsing), dynamic sandbox testing with mock credentials, and cross-referencing developer intent against runtime behavior. Our analysis identifies 520 affected skills containing 1,708 security issues, and yields a taxonomy of 10 leakage patterns. Three findings stand out. First, 76.3% of cases require jointly analyzing natural-language descriptions and programming logic, showing that credential exposure in skills is fundamentally cross-modal. Second, debug logging accounts for 73.5% of vulnerabilities because agent frameworks feed stdout into the LLM context window, turning routine debugging into a credential exposure vector. Third, 89.6% of leaked credentials are immediately exploitable -- 92.5% during routine execution without elevated privileges -- and the fork-based distribution model defeats remediation, as secrets removed from 107 upstream repositories persist across 50+ independent forks. Following responsible disclosure, all malicious skills have been removed and 91.6% of hardcoded cases remediated. We release our dataset, taxonomy, and detection pipeline to support future agent security research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents the first large-scale empirical study of credential leakage in third-party LLM agent skills, sampling 17,022 skills from the 170,226 available on SkillsMP. Using a combination of static analysis, sandbox testing, and manual inspection, the authors identify 520 vulnerable skills containing 1,708 issues and derive a taxonomy of 10 leakage patterns (4 accidental, 6 adversarial). Key findings include that 76.3% of leaks require joint code and natural-language analysis, 73.5% stem from debug logging (print/console.log exposing stdout to LLMs), 89.6% of leaked credentials are exploitable without privileges, and leaks persist in forks. After responsible disclosure, all malicious skills were removed and 91.6% of hardcoded credentials were fixed. The authors release the dataset, taxonomy, and detection pipeline.

Significance. If the empirical pipeline and counts hold, the work establishes a concrete baseline for credential-leakage prevalence in LLM agents and supplies a reusable taxonomy plus open artifacts that directly support follow-on detection research and platform policy. The cross-modal emphasis and persistence-in-forks observation are particularly actionable for agent runtime design.

major comments (2)
  1. [Methodology] Methodology section: the sampling procedure used to obtain the 17,022-skill subset from the full 170,226 is described only at the aggregate level; without explicit randomness, stratification, or exclusion criteria, it is impossible to evaluate whether the reported 520 vulnerable skills and 3.05% rate are representative or subject to selection bias.
  2. [Results and Taxonomy] Results and Taxonomy sections: the classification rules that map the 1,708 raw issues into the 10-pattern taxonomy (including the 76.3% cross-modal and 3.1% pure-prompt-injection splits) are not stated with sufficient operational detail; this leaves the boundary between accidental and adversarial categories open to post-hoc adjustment.
minor comments (2)
  1. [Abstract] Abstract: the statement that 'all malicious skills were removed' would be more informative if the absolute count of malicious skills were supplied alongside the 520 total vulnerable skills.
  2. [Figures and Tables] Figure and table captions: several summary tables listing per-pattern counts or exploitability percentages lack explicit column definitions or confidence intervals, reducing immediate interpretability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive overall assessment. We address each major comment below and will revise the manuscript to provide greater operational detail on sampling and classification.

read point-by-point responses
  1. Referee: [Methodology] Methodology section: the sampling procedure used to obtain the 17,022-skill subset from the full 170,226 is described only at the aggregate level; without explicit randomness, stratification, or exclusion criteria, it is impossible to evaluate whether the reported 520 vulnerable skills and 3.05% rate are representative or subject to selection bias.

    Authors: We agree that the sampling description was insufficiently detailed. In the revised manuscript we will expand the Methodology section to state that the 17,022 skills were obtained by uniform random sampling from the full 170,226 skills on SkillsMP (using a fixed random seed for reproducibility), with exclusion criteria limited to skills that failed to parse due to encoding errors or exceeded a 100k-token size limit. No stratification was applied. We will also add the exact sampling parameters, a short discussion of representativeness, and a note on why random sampling was chosen over stratified approaches. revision: yes

  2. Referee: [Results and Taxonomy] Results and Taxonomy sections: the classification rules that map the 1,708 raw issues into the 10-pattern taxonomy (including the 76.3% cross-modal and 3.1% pure-prompt-injection splits) are not stated with sufficient operational detail; this leaves the boundary between accidental and adversarial categories open to post-hoc adjustment.

    Authors: We accept that the classification rules lacked sufficient operational specificity. In the revision we will insert a dedicated subsection in the Taxonomy section that defines the decision criteria explicitly: an issue is labeled cross-modal if it requires joint inspection of code and natural-language prompts; it is accidental if the leakage stems from standard debugging constructs (e.g., print/console.log) without any exfiltration-oriented prompt engineering; it is adversarial if the skill contains deliberate patterns such as prompt-injection templates or hidden exfiltration instructions. We will report inter-annotator agreement (Cohen’s kappa), provide a decision flowchart, and include one canonical example per pattern to make the mapping reproducible and the accidental/adversarial boundary unambiguous. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is a purely empirical measurement study. The authors sample 17,022 skills from SkillsMP, apply static analysis plus sandbox testing plus manual review, count 520 vulnerable skills and 1,708 issues, and induce a 10-pattern taxonomy directly from the observed cases. No equations, fitted parameters, derivations, or self-referential definitions appear. The taxonomy is an inductive classification of detected patterns rather than a renaming or self-definition of the input data. No load-bearing self-citations or uniqueness theorems are invoked. The pipeline is externally falsifiable against the released dataset and detection code.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Empirical security measurement study with no free parameters, no invented entities, and only standard assumptions of static analysis soundness and representative sampling.

axioms (1)
  • domain assumption Static analysis combined with sandbox execution can detect credential leakage patterns in skill code and prompts.
    Invoked in the description of the analysis pipeline.

pith-pipeline@v0.9.0 · 5527 in / 1071 out tokens · 29196 ms · 2026-05-13T19:48:23.575436+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Cloak and Detonate: Scanner Evasion and Dynamic Detection of Agent Skill Malware

    cs.CR 2026-07 unverdicted novelty 7.0

    SkillCloak evades existing static scanners for agent skill malware at high rates, while SkillDetonate detects 97% of attacks at 2% false-positive rate using sandboxed runtime behavior analysis.

  2. From Registry to Repository: How AI Agent Skills Are Written, Adapted, and Maintained

    cs.SE 2026-07 unverdicted novelty 7.0

    Empirical study of 41k+ AI agent skills finds reuse is mostly one-time verbatim copying with 53% never modified afterward and maintenance focused on additive local adaptations.

  3. Trust Me, Import This: Dependency Steering Attacks via Malicious Agent Skills

    cs.CR 2026-05 unverdicted novelty 7.0

    Malicious Skills induce coding agents to hallucinate and import attacker-controlled packages at high rates while evading detection.

  4. Safety Testing LLM Agents at Scale: From Risk Discovery to Evidence-Grounded Verification

    cs.AI 2026-07 conditional novelty 6.0

    Vera automates safety testing for LLM agents via literature-driven risk taxonomies, combinatorial case generation, and evidence-grounded verification in isolated environments, showing 93.9% average attack success on f...

  5. SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

    cs.CR 2026-06 unverdicted novelty 6.0

    SeClaw provides spec-driven synthesis of security tasks and an execution-based docker testbed for evaluating unsafe behaviors in autonomous LLM agents.

  6. When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

    cs.SE 2026-05 unverdicted novelty 6.0

    About 18.2% of structurally flagged skill pairs represent genuine compositional safety risks in agent skill registries, with exploitation gated by host model behavior.

  7. Harnessing Agent Skills: Architectural Patterns and a Reference Architecture for Skill-Mediated LLM Agents

    cs.AI 2026-05 unverdicted novelty 6.0

    Catalogs ten patterns and synthesizes a four-layer reference architecture for skill harnessing in LLM agents, evaluated via cross-instantiation on eight systems.

  8. AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills

    cs.CR 2026-05 conditional novelty 6.0

    AgentTrap shows that current LLM agents typically complete user tasks while silently accepting unsafe side effects from malicious third-party skills rather than refusing them.

  9. RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents

    cs.CR 2026-04 unverdicted novelty 6.0

    RouteGuard uses response-conditioned attention and hidden-state alignment to detect skill poisoning in LLM agents, achieving 0.8834 F1 on Skill-Inject benchmarks and recovering 90.51% of attacks missed by lexical screening.