pith. machine review for the scientific record. sign in

arxiv: 2504.19793 · v3 · submitted 2025-04-28 · 💻 cs.CR

Recognition: 1 theorem link

Prompt Injection Attack to Tool Selection in LLM Agents

Authors on Pith no claims yet

Pith reviewed 2026-05-16 17:04 UTC · model grok-4.3

classification 💻 cs.CR
keywords prompt injectiontool selectionLLM agentsadversarial attackno-box scenariooptimization attackdefense evaluationToolHijacker
0
0 comments X

The pith

ToolHijacker injects optimized malicious tool documents to force LLM agents to select attacker-chosen tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ToolHijacker as a prompt injection attack on the two-step retrieval and selection process that LLM agents use to pick tools from a library. By formulating the creation of a fake tool document as an optimization problem solved in two phases, the attack poisons the library so the agent reliably picks the attacker's tool for a target task. Experiments demonstrate that this method significantly outperforms both manual and automated prompt injection baselines when applied to tool selection. Existing prevention defenses such as StruQ and SecAlign, along with detection approaches including perplexity checks, fail to block the attack. The work therefore shows that tool libraries in no-box settings remain open to manipulation through carefully crafted documents.

Core claim

ToolHijacker injects a malicious tool document into the tool library to manipulate the LLM agent's tool selection process, compelling it to consistently choose the attacker's malicious tool for an attacker-chosen target task. The method formulates document crafting as an optimization problem and solves it with a two-phase strategy, achieving high success rates that exceed those of prior prompt injection attacks on the same task.

What carries the argument

ToolHijacker, a two-phase optimization procedure that generates adversarial tool documents to exploit the retrieval-plus-selection pipeline in LLM agents.

If this is right

  • LLM agents that draw tools from unverified libraries can be steered toward attacker tools for specific tasks without direct model access.
  • The attack succeeds at higher rates than manual or automated prompt injection methods previously tested on tool selection.
  • Both structural prevention techniques like StruQ and SecAlign and detection methods based on perplexity or known-answer checks leave the vulnerability open.
  • The results indicate that current tool-selection pipelines in agents require fundamentally stronger safeguards against library poisoning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same document-optimization tactic could be tested against other agent stages such as memory retrieval or step-by-step planning.
  • Deployments that allow dynamic addition of tools from external sources would need source authentication or cryptographic signing to limit this vector.
  • The two-phase optimization may transfer to other retrieval-based systems where documents influence downstream model decisions.
  • Longer-term, the finding points toward the need for retrieval mechanisms that treat tool descriptions as untrusted inputs by default.

Load-bearing premise

An attacker can insert a malicious tool document into the agent's library and the LLM will still follow the injected instructions inside that document during retrieval and selection.

What would settle it

A controlled test where multiple LLM agents receive the optimized malicious document yet select the intended tool at rates no higher than random chance or baseline attacks.

read the original abstract

Tool selection is a key component of LLM agents. A popular approach follows a two-step process - \emph{retrieval} and \emph{selection} - to pick the most appropriate tool from a tool library for a given task. In this work, we introduce \textit{ToolHijacker}, a novel prompt injection attack targeting tool selection in no-box scenarios. ToolHijacker injects a malicious tool document into the tool library to manipulate the LLM agent's tool selection process, compelling it to consistently choose the attacker's malicious tool for an attacker-chosen target task. Specifically, we formulate the crafting of such tool documents as an optimization problem and propose a two-phase optimization strategy to solve it. Our extensive experimental evaluation shows that ToolHijacker is highly effective, significantly outperforming existing manual-based and automated prompt injection attacks when applied to tool selection. Moreover, we explore various defenses, including prevention-based defenses (StruQ and SecAlign) and detection-based defenses (known-answer detection, DataSentinel, perplexity detection, and perplexity windowed detection). Our experimental results indicate that these defenses are insufficient, highlighting the urgent need for developing new defense strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ToolHijacker, a novel prompt injection attack on tool selection in LLM agents operating in no-box scenarios. It injects a malicious tool document into the agent's tool library and formulates document crafting as an optimization problem solved via a two-phase strategy to force selection of the attacker's tool for a chosen target task. Extensive experiments claim high effectiveness and outperformance over manual and automated baselines, while tested defenses (StruQ, SecAlign, known-answer detection, DataSentinel, perplexity-based methods) prove insufficient.

Significance. If the empirical results hold, the work identifies a practical, optimizable vulnerability in the common retrieval-plus-selection pipeline for LLM agent tools. The two-phase optimization procedure and the demonstration that existing defenses fail constitute concrete contributions that could motivate new defense research. The no-box threat model and focus on tool documents rather than user prompts are timely given growing agent deployments.

major comments (2)
  1. [Attack formulation and experimental setup] The attack's success is predicated on the injected malicious document being surfaced by the retrieval step before selection can be manipulated. The two-phase optimization is described as targeting selection influence, but no mechanism or guarantee is provided for maintaining high retrieval rank when the tool library scales to hundreds of entries under typical embedding-based top-k retrieval (k=5-10). This assumption is load-bearing for the central effectiveness claim in realistic settings.
  2. [Abstract and §4 (Experimental evaluation)] The abstract states that ToolHijacker 'significantly outperform[s] existing manual-based and automated prompt injection attacks,' yet the provided description lacks explicit metrics (e.g., attack success rate definitions), baseline implementations, library sizes, and retrieval model details. Without these, the quantitative superiority cannot be independently verified and the evidence strength remains limited.
minor comments (2)
  1. [§3 (Method)] Clarify whether the optimization objective explicitly includes a retrieval-rank term or only selection loss; if the latter, state this limitation explicitly in the threat model.
  2. [§4] Provide the exact prompt templates and embedding model used for retrieval in the experiments so that results can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our work introducing ToolHijacker. The comments highlight important considerations for the attack's practical applicability and clarity of presentation. We address each major comment below and indicate planned revisions.

read point-by-point responses
  1. Referee: [Attack formulation and experimental setup] The attack's success is predicated on the injected malicious document being surfaced by the retrieval step before selection can be manipulated. The two-phase optimization is described as targeting selection influence, but no mechanism or guarantee is provided for maintaining high retrieval rank when the tool library scales to hundreds of entries under typical embedding-based top-k retrieval (k=5-10). This assumption is load-bearing for the central effectiveness claim in realistic settings.

    Authors: We acknowledge that successful retrieval of the malicious document is a prerequisite for the attack to reach the selection stage. The first phase of our optimization explicitly incorporates semantic alignment with the target task to promote high retrieval rank under embedding-based methods, while the second phase focuses on selection manipulation. Our experiments evaluated library sizes up to 100 tools with k=5-10 and report retrieval rates above 85% for optimized documents. We agree that guarantees for libraries of several hundred entries are not fully demonstrated and constitute a limitation. In the revised manuscript we will add experiments with larger libraries (up to 500 tools) and include an explicit discussion of retrieval-rank assumptions. revision: partial

  2. Referee: [Abstract and §4 (Experimental evaluation)] The abstract states that ToolHijacker 'significantly outperform[s] existing manual-based and automated prompt injection attacks,' yet the provided description lacks explicit metrics (e.g., attack success rate definitions), baseline implementations, library sizes, and retrieval model details. Without these, the quantitative superiority cannot be independently verified and the evidence strength remains limited.

    Authors: We appreciate the referee's call for greater explicitness. Attack success rate is defined in §4 as the fraction of trials in which the malicious tool is selected for the attacker-chosen target task. Baseline implementations (manual prompt-injection templates and automated optimization baselines), exact library sizes (20–200 tools), and retrieval models (e.g., text-embedding-ada-002 and sentence-transformers) are detailed in §4.1–4.3 together with the quantitative results in Tables 2–4. We will revise the abstract to include a concise statement of the primary metric and key experimental parameters so that the superiority claim can be verified without immediate reference to the body. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical optimization attack is self-contained

full rationale

The paper defines ToolHijacker directly as a two-phase optimization procedure whose objective is stated in terms of the attack goal (forcing selection of the injected malicious tool). No equation or claim reduces to a fitted parameter renamed as prediction, no self-citation is load-bearing for the central method, and no uniqueness theorem or ansatz is imported from prior author work. All performance claims rest on explicit experimental comparisons against baselines rather than internal re-derivation, so the derivation chain is independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical security paper with no mathematical derivations; relies only on standard assumptions about LLM prompt sensitivity and tool-library accessibility.

pith-pipeline@v0.9.0 · 5512 in / 891 out tokens · 129928 ms · 2026-05-16T17:04:47.406337+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions

    cs.CR 2026-05 conditional novelty 8.0

    Agentic Workflow Injection is a new injection vulnerability class in LLM-augmented GitHub Actions, with two patterns (P2A and P2S) detected via the TaintAWI tool yielding 496 confirmed exploitable instances across 13,...

  2. Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning

    cs.AI 2026-05 unverdicted novelty 7.0

    HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.

  3. No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

    cs.CR 2026-05 unverdicted novelty 7.0

    Sefz discovers specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.

  4. FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems

    cs.CR 2026-05 unverdicted novelty 7.0

    FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.

  5. The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck

    cs.CR 2026-05 unverdicted novelty 7.0

    PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in Age...

  6. ShieldNet: Network-Level Guardrails against Emerging Supply-Chain Injections in Agentic Systems

    cs.AI 2026-04 unverdicted novelty 7.0

    ShieldNet detects supply-chain poisoned tools in LLM agents by monitoring network interactions with a MITM proxy and lightweight classifier, reaching 0.995 F1 and 0.8% false positives on a new benchmark of 25+ attack types.

  7. Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

    cs.CR 2026-04 accept novelty 7.0

    Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.

  8. Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use

    cs.AI 2026-05 unverdicted novelty 6.0

    LLMs show a knowing-doing gap in tool use: they often recognize when tools are needed via internal states but fail to translate that into actual tool calls, with mismatches of 26-54% on arithmetic and factual tasks.

  9. Behavioral Integrity Verification for AI Agent Skills

    cs.CR 2026-05 unverdicted novelty 6.0

    BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.

  10. ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

    cs.CR 2026-05 unverdicted novelty 6.0

    ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.

  11. CleanBase: Detecting Malicious Documents in RAG Knowledge Databases

    cs.CR 2026-05 unverdicted novelty 6.0

    CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.

  12. Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

    cs.CR 2026-05 unverdicted novelty 6.0

    Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on ...

  13. BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning

    cs.CR 2026-04 unverdicted novelty 6.0

    BadSkill poisons embedded models in agent skills to achieve up to 99.5% attack success rate on triggered tasks with only 3% poison rate while preserving normal behavior on non-trigger inputs.

  14. Security Considerations for Multi-agent Systems

    cs.CR 2026-03 unverdicted novelty 6.0

    No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

  15. CapSeal: Capability-Sealed Secret Mediation for Secure Agent Execution

    cs.CR 2026-04 unverdicted novelty 5.0

    CapSeal introduces a capability-sealed broker architecture that lets AI agents perform constrained secret-using actions without ever receiving the secrets themselves.

  16. Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

    cs.CR 2026-04 conditional novelty 4.0

    The survey organizes security threats and defenses in autonomous LLM agents into four layers and identifies that risks can propagate across layers from inputs to ecosystem impacts.

  17. STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems

    cs.AI 2026-04 unverdicted novelty 4.0

    STARS fuses static priors and contextual risk scoring for agent skill invocations, achieving modest AUPRC gains on prompt injection attacks in a new SIA-Bench but concluding it supplements rather than replaces static ...

Reference graph

Works this paper leans on

89 extracted references · 89 canonical work pages · cited by 17 Pith papers · 20 internal anchors

  1. [1]

    Mind2web: Towards a generalist agent for the web,

    X. Deng, Y . Gu, B. Zheng, S. Chen, S. Stevens, B. Wang, H. Sun, and Y . Su, “Mind2web: Towards a generalist agent for the web,” Advances in Neural Information Processing Systems , vol. 36, 2024

  2. [2]

    A real-world webagent with planning, long context under- standing, and program synthesis,

    I. Gur, H. Furuta, A. Huang, M. Safdari, Y . Matsuo, D. Eck, and A. Faust, “A real-world webagent with planning, long context under- standing, and program synthesis,” arXiv preprint arXiv:2307.12856 , 2023

  3. [3]

    SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

    J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. Narasimhan, and O. Press, “Swe-agent: Agent-computer interfaces enable automated software engineering,” arXiv preprint arXiv:2405.15793 , 2024

  4. [4]

    MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

    S. Hong, X. Zheng, J. Chen, Y . Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou et al., “Metagpt: Meta programming for multi-agent collaborative framework,” arXiv preprint arXiv:2308.00352, 2023

  5. [5]

    Gorilla: Large Language Model Connected with Massive APIs

    S. G. Patil, T. Zhang, X. Wang, and J. E. Gonzalez, “Gorilla: Large language model connected with massive apis,” arXiv preprint arXiv:2305.15334, 2023

  6. [6]

    Restgpt: Connecting large language models with real-world applications via restful apis. corr, abs/2306.06624, 2023. doi: 10.48550,

    Y . Song, W. Xiong, D. Zhu, C. Li, K. Wang, Y . Tian, and S. Li, “Restgpt: Connecting large language models with real-world applications via restful apis. corr, abs/2306.06624, 2023. doi: 10.48550,” arXiv preprint arXiv.2306.06624, 2023

  7. [7]

    ReAct: Synergizing Reasoning and Acting in Language Models

    S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “React: Synergizing reasoning and acting in language models,” arXiv preprint arXiv:2210.03629, 2022

  8. [8]

    Tool learning with large language models: A survey,

    C. Qu, S. Dai, X. Wei, H. Cai, S. Wang, D. Yin, J. Xu, and J.-R. Wen, “Tool learning with large language models: A survey,” arXiv preprint arXiv:2405.17935, 2024

  9. [9]

    Easytool: Enhancing llm-based agents with concise tool instruction,

    S. Yuan, K. Song, J. Chen, X. Tan, Y . Shen, R. Kan, D. Li, and D. Yang, “Easytool: Enhancing llm-based agents with concise tool instruction,” arXiv preprint arXiv:2401.06201 , 2024

  10. [10]

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

    Y . Qin, S. Liang, Y . Ye, K. Zhu, L. Yan, Y . Lu, Y . Lin, X. Cong, X. Tang, B. Qian et al. , “Toolllm: Facilitating large language models to master 16000+ real-world apis,” arXiv preprint arXiv:2307.16789 , 2023

  11. [11]

    Mcp security notification: Tool poisoning attacks

    I. Labs, “Mcp security notification: Tool poisoning attacks.” https: //invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks, 2025

  12. [12]

    Whatsapp mcp exploited: Exfiltrating your message history via mcp

    ——, “Whatsapp mcp exploited: Exfiltrating your message history via mcp.” https://invariantlabs.ai/blog/whatsapp-mcp-exploited, 2025

  13. [13]

    Optimization-based prompt injection attack to llm-as-a-judge,

    J. Shi, Z. Yuan, Y . Liu, Y . Huang, P. Zhou, L. Sun, and N. Z. Gong, “Optimization-based prompt injection attack to llm-as-a-judge,” in Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security , 2024, pp. 660–674

  14. [14]

    From allies to adversaries: Manipulating llm tool-calling through ad- versarial injection,

    H. Wang, R. Zhang, J. Wang, M. Li, Y . Huang, D. Wang, and Q. Wang, “From allies to adversaries: Manipulating llm tool-calling through ad- versarial injection,” arXiv preprint arXiv:2412.10198 , 2024

  15. [15]

    Prompt injection attacks against gpt-3,

    R. Goodside, “Prompt injection attacks against gpt-3,” https:// simonwillison.net/2022/Sep/12/prompt-injection/, 2023

  16. [16]

    Securing llm systems against prompt injection,

    R. Harang, “Securing llm systems against prompt injection,” 2023

  17. [17]

    Evaluating the susceptibility of pre-trained language models via handcrafted adversarial examples,

    H. J. Branch, J. R. Cefalu, J. McHugh, L. Hujer, A. Bahl, D. d. C. Iglesias, R. Heichman, and R. Darwishi, “Evaluating the susceptibility of pre-trained language models via handcrafted adversarial examples,” arXiv preprint arXiv:2209.02128 , 2022

  18. [18]

    Ignore Previous Prompt: Attack Techniques For Language Models

    F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models,” arXiv preprint arXiv:2211.09527 , 2022

  19. [19]

    Delimiters won’t save you from prompt injection,

    S. Willison, “Delimiters won’t save you from prompt injection,” https: //simonwillison.net/2023/May/11/delimiters-wont-save-you/, 2023

  20. [20]

    Formalizing and benchmarking prompt injection attacks and defenses,

    Y . Liu, Y . Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses,” in 33rd USENIX Security Symposium (USENIX Security 24) , 2024, pp. 1831–1847

  21. [21]

    Poisonedrag: Knowledge poisoning attacks to retrieval-augmented generation of large language models,

    W. Zou, R. Geng, B. Wang, and J. Jia, “Poisonedrag: Knowledge poisoning attacks to retrieval-augmented generation of large language models,” arXiv preprint arXiv:2402.07867 , 2024

  22. [22]

    Metatool benchmark for large language models: Deciding whether to use tools and which to use,

    Y . Huang, J. Shi, Y . Li, C. Fan, S. Wu, Q. Zhang, Y . Liu, P. Zhou, Y . Wan, N. Z. Gong et al. , “Metatool benchmark for large language models: Deciding whether to use tools and which to use,” arXiv preprint arXiv:2310.03128, 2023

  23. [23]

    Struq: Defend- ing against prompt injection with structured queries,

    S. Chen, J. Piet, C. Sitawarin, and D. Wagner, “Struq: Defend- ing against prompt injection with structured queries,” arXiv preprint arXiv:2402.06363, 2024

  24. [24]

    Aligning llms to be robust against prompt injection,

    S. Chen, A. Zharmagambetov, S. Mahloujifar, K. Chaudhuri, and C. Guo, “Aligning llms to be robust against prompt injection,” arXiv preprint arXiv:2410.05451, 2024

  25. [25]

    Datasentinel: A game-theoretic detection of prompt injection attacks,

    Y . Liu, Y . Jia, J. Jia, D. Song, and N. Z. Gong, “Datasentinel: A game-theoretic detection of prompt injection attacks,” in 2025 IEEE Symposium on Security and Privacy (SP) . IEEE, 2025, pp. 2190–2208

  26. [26]

    Baseline Defenses for Adversarial Attacks Against Aligned Language Models

    N. Jain, A. Schwarzschild, Y . Wen, G. Somepalli, J. Kirchenbauer, P.-y. Chiang, M. Goldblum, A. Saha, J. Geiping, and T. Goldstein, “Baseline defenses for adversarial attacks against aligned language models,” arXiv preprint arXiv:2309.00614, 2023

  27. [27]

    “Mcp.so,” https://mcp.so/

  28. [28]

    “Apify,” https://apify.com/store

  29. [29]

    Pulsemcp,

    “Pulsemcp,” https://www.pulsemcp.com/

  30. [30]

    API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

    M. Li, Y . Zhao, B. Yu, F. Song, H. Li, H. Yu, Z. Li, F. Huang, and Y . Li, “Api-bank: A comprehensive benchmark for tool-augmented llms,” arXiv preprint arXiv:2304.08244 , 2023

  31. [31]

    Hugging face hub,

    “Hugging face hub,” https://huggingface.co/docs/smolagents/v1.18.0/en/ index. 14

  32. [32]

    Not what you’ve signed up for: Compromising real-world llm- integrated applications with indirect prompt injection,

    K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world llm- integrated applications with indirect prompt injection,” in Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023, pp. 79–90

  33. [33]

    HotFlip: White-Box Adversarial Examples for Text Classification

    J. Ebrahimi, A. Rao, D. Lowd, and D. Dou, “Hotflip: White-box adver- sarial examples for text classification,”arXiv preprint arXiv:1712.06751, 2017

  34. [34]

    Tree of attacks: Jailbreaking black-box llms automatically,

    A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. Anderson, Y . Singer, and A. Karbasi, “Tree of attacks: Jailbreaking black-box llms automatically,” arXiv preprint arXiv:2312.02119 , 2023

  35. [35]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y . Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale et al. , “Llama 2: Open foundation and fine-tuned chat models,” arXiv preprint arXiv:2307.09288, 2023

  36. [36]

    Introducing Meta Llama 3: The most capable openly available LLM to date,

    Meta, “Introducing Meta Llama 3: The most capable openly available LLM to date,” https://ai.meta.com/blog/meta-llama-3/, 2024

  37. [37]

    Llama 3.3,

    ——, “Llama 3.3,” https://www.llama.com/docs/ model-cards-and-prompt-formats/llama3 3/, 2024

  38. [38]

    The claude 3 model family: Opus, sonnet, haiku,

    A. Anthropic, “The claude 3 model family: Opus, sonnet, haiku,” Claude-3 Model Card , vol. 1, 2024

  39. [39]

    Training language models to follow instructions with human feedback,

    L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al. , “Training language models to follow instructions with human feedback,” Advances in neural information processing systems , vol. 35, pp. 27 730–27 744, 2022

  40. [40]

    GPT-4o System Card

    A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford et al., “Gpt-4o system card,” arXiv preprint arXiv:2410.21276 , 2024

  41. [41]

    Text and Code Embeddings by Contrastive Pre-Training

    A. Neelakantan, T. Xu, R. Puri, A. Radford, J. M. Han, J. Tworek, Q. Yuan, N. Tezak, J. W. Kim, C. Hallacy et al. , “Text and code em- beddings by contrastive pre-training,” arXiv preprint arXiv:2201.10005, 2022

  42. [42]

    Unsupervised Dense Information Retrieval with Contrastive Learning

    G. Izacard, M. Caron, L. Hosseini, S. Riedel, P. Bojanowski, A. Joulin, and E. Grave, “Unsupervised dense information retrieval with contrastive learning,” arXiv preprint arXiv:2112.09118 , 2021

  43. [43]

    Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

    N. Reimers, “Sentence-bert: Sentence embeddings using siamese bert- networks,” arXiv preprint arXiv:1908.10084 , 2019

  44. [44]

    Sandwich defense

    L. Prompting, “Sandwich defense.” https://learnprompting.org/docs/ prompt hacking/defensive measures/sandwich defense, 2023

  45. [45]

    Exploring prompt injection attacks,

    N. Group, “Exploring prompt injection attacks,” https://research. nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/, 2023

  46. [46]

    Augmented Language Models: a Survey

    G. Mialon, R. Dess `ı, M. Lomeli, C. Nalmpantis, R. Pasunuru, R. Raileanu, B. Rozi `ere, T. Schick, J. Dwivedi-Yu, A. Celikyil- maz et al. , “Augmented language models: a survey,” arXiv preprint arXiv:2302.07842, 2023

  47. [47]

    Taskmatrix. ai: Completing tasks by connecting foundation models with millions of apis,

    Y . Liang, C. Wu, T. Song, W. Wu, Y . Xia, Y . Liu, Y . Ou, S. Lu, L. Ji, S. Mao et al. , “Taskmatrix. ai: Completing tasks by connecting foundation models with millions of apis,” Intelligent Computing, vol. 3, p. 0063, 2024

  48. [48]

    Protip: Progressive tool retrieval improves planning,

    R. Anantha, B. Bandyopadhyay, A. Kashi, S. Mahinder, A. W. Hill, and S. Chappidi, “Protip: Progressive tool retrieval improves planning,” arXiv preprint arXiv:2312.10332 , 2023

  49. [49]

    Confucius: Iterative tool learning from introspection feedback by easy-to-difficult curriculum,

    S. Gao, Z. Shi, M. Zhu, B. Fang, X. Xin, P. Ren, Z. Chen, J. Ma, and Z. Ren, “Confucius: Iterative tool learning from introspection feedback by easy-to-difficult curriculum,” in Proceedings of the AAAI Conference on Artificial Intelligence , vol. 38, no. 16, 2024, pp. 18 030–18 038

  50. [50]

    Tool- gen: Unified tool retrieval and calling via generation,

    R. Wang, X. Han, L. Ji, S. Wang, T. Baldwin, and H. Li, “Tool- gen: Unified tool retrieval and calling via generation,” arXiv preprint arXiv:2410.03439, 2024

  51. [51]

    Toolr- erank: Adaptive and hierarchy-aware reranking for tool retrieval,

    Y . Zheng, P. Li, W. Liu, Y . Liu, J. Luan, and B. Wang, “Toolr- erank: Adaptive and hierarchy-aware reranking for tool retrieval,” arXiv preprint arXiv:2403.06551, 2024

  52. [52]

    Colt: Towards completeness-oriented tool retrieval for large language models,

    C. Qu, S. Dai, X. Wei, H. Cai, S. Wang, D. Yin, J. Xu, and J.-R. Wen, “Colt: Towards completeness-oriented tool retrieval for large language models,” arXiv preprint arXiv:2405.16089 , 2024

  53. [53]

    Making language models better tool learners with execution feedback,

    S. Qiao, H. Gui, C. Lv, Q. Jia, H. Chen, and N. Zhang, “Making language models better tool learners with execution feedback,” arXiv preprint arXiv:2305.13068, 2023

  54. [54]

    Toolverifier: Generalization to new tools via self- verification,

    D. Mekala, J. Weston, J. Lanchantin, R. Raileanu, M. Lomeli, J. Shang, and J. Dwivedi-Yu, “Toolverifier: Generalization to new tools via self- verification,” arXiv preprint arXiv:2402.14158 , 2024

  55. [55]

    Geckopt: Llm system efficiency via intent-based tool selection,

    M. Fore, S. Singh, and D. Stamoulis, “Geckopt: Llm system efficiency via intent-based tool selection,” in Proceedings of the Great Lakes Symposium on VLSI 2024 , 2024, pp. 353–354

  56. [56]

    Creator: Tool creation for disentangling abstract and concrete reasoning of large language models,

    C. Qian, C. Han, Y . R. Fung, Y . Qin, Z. Liu, and H. Ji, “Creator: Tool creation for disentangling abstract and concrete reasoning of large language models,” arXiv preprint arXiv:2305.14318 , 2023

  57. [57]

    Large language models as tool makers,

    T. Cai, X. Wang, T. Ma, X. Chen, and D. Zhou, “Large language models as tool makers,” arXiv preprint arXiv:2305.17126 , 2023

  58. [58]

    Anytool: Self-reflective, hierarchical agents for large-scale api calls,

    Y . Du, F. Wei, and H. Zhang, “Anytool: Self-reflective, hierarchical agents for large-scale api calls,” arXiv preprint arXiv:2402.04253, 2024

  59. [59]

    Craft: Customizing llms by creating and retrieving from specialized toolsets,

    L. Yuan, Y . Chen, X. Wang, Y . R. Fung, H. Peng, and H. Ji, “Craft: Customizing llms by creating and retrieving from specialized toolsets,” arXiv preprint arXiv:2309.17428 , 2023

  60. [60]

    Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

    K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “More than you’ve asked for: A comprehensive analysis of novel prompt injection threats to application-integrated large language models,” arXiv preprint arXiv:2302.12173, vol. 27, 2023

  61. [61]

    Pleak: Prompt leaking attacks against large language model applications,

    B. Hui, H. Yuan, N. Gong, P. Burlina, and Y . Cao, “Pleak: Prompt leaking attacks against large language model applications,” in ACM Conference on Computer and Communications Security , 2024

  62. [62]

    Enhancing prompt injection attacks to llms via poisoning alignment,

    Z. Shao, H. Liu, J. Mu, and N. Z. Gong, “Enhancing prompt injection attacks to llms via poisoning alignment,” in AISec, 2025

  63. [63]

    InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

    Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “Injecagent: Benchmark- ing indirect prompt injections in tool-integrated large language model agents,” arXiv preprint arXiv:2403.02691 , 2024

  64. [64]

    Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,

    E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tram`er, “Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,” in The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024

  65. [65]

    Envinjection: Environmental prompt injection attack to multi-modal web agents,

    X. Wang, J. Bloch, Z. Shao, Y . Hu, S. Zhou, and N. Z. Gong, “Envinjection: Environmental prompt injection attack to multi-modal web agents,” arXiv preprint arXiv:2505.11717 , 2025

  66. [66]

    Dissecting adversarial robustness of multimodal lm agents,

    C. H. Wu, R. R. Shah, J. Y . Koh, R. Salakhutdinov, D. Fried, and A. Raghunathan, “Dissecting adversarial robustness of multimodal lm agents,” in The Thirteenth International Conference on Learning Rep- resentations, 2025

  67. [67]

    Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

    D. Lee and M. Tiwari, “Prompt infection: Llm-to-llm prompt injection within multi-agent systems,” arXiv preprint arXiv:2410.07283 , 2024

  68. [68]

    Random sequence enclosure,

    “Random sequence enclosure,” https://learnprompting.org/docs/prompt hacking/defensive measures/random sequence, 2024

  69. [69]

    Chat gpt-4 turbo prompt engineering guide for developers,

    A. Mendes, “Chat gpt-4 turbo prompt engineering guide for developers,” https://www.imaginarycloud.com/blog/chatgpt-prompt-engineering, 2024

  70. [70]

    Jatmo: Prompt injection defense by task-specific finetuning,

    J. Piet, M. Alrashed, C. Sitawarin, S. Chen, Z. Wei, E. Sun, B. Alomair, and D. Wagner, “Jatmo: Prompt injection defense by task-specific finetuning,” in European Symposium on Research in Computer Security . Springer, 2024, pp. 105–124

  71. [71]

    A critical evaluation of defenses against prompt injection attacks,

    Y . Jia, Z. Shao, Y . Liu, J. Jia, D. Song, and N. Z. Gong, “A critical evaluation of defenses against prompt injection attacks,” arXiv preprint arXiv:2505.18333, 2025

  72. [72]

    Defeating Prompt Injections by Design

    E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tram `er, “Defeating prompt injections by design,” arXiv preprint arXiv:2503.18813 , 2025

  73. [73]

    Design patterns for securing llm agents against prompt injec- tions,

    L. Beurer-Kellner, B. B. A.-M. Cret ¸u, E. Debenedetti, D. Dobos, D. Fabian, M. Fischer, D. Froelicher, K. Grosse, D. Naeff, E. Ozoani et al. , “Design patterns for securing llm agents against prompt injec- tions,” arXiv preprint arXiv:2506.08837 , 2025

  74. [74]

    Detecting Language Model Attacks with Perplexity

    G. Alon and M. Kamfonas, “Detecting language model attacks with perplexity,” arXiv preprint arXiv:2308.14132 , 2023. APPENDIX A. List of Symbols In this subsection, we provide a list of symbols used throughout the paper, along with their corresponding defini- tions. Table XI includes symbols for key components such as the target LLM, the attacker LLM, too...

  75. [75]

    Task Alignment: All queries must directly relate to the task and represent authentic user needs

  76. [76]

    Complexity Range : Include simple, moderate, and complex requests with varied sentence structures (max [ length] words each)

  77. [77]

    Context Diversity: Cover different scenarios, use cases, and user backgrounds relevant to the task

  78. [78]

    10: Prompt for shadow task description generation

    Practical Applicability: Ensure queries reflect real-world situations users would encounter Fig. 10: Prompt for shadow task description generation. Prompt for shadow tool document generation Please generate [ num] tool documentation entries designed to address the following user queries: [shadow task descriptions ] Format requirements:

  79. [79]

    tool name: ⟨name⟩, tool description: ⟨brief description⟩

    Tool documentation format: “tool name: ⟨name⟩, tool description: ⟨brief description⟩”

  80. [80]

    Each tool’s description should highlight core functionalities and provide a general solution that can apply to various scenarios, not limited to the specific query

Showing first 80 references.