arxiv: 2504.19793 · v3 · submitted 2025-04-28 · 💻 cs.CR

Recognition: 1 theorem link

Prompt Injection Attack to Tool Selection in LLM Agents

Jiawen Shi , Zenghui Yuan , Guiyao Tie , Pan Zhou , Neil Zhenqiang Gong , Lichao Sun

Authors on Pith no claims yet

Pith reviewed 2026-05-16 17:04 UTC · model grok-4.3

classification 💻 cs.CR

keywords prompt injectiontool selectionLLM agentsadversarial attackno-box scenariooptimization attackdefense evaluationToolHijacker

0 comments

The pith

ToolHijacker injects optimized malicious tool documents to force LLM agents to select attacker-chosen tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ToolHijacker as a prompt injection attack on the two-step retrieval and selection process that LLM agents use to pick tools from a library. By formulating the creation of a fake tool document as an optimization problem solved in two phases, the attack poisons the library so the agent reliably picks the attacker's tool for a target task. Experiments demonstrate that this method significantly outperforms both manual and automated prompt injection baselines when applied to tool selection. Existing prevention defenses such as StruQ and SecAlign, along with detection approaches including perplexity checks, fail to block the attack. The work therefore shows that tool libraries in no-box settings remain open to manipulation through carefully crafted documents.

Core claim

ToolHijacker injects a malicious tool document into the tool library to manipulate the LLM agent's tool selection process, compelling it to consistently choose the attacker's malicious tool for an attacker-chosen target task. The method formulates document crafting as an optimization problem and solves it with a two-phase strategy, achieving high success rates that exceed those of prior prompt injection attacks on the same task.

What carries the argument

ToolHijacker, a two-phase optimization procedure that generates adversarial tool documents to exploit the retrieval-plus-selection pipeline in LLM agents.

If this is right

LLM agents that draw tools from unverified libraries can be steered toward attacker tools for specific tasks without direct model access.
The attack succeeds at higher rates than manual or automated prompt injection methods previously tested on tool selection.
Both structural prevention techniques like StruQ and SecAlign and detection methods based on perplexity or known-answer checks leave the vulnerability open.
The results indicate that current tool-selection pipelines in agents require fundamentally stronger safeguards against library poisoning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same document-optimization tactic could be tested against other agent stages such as memory retrieval or step-by-step planning.
Deployments that allow dynamic addition of tools from external sources would need source authentication or cryptographic signing to limit this vector.
The two-phase optimization may transfer to other retrieval-based systems where documents influence downstream model decisions.
Longer-term, the finding points toward the need for retrieval mechanisms that treat tool descriptions as untrusted inputs by default.

Load-bearing premise

An attacker can insert a malicious tool document into the agent's library and the LLM will still follow the injected instructions inside that document during retrieval and selection.

What would settle it

A controlled test where multiple LLM agents receive the optimized malicious document yet select the intended tool at rates no higher than random chance or baseline attacks.

read the original abstract

Tool selection is a key component of LLM agents. A popular approach follows a two-step process - \emph{retrieval} and \emph{selection} - to pick the most appropriate tool from a tool library for a given task. In this work, we introduce \textit{ToolHijacker}, a novel prompt injection attack targeting tool selection in no-box scenarios. ToolHijacker injects a malicious tool document into the tool library to manipulate the LLM agent's tool selection process, compelling it to consistently choose the attacker's malicious tool for an attacker-chosen target task. Specifically, we formulate the crafting of such tool documents as an optimization problem and propose a two-phase optimization strategy to solve it. Our extensive experimental evaluation shows that ToolHijacker is highly effective, significantly outperforming existing manual-based and automated prompt injection attacks when applied to tool selection. Moreover, we explore various defenses, including prevention-based defenses (StruQ and SecAlign) and detection-based defenses (known-answer detection, DataSentinel, perplexity detection, and perplexity windowed detection). Our experimental results indicate that these defenses are insufficient, highlighting the urgent need for developing new defense strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ToolHijacker shows a workable two-phase optimization for hijacking tool selection via injected documents, but retrieval success in large libraries is untested and could limit real impact.

read the letter

Hi, the main thing to know is that this paper introduces ToolHijacker, which frames the creation of a malicious tool document as a two-phase optimization problem aimed at both retrieval and final selection in LLM agents. It targets the common two-step pipeline and claims strong results in no-box settings. The work does a reasonable job extending existing prompt injection techniques to the specific mechanics of tool selection rather than generic user prompts. Their experiments report that the method beats both manual and automated baselines, and they evaluate several defenses including StruQ, SecAlign, and various detection approaches, showing none fully block the attack. That part gives a useful snapshot of current weaknesses. The soft spot is the retrieval phase. The attack requires the injected document to surface in the initial top-k results before selection can be influenced, yet the evaluation does not appear to test how performance drops as the tool library grows to hundreds of entries under realistic embedding-based retrieval. If the document is never retrieved, the rest of the attack never triggers. This is a practical limitation rather than a fatal one, but it narrows the scenarios where the attack would reliably succeed. The paper is aimed at researchers working on LLM agent security and tool integration. Readers who care about prompt-based attacks or safe agent deployment will find the formulation and defense checks worth seeing. The thinking is straightforward and engages prior work without obvious circularity. I would bring it to a reading group to discuss the optimization details and scaling questions. I would not cite it in my own work in the next year unless I am directly studying agent defenses. It deserves peer review because the core attack surface is real and the experiments provide a starting point, even if the evaluation needs tightening on library size.

Referee Report

2 major / 2 minor

Summary. The paper introduces ToolHijacker, a novel prompt injection attack on tool selection in LLM agents operating in no-box scenarios. It injects a malicious tool document into the agent's tool library and formulates document crafting as an optimization problem solved via a two-phase strategy to force selection of the attacker's tool for a chosen target task. Extensive experiments claim high effectiveness and outperformance over manual and automated baselines, while tested defenses (StruQ, SecAlign, known-answer detection, DataSentinel, perplexity-based methods) prove insufficient.

Significance. If the empirical results hold, the work identifies a practical, optimizable vulnerability in the common retrieval-plus-selection pipeline for LLM agent tools. The two-phase optimization procedure and the demonstration that existing defenses fail constitute concrete contributions that could motivate new defense research. The no-box threat model and focus on tool documents rather than user prompts are timely given growing agent deployments.

major comments (2)

[Attack formulation and experimental setup] The attack's success is predicated on the injected malicious document being surfaced by the retrieval step before selection can be manipulated. The two-phase optimization is described as targeting selection influence, but no mechanism or guarantee is provided for maintaining high retrieval rank when the tool library scales to hundreds of entries under typical embedding-based top-k retrieval (k=5-10). This assumption is load-bearing for the central effectiveness claim in realistic settings.
[Abstract and §4 (Experimental evaluation)] The abstract states that ToolHijacker 'significantly outperform[s] existing manual-based and automated prompt injection attacks,' yet the provided description lacks explicit metrics (e.g., attack success rate definitions), baseline implementations, library sizes, and retrieval model details. Without these, the quantitative superiority cannot be independently verified and the evidence strength remains limited.

minor comments (2)

[§3 (Method)] Clarify whether the optimization objective explicitly includes a retrieval-rank term or only selection loss; if the latter, state this limitation explicitly in the threat model.
[§4] Provide the exact prompt templates and embedding model used for retrieval in the experiments so that results can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our work introducing ToolHijacker. The comments highlight important considerations for the attack's practical applicability and clarity of presentation. We address each major comment below and indicate planned revisions.

read point-by-point responses

Referee: [Attack formulation and experimental setup] The attack's success is predicated on the injected malicious document being surfaced by the retrieval step before selection can be manipulated. The two-phase optimization is described as targeting selection influence, but no mechanism or guarantee is provided for maintaining high retrieval rank when the tool library scales to hundreds of entries under typical embedding-based top-k retrieval (k=5-10). This assumption is load-bearing for the central effectiveness claim in realistic settings.

Authors: We acknowledge that successful retrieval of the malicious document is a prerequisite for the attack to reach the selection stage. The first phase of our optimization explicitly incorporates semantic alignment with the target task to promote high retrieval rank under embedding-based methods, while the second phase focuses on selection manipulation. Our experiments evaluated library sizes up to 100 tools with k=5-10 and report retrieval rates above 85% for optimized documents. We agree that guarantees for libraries of several hundred entries are not fully demonstrated and constitute a limitation. In the revised manuscript we will add experiments with larger libraries (up to 500 tools) and include an explicit discussion of retrieval-rank assumptions. revision: partial
Referee: [Abstract and §4 (Experimental evaluation)] The abstract states that ToolHijacker 'significantly outperform[s] existing manual-based and automated prompt injection attacks,' yet the provided description lacks explicit metrics (e.g., attack success rate definitions), baseline implementations, library sizes, and retrieval model details. Without these, the quantitative superiority cannot be independently verified and the evidence strength remains limited.

Authors: We appreciate the referee's call for greater explicitness. Attack success rate is defined in §4 as the fraction of trials in which the malicious tool is selected for the attacker-chosen target task. Baseline implementations (manual prompt-injection templates and automated optimization baselines), exact library sizes (20–200 tools), and retrieval models (e.g., text-embedding-ada-002 and sentence-transformers) are detailed in §4.1–4.3 together with the quantitative results in Tables 2–4. We will revise the abstract to include a concise statement of the primary metric and key experimental parameters so that the superiority claim can be verified without immediate reference to the body. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical optimization attack is self-contained

full rationale

The paper defines ToolHijacker directly as a two-phase optimization procedure whose objective is stated in terms of the attack goal (forcing selection of the injected malicious tool). No equation or claim reduces to a fitted parameter renamed as prediction, no self-citation is load-bearing for the central method, and no uniqueness theorem or ansatz is imported from prior author work. All performance claims rest on explicit experimental comparisons against baselines rather than internal re-derivation, so the derivation chain is independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical security paper with no mathematical derivations; relies only on standard assumptions about LLM prompt sensitivity and tool-library accessibility.

pith-pipeline@v0.9.0 · 5512 in / 891 out tokens · 129928 ms · 2026-05-16T17:04:47.406337+00:00 · methodology

discussion (0)

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions
cs.CR 2026-05 conditional novelty 8.0

Agentic Workflow Injection is a new injection vulnerability class in LLM-augmented GitHub Actions, with two patterns (P2A and P2S) detected via the TaintAWI tool yielding 496 confirmed exploitable instances across 13,...
Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning
cs.AI 2026-05 unverdicted novelty 7.0

HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.
No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills
cs.CR 2026-05 unverdicted novelty 7.0

Sefz discovers specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.
FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
cs.CR 2026-05 unverdicted novelty 7.0

FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck
cs.CR 2026-05 unverdicted novelty 7.0

PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in Age...
ShieldNet: Network-Level Guardrails against Emerging Supply-Chain Injections in Agentic Systems
cs.AI 2026-04 unverdicted novelty 7.0

ShieldNet detects supply-chain poisoned tools in LLM agents by monitoring network interactions with a MITM proxy and lightweight classifier, reaching 0.995 F1 and 0.8% false positives on a new benchmark of 25+ attack types.
Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study
cs.CR 2026-04 accept novelty 7.0

Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.
Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use
cs.AI 2026-05 unverdicted novelty 6.0

LLMs show a knowing-doing gap in tool use: they often recognize when tools are needed via internal states but fail to translate that into actual tool calls, with mismatches of 26-54% on arithmetic and factual tasks.
Behavioral Integrity Verification for AI Agent Skills
cs.CR 2026-05 unverdicted novelty 6.0

BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection
cs.CR 2026-05 unverdicted novelty 6.0

ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.
CleanBase: Detecting Malicious Documents in RAG Knowledge Databases
cs.CR 2026-05 unverdicted novelty 6.0

CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.
Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis
cs.CR 2026-05 unverdicted novelty 6.0

Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on ...
BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning
cs.CR 2026-04 unverdicted novelty 6.0

BadSkill poisons embedded models in agent skills to achieve up to 99.5% attack success rate on triggered tasks with only 3% poison rate while preserving normal behavior on non-trigger inputs.
Security Considerations for Multi-agent Systems
cs.CR 2026-03 unverdicted novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
CapSeal: Capability-Sealed Secret Mediation for Secure Agent Execution
cs.CR 2026-04 unverdicted novelty 5.0

CapSeal introduces a capability-sealed broker architecture that lets AI agents perform constrained secret-using actions without ever receiving the secrets themselves.
Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study
cs.CR 2026-04 conditional novelty 4.0

The survey organizes security threats and defenses in autonomous LLM agents into four layers and identifies that risks can propagate across layers from inputs to ecosystem impacts.
STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems
cs.AI 2026-04 unverdicted novelty 4.0

STARS fuses static priors and contextual risk scoring for agent skill invocations, achieving modest AUPRC gains on prompt injection attacks in a new SIA-Bench but concluding it supplements rather than replaces static ...

Reference graph

Works this paper leans on

89 extracted references · 89 canonical work pages · cited by 17 Pith papers · 20 internal anchors

[1]

Mind2web: Towards a generalist agent for the web,

X. Deng, Y . Gu, B. Zheng, S. Chen, S. Stevens, B. Wang, H. Sun, and Y . Su, “Mind2web: Towards a generalist agent for the web,” Advances in Neural Information Processing Systems , vol. 36, 2024

work page 2024
[2]

A real-world webagent with planning, long context under- standing, and program synthesis,

I. Gur, H. Furuta, A. Huang, M. Safdari, Y . Matsuo, D. Eck, and A. Faust, “A real-world webagent with planning, long context under- standing, and program synthesis,” arXiv preprint arXiv:2307.12856 , 2023

work page arXiv 2023
[3]

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. Narasimhan, and O. Press, “Swe-agent: Agent-computer interfaces enable automated software engineering,” arXiv preprint arXiv:2405.15793 , 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[4]

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

S. Hong, X. Zheng, J. Chen, Y . Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou et al., “Metagpt: Meta programming for multi-agent collaborative framework,” arXiv preprint arXiv:2308.00352, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[5]

Gorilla: Large Language Model Connected with Massive APIs

S. G. Patil, T. Zhang, X. Wang, and J. E. Gonzalez, “Gorilla: Large language model connected with massive apis,” arXiv preprint arXiv:2305.15334, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[6]

Restgpt: Connecting large language models with real-world applications via restful apis. corr, abs/2306.06624, 2023. doi: 10.48550,

Y . Song, W. Xiong, D. Zhu, C. Li, K. Wang, Y . Tian, and S. Li, “Restgpt: Connecting large language models with real-world applications via restful apis. corr, abs/2306.06624, 2023. doi: 10.48550,” arXiv preprint arXiv.2306.06624, 2023

work page arXiv 2023
[7]

ReAct: Synergizing Reasoning and Acting in Language Models

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “React: Synergizing reasoning and acting in language models,” arXiv preprint arXiv:2210.03629, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[8]

Tool learning with large language models: A survey,

C. Qu, S. Dai, X. Wei, H. Cai, S. Wang, D. Yin, J. Xu, and J.-R. Wen, “Tool learning with large language models: A survey,” arXiv preprint arXiv:2405.17935, 2024

work page arXiv 2024
[9]

Easytool: Enhancing llm-based agents with concise tool instruction,

S. Yuan, K. Song, J. Chen, X. Tan, Y . Shen, R. Kan, D. Li, and D. Yang, “Easytool: Enhancing llm-based agents with concise tool instruction,” arXiv preprint arXiv:2401.06201 , 2024

work page arXiv 2024
[10]

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Y . Qin, S. Liang, Y . Ye, K. Zhu, L. Yan, Y . Lu, Y . Lin, X. Cong, X. Tang, B. Qian et al. , “Toolllm: Facilitating large language models to master 16000+ real-world apis,” arXiv preprint arXiv:2307.16789 , 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[11]

Mcp security notification: Tool poisoning attacks

I. Labs, “Mcp security notification: Tool poisoning attacks.” https: //invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks, 2025

work page 2025
[12]

Whatsapp mcp exploited: Exfiltrating your message history via mcp

——, “Whatsapp mcp exploited: Exfiltrating your message history via mcp.” https://invariantlabs.ai/blog/whatsapp-mcp-exploited, 2025

work page 2025
[13]

Optimization-based prompt injection attack to llm-as-a-judge,

J. Shi, Z. Yuan, Y . Liu, Y . Huang, P. Zhou, L. Sun, and N. Z. Gong, “Optimization-based prompt injection attack to llm-as-a-judge,” in Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security , 2024, pp. 660–674

work page 2024
[14]

From allies to adversaries: Manipulating llm tool-calling through ad- versarial injection,

H. Wang, R. Zhang, J. Wang, M. Li, Y . Huang, D. Wang, and Q. Wang, “From allies to adversaries: Manipulating llm tool-calling through ad- versarial injection,” arXiv preprint arXiv:2412.10198 , 2024

work page arXiv 2024
[15]

Prompt injection attacks against gpt-3,

R. Goodside, “Prompt injection attacks against gpt-3,” https:// simonwillison.net/2022/Sep/12/prompt-injection/, 2023

work page 2022
[16]

Securing llm systems against prompt injection,

R. Harang, “Securing llm systems against prompt injection,” 2023

work page 2023
[17]

Evaluating the susceptibility of pre-trained language models via handcrafted adversarial examples,

H. J. Branch, J. R. Cefalu, J. McHugh, L. Hujer, A. Bahl, D. d. C. Iglesias, R. Heichman, and R. Darwishi, “Evaluating the susceptibility of pre-trained language models via handcrafted adversarial examples,” arXiv preprint arXiv:2209.02128 , 2022

work page arXiv 2022
[18]

Ignore Previous Prompt: Attack Techniques For Language Models

F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models,” arXiv preprint arXiv:2211.09527 , 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[19]

Delimiters won’t save you from prompt injection,

S. Willison, “Delimiters won’t save you from prompt injection,” https: //simonwillison.net/2023/May/11/delimiters-wont-save-you/, 2023

work page 2023
[20]

Formalizing and benchmarking prompt injection attacks and defenses,

Y . Liu, Y . Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses,” in 33rd USENIX Security Symposium (USENIX Security 24) , 2024, pp. 1831–1847

work page 2024
[21]

Poisonedrag: Knowledge poisoning attacks to retrieval-augmented generation of large language models,

W. Zou, R. Geng, B. Wang, and J. Jia, “Poisonedrag: Knowledge poisoning attacks to retrieval-augmented generation of large language models,” arXiv preprint arXiv:2402.07867 , 2024

work page arXiv 2024
[22]

Metatool benchmark for large language models: Deciding whether to use tools and which to use,

Y . Huang, J. Shi, Y . Li, C. Fan, S. Wu, Q. Zhang, Y . Liu, P. Zhou, Y . Wan, N. Z. Gong et al. , “Metatool benchmark for large language models: Deciding whether to use tools and which to use,” arXiv preprint arXiv:2310.03128, 2023

work page arXiv 2023
[23]

Struq: Defend- ing against prompt injection with structured queries,

S. Chen, J. Piet, C. Sitawarin, and D. Wagner, “Struq: Defend- ing against prompt injection with structured queries,” arXiv preprint arXiv:2402.06363, 2024

work page arXiv 2024
[24]

Aligning llms to be robust against prompt injection,

S. Chen, A. Zharmagambetov, S. Mahloujifar, K. Chaudhuri, and C. Guo, “Aligning llms to be robust against prompt injection,” arXiv preprint arXiv:2410.05451, 2024

work page arXiv 2024
[25]

Datasentinel: A game-theoretic detection of prompt injection attacks,

Y . Liu, Y . Jia, J. Jia, D. Song, and N. Z. Gong, “Datasentinel: A game-theoretic detection of prompt injection attacks,” in 2025 IEEE Symposium on Security and Privacy (SP) . IEEE, 2025, pp. 2190–2208

work page 2025
[26]

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

N. Jain, A. Schwarzschild, Y . Wen, G. Somepalli, J. Kirchenbauer, P.-y. Chiang, M. Goldblum, A. Saha, J. Geiping, and T. Goldstein, “Baseline defenses for adversarial attacks against aligned language models,” arXiv preprint arXiv:2309.00614, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[27]

“Mcp.so,” https://mcp.so/

work page
[28]

“Apify,” https://apify.com/store

work page
[29]

Pulsemcp,

“Pulsemcp,” https://www.pulsemcp.com/

work page
[30]

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

M. Li, Y . Zhao, B. Yu, F. Song, H. Li, H. Yu, Z. Li, F. Huang, and Y . Li, “Api-bank: A comprehensive benchmark for tool-augmented llms,” arXiv preprint arXiv:2304.08244 , 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[31]

Hugging face hub,

“Hugging face hub,” https://huggingface.co/docs/smolagents/v1.18.0/en/ index. 14

work page
[32]

Not what you’ve signed up for: Compromising real-world llm- integrated applications with indirect prompt injection,

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world llm- integrated applications with indirect prompt injection,” in Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023, pp. 79–90

work page 2023
[33]

HotFlip: White-Box Adversarial Examples for Text Classification

J. Ebrahimi, A. Rao, D. Lowd, and D. Dou, “Hotflip: White-box adver- sarial examples for text classification,”arXiv preprint arXiv:1712.06751, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[34]

Tree of attacks: Jailbreaking black-box llms automatically,

A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. Anderson, Y . Singer, and A. Karbasi, “Tree of attacks: Jailbreaking black-box llms automatically,” arXiv preprint arXiv:2312.02119 , 2023

work page arXiv 2023
[35]

Llama 2: Open Foundation and Fine-Tuned Chat Models

H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y . Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale et al. , “Llama 2: Open foundation and fine-tuned chat models,” arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[36]

Introducing Meta Llama 3: The most capable openly available LLM to date,

Meta, “Introducing Meta Llama 3: The most capable openly available LLM to date,” https://ai.meta.com/blog/meta-llama-3/, 2024

work page 2024
[37]

Llama 3.3,

——, “Llama 3.3,” https://www.llama.com/docs/ model-cards-and-prompt-formats/llama3 3/, 2024

work page 2024
[38]

The claude 3 model family: Opus, sonnet, haiku,

A. Anthropic, “The claude 3 model family: Opus, sonnet, haiku,” Claude-3 Model Card , vol. 1, 2024

work page 2024
[39]

Training language models to follow instructions with human feedback,

L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al. , “Training language models to follow instructions with human feedback,” Advances in neural information processing systems , vol. 35, pp. 27 730–27 744, 2022

work page 2022
[40]

GPT-4o System Card

A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford et al., “Gpt-4o system card,” arXiv preprint arXiv:2410.21276 , 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[41]

Text and Code Embeddings by Contrastive Pre-Training

A. Neelakantan, T. Xu, R. Puri, A. Radford, J. M. Han, J. Tworek, Q. Yuan, N. Tezak, J. W. Kim, C. Hallacy et al. , “Text and code em- beddings by contrastive pre-training,” arXiv preprint arXiv:2201.10005, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[42]

Unsupervised Dense Information Retrieval with Contrastive Learning

G. Izacard, M. Caron, L. Hosseini, S. Riedel, P. Bojanowski, A. Joulin, and E. Grave, “Unsupervised dense information retrieval with contrastive learning,” arXiv preprint arXiv:2112.09118 , 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[43]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

N. Reimers, “Sentence-bert: Sentence embeddings using siamese bert- networks,” arXiv preprint arXiv:1908.10084 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1908
[44]

Sandwich defense

L. Prompting, “Sandwich defense.” https://learnprompting.org/docs/ prompt hacking/defensive measures/sandwich defense, 2023

work page 2023
[45]

Exploring prompt injection attacks,

N. Group, “Exploring prompt injection attacks,” https://research. nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/, 2023

work page 2022
[46]

Augmented Language Models: a Survey

G. Mialon, R. Dess `ı, M. Lomeli, C. Nalmpantis, R. Pasunuru, R. Raileanu, B. Rozi `ere, T. Schick, J. Dwivedi-Yu, A. Celikyil- maz et al. , “Augmented language models: a survey,” arXiv preprint arXiv:2302.07842, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[47]

Taskmatrix. ai: Completing tasks by connecting foundation models with millions of apis,

Y . Liang, C. Wu, T. Song, W. Wu, Y . Xia, Y . Liu, Y . Ou, S. Lu, L. Ji, S. Mao et al. , “Taskmatrix. ai: Completing tasks by connecting foundation models with millions of apis,” Intelligent Computing, vol. 3, p. 0063, 2024

work page 2024
[48]

Protip: Progressive tool retrieval improves planning,

R. Anantha, B. Bandyopadhyay, A. Kashi, S. Mahinder, A. W. Hill, and S. Chappidi, “Protip: Progressive tool retrieval improves planning,” arXiv preprint arXiv:2312.10332 , 2023

work page arXiv 2023
[49]

Confucius: Iterative tool learning from introspection feedback by easy-to-difficult curriculum,

S. Gao, Z. Shi, M. Zhu, B. Fang, X. Xin, P. Ren, Z. Chen, J. Ma, and Z. Ren, “Confucius: Iterative tool learning from introspection feedback by easy-to-difficult curriculum,” in Proceedings of the AAAI Conference on Artificial Intelligence , vol. 38, no. 16, 2024, pp. 18 030–18 038

work page 2024
[50]

Tool- gen: Unified tool retrieval and calling via generation,

R. Wang, X. Han, L. Ji, S. Wang, T. Baldwin, and H. Li, “Tool- gen: Unified tool retrieval and calling via generation,” arXiv preprint arXiv:2410.03439, 2024

work page arXiv 2024
[51]

Toolr- erank: Adaptive and hierarchy-aware reranking for tool retrieval,

Y . Zheng, P. Li, W. Liu, Y . Liu, J. Luan, and B. Wang, “Toolr- erank: Adaptive and hierarchy-aware reranking for tool retrieval,” arXiv preprint arXiv:2403.06551, 2024

work page arXiv 2024
[52]

Colt: Towards completeness-oriented tool retrieval for large language models,

C. Qu, S. Dai, X. Wei, H. Cai, S. Wang, D. Yin, J. Xu, and J.-R. Wen, “Colt: Towards completeness-oriented tool retrieval for large language models,” arXiv preprint arXiv:2405.16089 , 2024

work page arXiv 2024
[53]

Making language models better tool learners with execution feedback,

S. Qiao, H. Gui, C. Lv, Q. Jia, H. Chen, and N. Zhang, “Making language models better tool learners with execution feedback,” arXiv preprint arXiv:2305.13068, 2023

work page arXiv 2023
[54]

Toolverifier: Generalization to new tools via self- verification,

D. Mekala, J. Weston, J. Lanchantin, R. Raileanu, M. Lomeli, J. Shang, and J. Dwivedi-Yu, “Toolverifier: Generalization to new tools via self- verification,” arXiv preprint arXiv:2402.14158 , 2024

work page arXiv 2024
[55]

Geckopt: Llm system efficiency via intent-based tool selection,

M. Fore, S. Singh, and D. Stamoulis, “Geckopt: Llm system efficiency via intent-based tool selection,” in Proceedings of the Great Lakes Symposium on VLSI 2024 , 2024, pp. 353–354

work page 2024
[56]

Creator: Tool creation for disentangling abstract and concrete reasoning of large language models,

C. Qian, C. Han, Y . R. Fung, Y . Qin, Z. Liu, and H. Ji, “Creator: Tool creation for disentangling abstract and concrete reasoning of large language models,” arXiv preprint arXiv:2305.14318 , 2023

work page arXiv 2023
[57]

Large language models as tool makers,

T. Cai, X. Wang, T. Ma, X. Chen, and D. Zhou, “Large language models as tool makers,” arXiv preprint arXiv:2305.17126 , 2023

work page arXiv 2023
[58]

Anytool: Self-reflective, hierarchical agents for large-scale api calls,

Y . Du, F. Wei, and H. Zhang, “Anytool: Self-reflective, hierarchical agents for large-scale api calls,” arXiv preprint arXiv:2402.04253, 2024

work page arXiv 2024
[59]

Craft: Customizing llms by creating and retrieving from specialized toolsets,

L. Yuan, Y . Chen, X. Wang, Y . R. Fung, H. Peng, and H. Ji, “Craft: Customizing llms by creating and retrieving from specialized toolsets,” arXiv preprint arXiv:2309.17428 , 2023

work page arXiv 2023
[60]

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “More than you’ve asked for: A comprehensive analysis of novel prompt injection threats to application-integrated large language models,” arXiv preprint arXiv:2302.12173, vol. 27, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[61]

Pleak: Prompt leaking attacks against large language model applications,

B. Hui, H. Yuan, N. Gong, P. Burlina, and Y . Cao, “Pleak: Prompt leaking attacks against large language model applications,” in ACM Conference on Computer and Communications Security , 2024

work page 2024
[62]

Enhancing prompt injection attacks to llms via poisoning alignment,

Z. Shao, H. Liu, J. Mu, and N. Z. Gong, “Enhancing prompt injection attacks to llms via poisoning alignment,” in AISec, 2025

work page 2025
[63]

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “Injecagent: Benchmark- ing indirect prompt injections in tool-integrated large language model agents,” arXiv preprint arXiv:2403.02691 , 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[64]

Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,

E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tram`er, “Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,” in The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024

work page 2024
[65]

Envinjection: Environmental prompt injection attack to multi-modal web agents,

X. Wang, J. Bloch, Z. Shao, Y . Hu, S. Zhou, and N. Z. Gong, “Envinjection: Environmental prompt injection attack to multi-modal web agents,” arXiv preprint arXiv:2505.11717 , 2025

work page arXiv 2025
[66]

Dissecting adversarial robustness of multimodal lm agents,

C. H. Wu, R. R. Shah, J. Y . Koh, R. Salakhutdinov, D. Fried, and A. Raghunathan, “Dissecting adversarial robustness of multimodal lm agents,” in The Thirteenth International Conference on Learning Rep- resentations, 2025

work page 2025
[67]

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

D. Lee and M. Tiwari, “Prompt infection: Llm-to-llm prompt injection within multi-agent systems,” arXiv preprint arXiv:2410.07283 , 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[68]

Random sequence enclosure,

“Random sequence enclosure,” https://learnprompting.org/docs/prompt hacking/defensive measures/random sequence, 2024

work page 2024
[69]

Chat gpt-4 turbo prompt engineering guide for developers,

A. Mendes, “Chat gpt-4 turbo prompt engineering guide for developers,” https://www.imaginarycloud.com/blog/chatgpt-prompt-engineering, 2024

work page 2024
[70]

Jatmo: Prompt injection defense by task-specific finetuning,

J. Piet, M. Alrashed, C. Sitawarin, S. Chen, Z. Wei, E. Sun, B. Alomair, and D. Wagner, “Jatmo: Prompt injection defense by task-specific finetuning,” in European Symposium on Research in Computer Security . Springer, 2024, pp. 105–124

work page 2024
[71]

A critical evaluation of defenses against prompt injection attacks,

Y . Jia, Z. Shao, Y . Liu, J. Jia, D. Song, and N. Z. Gong, “A critical evaluation of defenses against prompt injection attacks,” arXiv preprint arXiv:2505.18333, 2025

work page arXiv 2025
[72]

Defeating Prompt Injections by Design

E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tram `er, “Defeating prompt injections by design,” arXiv preprint arXiv:2503.18813 , 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[73]

Design patterns for securing llm agents against prompt injec- tions,

L. Beurer-Kellner, B. B. A.-M. Cret ¸u, E. Debenedetti, D. Dobos, D. Fabian, M. Fischer, D. Froelicher, K. Grosse, D. Naeff, E. Ozoani et al. , “Design patterns for securing llm agents against prompt injec- tions,” arXiv preprint arXiv:2506.08837 , 2025

work page arXiv 2025
[74]

Detecting Language Model Attacks with Perplexity

G. Alon and M. Kamfonas, “Detecting language model attacks with perplexity,” arXiv preprint arXiv:2308.14132 , 2023. APPENDIX A. List of Symbols In this subsection, we provide a list of symbols used throughout the paper, along with their corresponding defini- tions. Table XI includes symbols for key components such as the target LLM, the attacker LLM, too...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[75]

Task Alignment: All queries must directly relate to the task and represent authentic user needs

work page
[76]

Complexity Range : Include simple, moderate, and complex requests with varied sentence structures (max [ length] words each)

work page
[77]

Context Diversity: Cover different scenarios, use cases, and user backgrounds relevant to the task

work page
[78]

10: Prompt for shadow task description generation

Practical Applicability: Ensure queries reflect real-world situations users would encounter Fig. 10: Prompt for shadow task description generation. Prompt for shadow tool document generation Please generate [ num] tool documentation entries designed to address the following user queries: [shadow task descriptions ] Format requirements:

work page
[79]

tool name: ⟨name⟩, tool description: ⟨brief description⟩

Tool documentation format: “tool name: ⟨name⟩, tool description: ⟨brief description⟩”

work page
[80]

Each tool’s description should highlight core functionalities and provide a general solution that can apply to various scenarios, not limited to the specific query

work page

Showing first 80 references.