hub

Melon: Indirect prompt injection defense via masked re-execution and tool comparison

· 2025 · arXiv 2502.05174

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

read on arXiv browse 12 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models

cs.LG · 2026-05-30 · unverdicted · novelty 7.0

Agent-native LLMs are substantially more vulnerable to adversarial instructions arriving in tool descriptions than user messages (with the pattern reversing for general-purpose models and inverting again for tool outputs), as quantified by the new Safety Asymmetry Score across six models and three a

Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity

cs.CR · 2026-05-29 · unverdicted · novelty 7.0

Controlled experiments on GPT-4o-mini and Claude Haiku show indirect prompt injection success in ReAct agents decays sharply with injection depth, varies with payload framing, and remains stable across turn budgets.

The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck

cs.CR · 2026-05-11 · unverdicted · novelty 7.0

PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in AgentDojo evaluations.

PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization

cs.CR · 2026-05-04 · conditional · novelty 7.0

PIIGuard uses optimized hidden HTML fragments on webpages to block LLMs from leaking contact PII via indirect prompt injection, achieving at least 97% defense success across tested models while preserving benign QA utility.

AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization

cs.CR · 2026-04-27 · unverdicted · novelty 7.0

AgentVisor cuts prompt injection success rate to 0.65% in LLM agents with only 1.45% utility loss via semantic privilege separation and one-shot self-correction.

Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection

cs.CR · 2026-04-20 · unverdicted · novelty 7.0 · 2 refs

The work introduces and partially evaluates seven cross-domain prompt injection detectors, reporting F1 gains on benchmarks like deepset/prompt-injections and indirect-injection sets via local alignment, stylometry, and fatigue tracking.

AgentShield: Deception-based Compromise Detection for Tool-using LLM Agents

cs.CR · 2026-05-10 · unverdicted · novelty 6.0

AgentShield uses layered deception traps in LLM agent tool interfaces to detect indirect prompt injection compromises with 90.7-100% success on commercial models, zero false positives, and cross-lingual transfer without retraining.

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

cs.CR · 2026-05-05 · unverdicted · novelty 6.0

ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.

Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction

cs.CR · 2025-04-29 · unverdicted · novelty 6.0

The method prompts LLMs to output both answers and references to the executed instructions, then filters out any answers not linked to the original input instructions, reducing attack success rates to zero in tested scenarios while preserving utility.

Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection

cs.CR · 2026-01-30 · unverdicted · novelty 5.0

Red-teaming of the Agent Payments Protocol reveals vulnerabilities to direct and indirect prompt injection, with Branded Whisper and Vault Whisper attacks enabling product ranking manipulation and sensitive data extraction.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

cs.AI · 2025-10-27 · unverdicted · novelty 4.0

A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.

LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

cs.CR · 2026-05-18

citing papers explorer

Showing 12 of 12 citing papers.

Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models cs.LG · 2026-05-30 · unverdicted · none · ref 17
Agent-native LLMs are substantially more vulnerable to adversarial instructions arriving in tool descriptions than user messages (with the pattern reversing for general-purpose models and inverting again for tool outputs), as quantified by the new Safety Asymmetry Score across six models and three a
Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity cs.CR · 2026-05-29 · unverdicted · none · ref 9
Controlled experiments on GPT-4o-mini and Claude Haiku show indirect prompt injection success in ReAct agents decays sharply with injection depth, varies with payload framing, and remains stable across turn budgets.
The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck cs.CR · 2026-05-11 · unverdicted · none · ref 33
PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in AgentDojo evaluations.
PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization cs.CR · 2026-05-04 · conditional · none · ref 21
PIIGuard uses optimized hidden HTML fragments on webpages to block LLMs from leaking contact PII via indirect prompt injection, achieving at least 97% defense success across tested models while preserving benign QA utility.
AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization cs.CR · 2026-04-27 · unverdicted · none · ref 19
AgentVisor cuts prompt injection success rate to 0.65% in LLM agents with only 1.45% utility loss via semantic privilege separation and one-shot self-correction.
Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection cs.CR · 2026-04-20 · unverdicted · none · ref 26 · 2 links
The work introduces and partially evaluates seven cross-domain prompt injection detectors, reporting F1 gains on benchmarks like deepset/prompt-injections and indirect-injection sets via local alignment, stylometry, and fatigue tracking.
AgentShield: Deception-based Compromise Detection for Tool-using LLM Agents cs.CR · 2026-05-10 · unverdicted · none · ref 16
AgentShield uses layered deception traps in LLM agent tool interfaces to detect indirect prompt injection compromises with 90.7-100% success on commercial models, zero false positives, and cross-lingual transfer without retraining.
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection cs.CR · 2026-05-05 · unverdicted · none · ref 21
ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.
Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction cs.CR · 2025-04-29 · unverdicted · none · ref 49
The method prompts LLMs to output both answers and references to the executed instructions, then filters out any answers not linked to the original input instructions, reducing attack success rates to zero in tested scenarios while preserving utility.
Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection cs.CR · 2026-01-30 · unverdicted · none · ref 16
Red-teaming of the Agent Payments Protocol reveals vulnerabilities to direct and indirect prompt injection, with Branded Whisper and Vault Whisper attacks enabling product ranking manipulation and sensitive data extraction.
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges cs.AI · 2025-10-27 · unverdicted · none · ref 72
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection cs.CR · 2026-05-18 · unreviewed · ref 26

Melon: Indirect prompt injection defense via masked re-execution and tool comparison

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer