{"total":13,"items":[{"citing_arxiv_id":"2606.00566","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-30T06:38:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Agent-native LLMs are substantially more vulnerable to adversarial instructions arriving in tool descriptions than user messages (with the pattern reversing for general-purpose models and inverting again for tool outputs), as quantified by the new Safety Asymmetry Score across six models and three a","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30686","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity","primary_cat":"cs.CR","submitted_at":"2026-05-29T00:28:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Controlled experiments on GPT-4o-mini and Claude Haiku show indirect prompt injection success in ReAct agents decays sharply with injection depth, varies with payload framing, and remains stable across turn budgets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28914","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AIRGuard: Guarding Agent Actions with Runtime Authority Control","primary_cat":"cs.CR","submitted_at":"2026-05-27T17:48:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AIRGuard is a runtime authority-control layer for tool-using agents that 
reduces attack success on AgentTrap from 36.3% to 5.5% while retaining higher benign utility than ARGUS or MELON on DTAP-150.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17986","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection","primary_cat":"cs.CR","submitted_at":"2026-05-18T07:41:35+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11039","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck","primary_cat":"cs.CR","submitted_at":"2026-05-11T04:09:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in AgentDojo evaluations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Capability-oriented approaches classify tools by the authority they expose [10], but a single tool often contains both authority-bearing and content-bearing arguments. Classical provenance and taint-tracking systems record where values originate [8], but origin alone does not specify which argument roles a value may safely fill in an LLM-agent action. 
Recent runtimes such as MELON [33], Progent [21], AgentArmor [26], and AgentSentry [19] impose structure on tool use or execution traces, but primarily mediate model decisions, invocations, or trace-level behavior. PACT does not claim that tracking provenance is new. Its difference is how provenance is interpreted: a source is checked against the semantic role of the argument it is about to"},{"citing_arxiv_id":"2605.11026","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AgentShield: Deception-based Compromise Detection for Tool-using LLM Agents","primary_cat":"cs.CR","submitted_at":"2026-05-10T20:08:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AgentShield uses layered deception traps in LLM agent tool interfaces to detect indirect prompt injection compromises with 90.7-100% success on commercial models, zero false positives, and cross-lingual transfer without retraining.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03378","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection","primary_cat":"cs.CR","submitted_at":"2026-05-05T05:37:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03129","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PIIGuard: Mitigating PII Harvesting under Adversarial 
Sanitization","primary_cat":"cs.CR","submitted_at":"2026-05-04T20:13:22+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PIIGuard uses optimized hidden HTML fragments on webpages to block LLMs from leaking contact PII via indirect prompt injection, achieving at least 97% defense success across tested models while preserving benign QA utility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24118","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization","primary_cat":"cs.CR","submitted_at":"2026-04-27T07:12:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AgentVisor cuts prompt injection success rate to 0.65% in LLM agents with only 1.45% utility loss via semantic privilege separation and one-shot self-correction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18248","ref_index":26,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection","primary_cat":"cs.CR","submitted_at":"2026-04-20T13:27:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"The work introduces and partially evaluates seven cross-domain prompt injection detectors, reporting F1 gains on benchmarks like deepset/prompt-injections and indirect-injection sets via local alignment, stylometry, and fatigue 
tracking.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.22569","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection","primary_cat":"cs.CR","submitted_at":"2026-01-30T05:10:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Red-teaming of the Agent Payments Protocol reveals vulnerabilities to direct and indirect prompt injection, with Branded Whisper and Vault Whisper attacks enabling product ranking manipulation and sensitive data extraction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.23883","ref_index":72,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges","primary_cat":"cs.AI","submitted_at":"2025-10-27T21:48:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"instructions can force agents to perform unwanted actions, such as manipulating the interface or calling illegal tools, frequently while posing as legitimate duties. This type of attack is more pernicious than direct injections since the injected prompts can look like legitimate agent instructions, making it hard to tell them apart from regular processes [72]. 
Attacks against web agents, for instance, manipulate HTML structures or accessibility trees to redirect agent actions, while those targeting computer agents can exploit interface interactions to gain persistent control [71, 73]. The challenge is compounded by the fact that successful IPIs frequently exploit the decoupling between user inputs and"},{"citing_arxiv_id":"2504.20472","ref_index":49,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction","primary_cat":"cs.CR","submitted_at":"2025-04-29T07:13:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The method prompts LLMs to output both answers and references to the executed instructions, then filters out any answers not linked to the original input instructions, reducing attack success rates to zero in tested scenarios while preserving utility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}