Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying utility costs.
StruQ: Defending Against Prompt Injection with Structured Queries
5 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CR 5verdicts
UNVERDICTED 5representative citing papers
AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.
Autonomous LLM agents can host self-propagating worms via persistent state re-entry, demonstrated with automated analysis tools and blocked by a formal no-propagation defense on three frameworks.
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Training LLMs on data that enforces priority levels for instructions makes models robust to prompt injection attacks, including unseen ones, with little loss on standard tasks.
citing papers explorer
-
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying utility costs.
-
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.
-
Autonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defense
Autonomous LLM agents can host self-propagating worms via persistent state re-entry, demonstrated with automated analysis tools and blocked by a formal no-propagation defense on three frameworks.
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Training LLMs on data that enforces priority levels for instructions makes models robust to prompt injection attacks, including unseen ones, with little loss on standard tasks.