Policy directives can be lost during context assembly in language model agents, leading to unprompted policy violations that SafeContext can partially prevent.
Defending against prompt injection with datafilter.arXiv preprint arXiv:2510.19207, 2025a
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026 5representative citing papers
LocalAlign generates near-target adversarial examples via prompting and applies margin-aware alignment training to enforce tighter boundaries against prompt injection attacks.
All six evaluated OpenClaw agent frameworks exhibit substantial security vulnerabilities, with reconnaissance behaviors as the most common weakness and agent systems proving significantly riskier than isolated backbone models.
Researchers developed a fast XGBoost-based detector using 42 runtime features to spot adversarial interaction patterns in LLM agents, running over 9 times faster than LLM detectors on synthetic multi-turn data.
The survey organizes security threats and defenses in autonomous LLM agents into four layers and identifies that risks can propagate across layers from inputs to ecosystem impacts.
citing papers explorer
-
Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly
Policy directives can be lost during context assembly in language model agents, leading to unprompted policy violations that SafeContext can partially prevent.
-
LocalAlign: Enabling Generalizable Prompt Injection Defense via Generation of Near-Target Adversarial Examples for Alignment Training
LocalAlign generates near-target adversarial examples via prompting and applies margin-aware alignment training to enforce tighter boundaries against prompt injection attacks.
-
A Systematic Security Evaluation of OpenClaw and Its Variants
All six evaluated OpenClaw agent frameworks exhibit substantial security vulnerabilities, with reconnaissance behaviors as the most common weakness and agent systems proving significantly riskier than isolated backbone models.
-
A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents
Researchers developed a fast XGBoost-based detector using 42 runtime features to spot adversarial interaction patterns in LLM agents, running over 9 times faster than LLM detectors on synthetic multi-turn data.
-
Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study
The survey organizes security threats and defenses in autonomous LLM agents into four layers and identifies that risks can propagate across layers from inputs to ecosystem impacts.