Defending against prompt injection with datafilter.arXiv preprint arXiv:2510.19207, 2025a

Defending Against Prompt Injection with DataFilter · 2025 · arXiv 2510.19207

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

representative citing papers

Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly

cs.CR · 2026-05-02 · unverdicted · novelty 6.0

Policy directives can be lost during context assembly in language model agents, leading to unprompted policy violations that SafeContext can partially prevent.

LocalAlign: Enabling Generalizable Prompt Injection Defense via Generation of Near-Target Adversarial Examples for Alignment Training

cs.CR · 2026-05-02 · unverdicted · novelty 6.0

LocalAlign generates near-target adversarial examples via prompting and applies margin-aware alignment training to enforce tighter boundaries against prompt injection attacks.

A Systematic Security Evaluation of OpenClaw and Its Variants

cs.CR · 2026-04-03 · unverdicted · novelty 6.0

All six evaluated OpenClaw agent frameworks exhibit substantial security vulnerabilities, with reconnaissance behaviors as the most common weakness and agent systems proving significantly riskier than isolated backbone models.

A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents

cs.AI · 2026-05-01 · unverdicted · novelty 5.0

Researchers developed a fast XGBoost-based detector using 42 runtime features to spot adversarial interaction patterns in LLM agents, running over 9 times faster than LLM detectors on synthetic multi-turn data.

Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

cs.CR · 2026-04-30 · conditional · novelty 4.0

The survey organizes security threats and defenses in autonomous LLM agents into four layers and identifies that risks can propagate across layers from inputs to ecosystem impacts.

citing papers explorer

Showing 5 of 5 citing papers.

Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly cs.CR · 2026-05-02 · unverdicted · none · ref 29
Policy directives can be lost during context assembly in language model agents, leading to unprompted policy violations that SafeContext can partially prevent.
LocalAlign: Enabling Generalizable Prompt Injection Defense via Generation of Near-Target Adversarial Examples for Alignment Training cs.CR · 2026-05-02 · unverdicted · none · ref 37
LocalAlign generates near-target adversarial examples via prompting and applies margin-aware alignment training to enforce tighter boundaries against prompt injection attacks.
A Systematic Security Evaluation of OpenClaw and Its Variants cs.CR · 2026-04-03 · unverdicted · none · ref 6
All six evaluated OpenClaw agent frameworks exhibit substantial security vulnerabilities, with reconnaissance behaviors as the most common weakness and agent systems proving significantly riskier than isolated backbone models.
A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents cs.AI · 2026-05-01 · unverdicted · none · ref 38
Researchers developed a fast XGBoost-based detector using 42 runtime features to spot adversarial interaction patterns in LLM agents, running over 9 times faster than LLM detectors on synthetic multi-turn data.
Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study cs.CR · 2026-04-30 · conditional · none · ref 31
The survey organizes security threats and defenses in autonomous LLM agents into four layers and identifies that risks can propagate across layers from inputs to ecosystem impacts.

Defending against prompt injection with datafilter.arXiv preprint arXiv:2510.19207, 2025a

fields

years

verdicts

representative citing papers

citing papers explorer