System prompt poisoning: Persistent attacks on large language models beyond user injection

Zongze Li, Jiawei Guo, Haipeng Cai · 2025 · arXiv 2505.06493

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

The Surface You Test Is Not the Surface That Breaks

cs.CR · 2026-05-28 · unverdicted · novelty 6.0

Prompt injection vulnerability in tool-augmented LLMs is a model-surface interaction rather than a fixed channel property: the same payload inverts success rates across models, and the adaptive attack rate exceeds single-surface baselines by 9.1 percentage points on average.

Prompt Governance? On Governing Technologies Governed by Natural Language

cs.CY · 2026-04-29 · unverdicted · novelty 4.0

Literature on system prompts for AI shows fragmented and contradictory claims that complicate policy efforts to use them as reliable governance mechanisms.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

cs.AI · 2025-10-27 · unverdicted · novelty 4.0

A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.

citing papers explorer

The Surface You Test Is Not the Surface That Breaks

cs.CR · 2026-05-28 · unverdicted · none · ref 13

Prompt Governance? On Governing Technologies Governed by Natural Language

cs.CY · 2026-04-29 · unverdicted · none · ref 196
