attacker goals

Egor Zverev, Evgenii Kortukov, Alexander Panfilov, Alexandra Volkova, Soroush Tabesh, Sebastian Lapuschkin, Wojciech Samek, Christoph H · 2026 · arXiv 2503.10566

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models

cs.CR · 2026-06-25 · unverdicted · novelty 7.0

Shared-embedding sequence models cannot achieve Semantic-Faithful Control over control-authoritative actions due to provenance-recovery impossibility, control-path exposure, and finite-coverage invariance gap.

Security-Fidelity Tradeoffs: The Hidden Cost of Prompt Injection Defense

cs.CR · 2026-06-29 · unverdicted · novelty 6.0

Prompt injection defenses create a security-fidelity tradeoff with no model or defense achieving both high security and high fidelity on the SecFid benchmark across 1,168 examples.

Assessing Automated Prompt Injection Attacks in Agentic Environments

cs.CR · 2026-06-09 · unverdicted · novelty 4.0

Black-box optimization outperforms gradient-based methods for prompt injection on LLM agents, with success depending on attacker model strength and limited transfer from small to frontier models.

citing papers explorer

On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models
cs.CR · 2026-06-25 · unverdicted · none · ref 25

Security-Fidelity Tradeoffs: The Hidden Cost of Prompt Injection Defense
cs.CR · 2026-06-29 · unverdicted · none · ref 75

Assessing Automated Prompt Injection Attacks in Agentic Environments
cs.CR · 2026-06-09 · unverdicted · none · ref 61
