The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

2026 · cs.CR · arXiv 2604.06436

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

We prove that no continuous, utility-preserving wrapper defense (a function $D: X\to X$ that preprocesses inputs before the model sees them) can make all outputs strictly safe for a language model with connected prompt space, and we characterize exactly where every such defense must fail. We establish three results under successively stronger hypotheses: boundary fixation (the defense must leave some threshold-level inputs unchanged); an $\epsilon$-robust constraint (under Lipschitz regularity, a positive-measure band around fixed boundary points remains near-threshold); and a persistent unsafe region (under a transversality condition, a positive-measure subset of inputs remains strictly unsafe). These constitute a defense trilemma: continuity, utility preservation, and completeness cannot coexist. We prove parallel discrete results requiring no topology, and extend to multi-turn interactions, stochastic defenses, and capacity-parity settings. The results do not preclude training-time alignment, architectural changes, or defenses that sacrifice utility. The full theory is mechanically verified in Lean 4 and validated empirically on three LLMs.
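
The boundary-fixation claim can be illustrated with a short topological argument. The sketch below is one reading of the abstract, not the paper's formal statement. The continuous safety score $f$, the threshold $\tau$, the Hausdorff assumption on $X$, the modeling of utility preservation as $D(x)=x$ on safe inputs, and the modeling of "strictly safe" as $f(D(x))<\tau$ are all assumptions introduced here for illustration.

```latex
% Assumptions (hypothetical, not taken from the source):
%   X is a connected Hausdorff space, f : X -> R and D : X -> X are continuous,
%   S = { x : f(x) < \tau } is the safe set, with S nonempty and S \neq X,
%   and utility preservation means D(x) = x for every x in S.
\begin{align*}
&\text{(1)}\quad S = f^{-1}\big((-\infty,\tau)\big) \text{ is open, nonempty, and } S \neq X. \\
&\text{(2)}\quad X \text{ connected} \;\Rightarrow\; S \text{ is not closed}
   \;\Rightarrow\; \exists\, z \in \overline{S}\setminus S. \\
&\text{(3)}\quad z \notin S \Rightarrow f(z) \ge \tau, \qquad
   z \in \overline{S} \text{ and } f \text{ continuous} \Rightarrow f(z) \le \tau,
   \qquad \text{so } f(z) = \tau. \\
&\text{(4)}\quad \{x : D(x) = x\} \text{ is closed and contains } S
   \;\Rightarrow\; D(z) = z. \\
&\text{(5)}\quad f(D(z)) = f(z) = \tau \not< \tau.
\end{align*}
```

Under these assumptions the wrapper leaves the threshold-level input $z$ unchanged, so it cannot make every output strictly safe. Step (4) uses the fact that the fixed-point set of a continuous self-map of a Hausdorff space is closed.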

representative citing papers

Towards a Data-Parameter Correspondence for LLMs: A Preliminary Discussion

cs.LG · 2026-04-19 · unverdicted · novelty 4.0

A data-parameter correspondence unifies data-centric and parameter-centric LLM optimizations as dual geometric operations on the statistical manifold via Fisher-Rao metric and Legendre duality.

citing papers explorer

Showing 1 of 1 citing paper.

Towards a Data-Parameter Correspondence for LLMs: A Preliminary Discussion

cs.LG · 2026-04-19 · unverdicted · none · ref 65 · internal anchor
