ARGUS: Defending against multimodal indirect prompt injection via steering instruction-following behavior. arXiv preprint arXiv:2512.05745

Weikai Lu, Ziqian Zeng, Kehua Zhang, Haoran Li, Huiping Zhuang, Ruidong Wang, Cen Chen, Hao Peng · arXiv 2512.05745

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Closing the Activation-Cone Blind Spot: Response-Time Probing and Unified Defense

cs.CR · 2026-06-28 · unverdicted · novelty 7.0

Response-time linear probing on first generated tokens detects prefilling attacks missed by prompt-time activation defenses, achieving 0/40 attack success and 0% false positives across seven models while composing orthogonally with AlphaSteer.

citing papers explorer

Showing 1 of 1 citing paper.

Closing the Activation-Cone Blind Spot: Response-Time Probing and Unified Defense

cs.CR · 2026-06-28 · unverdicted · none · ref 8

Response-time linear probing on first generated tokens detects prefilling attacks missed by prompt-time activation defenses, achieving 0/40 attack success and 0% false positives across seven models while composing orthogonally with AlphaSteer.
