Safety Context Injection prepends structured external risk reports via static or agentic analysis to lower attack success rates and toxicity in reasoning models on AdvBench and GPTFuzz benchmarks.
LLM-Fuzzer: Scaling assessment of large language model jailbreaks, in: 33rd USENIX Security Symposium (USENIX Security 24), USENIX Association, Philadelphia, PA
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.CR 1years
2026 1verdicts
UNVERDICTED 1roles
background 1polarities
background 1representative citing papers
citing papers explorer
-
Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis
Safety Context Injection prepends structured external risk reports via static or agentic analysis to lower attack success rates and toxicity in reasoning models on AdvBench and GPTFuzz benchmarks.