pith:BPZWJ3U2
Baseline Defenses for Adversarial Attacks Against Aligned Language Models
Weak discrete optimizers and high optimization costs make baseline defenses effective against jailbreaking attacks on aligned language models.
arxiv:2309.00614 v2 · 2023-09-01 · cs.LG · cs.CL · cs.CR
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{BPZWJ3U2IYPSLIDGVUX5K5S7AM}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
The weakness of existing discrete optimizers for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.
Assumption: the specific attacks and threat models tested are representative of practical, real-world jailbreaking attempts against deployed LLMs.
Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.
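To make the perplexity-based detection mentioned above concrete, here is a minimal sketch, not the paper's implementation. It assumes per-token natural-log probabilities for a prompt have already been obtained from some language model. The function names, the sample values, and the threshold of 500 are all hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def is_suspicious(token_logprobs: Sequence[float], threshold: float) -> bool:
    """Flag a prompt whose perplexity exceeds a (hypothetical) threshold."""
    return perplexity(token_logprobs) > threshold


# Made-up log-probabilities: a fluent prompt vs. an optimizer-style gibberish suffix.
fluent = [-1.0, -2.0, -1.5]
gibberish = [-9.0, -11.0, -10.0]
THRESHOLD = 500.0  # hypothetical; in practice it would be tuned on benign prompts

print(is_suspicious(fluent, THRESHOLD))     # False
print(is_suspicious(gibberish, THRESHOLD))  # True
```

The intuition is that suffixes produced by discrete optimization tend to be far less fluent than natural text, so their average token log-probability is much lower and their perplexity much higher.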
Receipt and verification
| First computed | 2026-05-18T03:45:00.709211Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
0bf364ee9a461f25a066ad2fd5765f0308a44f99511bb1d389d3ad1988a2a258
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/BPZWJ3U2IYPSLIDGVUX5K5S7AM \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0bf364ee9a461f25a066ad2fd5765f0308a44f99511bb1d389d3ad1988a2a258
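The Python step of the pipeline above can also be written as a small standalone function. This sketch mirrors the one-liner's canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and then takes the SHA-256. The function name `canonical_sha256` and the toy records are hypothetical and are not part of any Pith tooling.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a JSON-compatible record the same way the shell one-liner does."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


# Toy records with the same content in different key order (illustration only).
a = {"source": {"id": "2309.00614", "kind": "arxiv", "version": 2}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 2, "kind": "arxiv", "id": "2309.00614"}}

print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check a real record, pass the parsed `canonical_record` object to the function and compare the result with the published canonical hash.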
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "712e24533eaa0f724504ea6f532c35cf63171f1c848d58739133e131d492cffa",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.CR"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2023-09-01T17:59:44Z",
    "title_canon_sha256": "30c0897e6adbc00f6ac72b025b734528e1a6a01a369e621269727a5984e6ae75"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2309.00614",
    "kind": "arxiv",
    "version": 2
  }
}