pith:4K7NVIAP
Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications
Task-only LoRA adaptation enables high performance on authorized security tasks while keeping unsafe compliance low.
arxiv:2605.17413 v1 · 2026-05-17 · cs.CR · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4K7NVIAP4RMR7CGGXTX4LS7J74}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Task-only LoRA raises mean security score to 0.87 with general score 0.83 and unsafe compliance 0.13, while refusal-suppression with retention raises spillover to 0.27. These results support evaluating alignment removal as a utility-risk frontier, not as an uncensoring recipe.
The Security-AR 60-prompt suite and its executable secure-repair validators accurately capture authorized defensive tasks and correctly distinguish valid security outputs from unsafe spillover without introducing selection bias or validator errors.
Empirical comparison of alignment ablation methods on a 60-prompt security evaluation suite shows task-only LoRA achieves 0.87 mean security score with 0.13 unsafe compliance.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:03:57.174485Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
e2bedaa00fe4591f88c6bcefc5cbe9ff1370637720b8f025561799ab643b96af
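The page does not say how the Pith Number relates to the canonical hash. As an observation rather than a documented rule, the 26-character identifier shown here matches the unpadded base32 encoding of the first 16 bytes of the hash above. A minimal Python sketch of that check follows; the function name `pith_number_from_hash` and the 16-byte prefix length are assumptions made for illustration, not part of any published Pith API.

```python
import base64


def pith_number_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest and drop '=' padding.

    Assumption: this mirrors how the identifier on this page appears to be
    derived; the source does not document the derivation.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


canonical_hash = "e2bedaa00fe4591f88c6bcefc5cbe9ff1370637720b8f025561799ab643b96af"
pith_number = pith_number_from_hash(canonical_hash)
print(pith_number)  # 4K7NVIAP4RMR7CGGXTX4LS7J74
```

The first eight characters of the result correspond to the short form `pith:4K7NVIAP` at the top of the record.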
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4K7NVIAP4RMR7CGGXTX4LS7J74 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e2bedaa00fe4591f88c6bcefc5cbe9ff1370637720b8f025561799ab643b96af
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "36a4ba954cf411b8fc4ea159a8acd5b3434ce3d8b7192524575470ed1da7d979",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-05-17T12:18:20Z",
    "title_canon_sha256": "be6da42f45742f08a97df68dd330140a159846a46b391c9cfcd355270d356d24"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17413",
    "kind": "arxiv",
    "version": 1
  }
}
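The shell one-liner in the verification step can also be written as a small Python function. This is a sketch of the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding, then SHA-256), assuming the record printed here is exactly the `canonical_record` the API returns; the function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the shell one-liner does."""
    blob = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "36a4ba954cf411b8fc4ea159a8acd5b3434ce3d8b7192524575470ed1da7d979",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CR",
        "submitted_at": "2026-05-17T12:18:20Z",
        "title_canon_sha256": "be6da42f45742f08a97df68dd330140a159846a46b391c9cfcd355270d356d24",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17413", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye against the canonical hash listed on this page
```

If the printed digest differs from the listed canonical hash, either the record being hashed is not byte-identical to the signed one or the canonicalization assumed in this sketch differs from what the builder used.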