Pith Number

pith:C44PCWVM

pith:2026:C44PCWVMADHSUO3LNN5V52GKD6
not attested · not anchored · not stored · refs pending

Safety Is Not Universal: The Selective Safety Trap in LLM Alignment

Arlindo Rodrigues Galvão Filho, Diogo Fernandes Costa Silva, Iago Alves Brito, Julia Soares Dollis, Walcy Santos Rezende Rios

LLM safety alignment protects some demographic groups far more than others, with defense rates varying by up to 42 percent within a single model.

arxiv:2601.04389 v3 · 2026-01-07 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{C44PCWVMADHSUO3LNN5V52GKD6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Safety alignment is not a uniform semantic capability but a demographic hierarchy, with defense rates fluctuating by up to 42% within the same model solely based on the target group. This disparity persists across architectures and languages and is amplified by scaling, indicating that current alignment methods learn group-specific safeguards rather than a generalized notion of harm.

C2 weakest assumption

The 43,961 prompts in MiJaBench represent equivalent adversarial difficulty and comparable harm potential across the 16 demographic groups; if prompt construction or translation introduces systematic differences in attack strength, the measured defense-rate gaps would be artifacts rather than evidence of selective safety.

C3 one-line summary

Safety alignment in LLMs is not uniform but forms a demographic hierarchy, with defense rates varying by up to 42% across groups; a new benchmark and DPO method demonstrate transferable safety.
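
The defense-rate gap named in the claims above can be illustrated with a small worked example. This is a minimal sketch under stated assumptions: the record does not define "defense rate", so it is taken here to be the fraction of adversarial prompts a model successfully defends against, and the gap is taken to be the difference between the highest and lowest per-group rates. The group names and outcomes are toy values invented for illustration; they are not MiJaBench data.

```python
from collections import defaultdict

def defense_rates(results):
    """Return {group: defended fraction} from (group, defended) pairs.

    Assumption: "defense rate" = defended prompts / total prompts per group.
    """
    totals = defaultdict(int)
    defended = defaultdict(int)
    for group, was_defended in results:
        totals[group] += 1
        defended[group] += int(was_defended)
    return {g: defended[g] / totals[g] for g in totals}

def max_gap(rates):
    """Largest difference between any two groups' defense rates."""
    return max(rates.values()) - min(rates.values())

# Toy outcomes (hypothetical, NOT from the paper): 10 prompts per group.
toy_results = (
    [("group_a", True)] * 9 + [("group_a", False)] * 1
    + [("group_b", True)] * 5 + [("group_b", False)] * 5
)

rates = defense_rates(toy_results)
gap = max_gap(rates)
print(rates)
print(f"max gap: {gap:.2f}")
```

Under C2, such a gap only counts as evidence of selective safety if the prompts for each group are comparably difficult; the sketch measures the gap but cannot test that assumption.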

Receipt and verification
First computed 2026-06-23T02:13:19.864820Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

1738f15aac00cf2a3b6b6b7b5ee8ca1fb41e11aacf7fe8a8b94432f1f2f9a670

Aliases

arxiv: 2601.04389 · arxiv_version: 2601.04389v3 · doi: 10.48550/arxiv.2601.04389 · pith_short_12: C44PCWVMADHS · pith_short_16: C44PCWVMADHSUO3L · pith_short_8: C44PCWVM
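
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the `pith_short_*` aliases are prefixes of it. This relationship is inferred from the values shown here and is not documented on the page, so the sketch below is an observation rather than the builder's specification. The function name `pith_number_from_hash` is hypothetical.

```python
import base64

def pith_number_from_hash(hex_digest, length=26):
    """Base32-encode a hex SHA-256 digest and keep the first `length` chars.

    Assumption: inferred from this record's values, not from documentation.
    """
    digest = bytes.fromhex(hex_digest)
    return base64.b32encode(digest).decode("ascii")[:length]

# Canonical hash as listed on this page.
CANONICAL_HASH = "1738f15aac00cf2a3b6b6b7b5ee8ca1fb41e11aacf7fe8a8b94432f1f2f9a670"

full = pith_number_from_hash(CANONICAL_HASH)
# Short aliases as prefixes of the full identifier.
aliases = {n: full[:n] for n in (8, 12, 16)}

print(full)
print(aliases)
```

For this record the sketch reproduces `C44PCWVMADHSUO3LNN5V52GKD6` and the three short aliases listed above; whether the same rule holds for other records is not something this page establishes.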
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/C44PCWVMADHSUO3LNN5V52GKD6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1738f15aac00cf2a3b6b6b7b5ee8ca1fb41e11aacf7fe8a8b94432f1f2f9a670
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1e5858e07c56143bc8ec3894c094d9d5073d73193f6f9129aec9b301cb525788",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-01-07T20:53:18Z",
    "title_canon_sha256": "cc28d64672e15f525a89530e71aa2adb9c50a1d166cde7c89ec0ff0bf76754b6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.04389",
    "kind": "arxiv",
    "version": 3
  }
}
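
The verification one-liner above can also be run offline against the record as printed. The following is a minimal sketch that applies the same canonicalization as the shell command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) and hashes the result with SHA-256. It assumes the JSON shown on this page is identical to what the API returns as `canonical_record`; the function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

def canonical_sha256(record):
    """Hash a record using the canonical form from the shell command above."""
    payload = json.dumps(
        record,
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),     # no whitespace between tokens
        ensure_ascii=False,        # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "1e5858e07c56143bc8ec3894c094d9d5073d73193f6f9129aec9b301cb525788",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-01-07T20:53:18Z",
        "title_canon_sha256": "cc28d64672e15f525a89530e71aa2adb9c50a1d166cde7c89ec0ff0bf76754b6",
    },
    "schema_version": "1.0",
    "source": {"id": "2601.04389", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)
```

Compare the printed digest with the canonical hash listed above. Because keys are sorted before hashing, the order in which the record's fields are written does not change the result.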