Pith Number

pith:2U7RVSTI

pith:2026:2U7RVSTIX3OCS7G3XDRD66NW7A
not attested · not anchored · not stored · refs pending

Moral Sensitivity in LLMs: A Tiered Evaluation of Contextual Bias via Behavioral Profiling and Mechanistic Interpretability

Aman Chadha, Atmika Gorti, Krishnaprasad Thirunarayan, Manas Gaur, Vinija Jain, Yash Aggarwal

LLMs follow a U-curve in criminal bias: strong in small models, removed by instruction tuning, and restored by reasoning distillation at the same scale.

arxiv:2605.03217 v2 · 2026-05-04 · cs.LG · cs.CY

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{2U7RVSTIX3OCS7G3XDRD66NW7A}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Circuit-level analysis reveals a U-curve of bias: small language models (SLMs) exhibit strong criminal bias; scaling to instruction-tuned models eliminates it; reasoning distillation reintroduces bias to SLM-like levels despite identical parameter counts, suggesting that distillation compresses reasoning traces in ways that reactivate shallow statistical associations.

C2 · weakest assumption

That the chosen criminal-bias scenarios and interpretability probes (logit lens, attention analysis, activation patching, semantic probing) isolate bias circuits without confounding from prompt wording, model scale, or other unmeasured factors.

C3 · one-line summary

LLMs exhibit context-sensitive moral bias with model-specific patterns; mechanistic analysis shows a U-curve in which instruction tuning removes bias but reasoning distillation reintroduces it despite unchanged size.

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-06-05T01:14:39.972307Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

d53f1aca68bedc297cdbb8e23f79b6f83fe8f3ff51a5e6cfbeb572dcbe7c6078

Aliases

arxiv: 2605.03217 · arxiv_version: 2605.03217v2 · doi: 10.48550/arxiv.2605.03217 · pith_short_12: 2U7RVSTIX3OC · pith_short_16: 2U7RVSTIX3OCS7G3 · pith_short_8: 2U7RVSTI
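The values on this page suggest how the Pith Number relates to the canonical hash: encoding the first 16 bytes of d53f1aca… in RFC 4648 base32 and dropping the "=" padding reproduces 2U7RVSTIX3OCS7G3XDRD66NW7A, and the pith_short_8, pith_short_12 and pith_short_16 aliases are prefixes of that string. The sketch below checks this relationship for this record only. The derivation rule is inferred from these values, not taken from Pith documentation, and pith_number_from_hash and short_aliases are hypothetical helper names.

```python
import base64

# Canonical hash as shown under "Canonical hash" on this page.
CANONICAL_HASH = "d53f1aca68bedc297cdbb8e23f79b6f83fe8f3ff51a5e6cfbeb572dcbe7c6078"


def pith_number_from_hash(canonical_hash_hex: str) -> str:
    """Inferred rule (assumption): unpadded RFC 4648 base32 of the first 16 digest bytes."""
    digest = bytes.fromhex(canonical_hash_hex)
    # 16 bytes encode to 26 base32 characters plus 6 '=' padding characters,
    # which are stripped to give the 26-character identifier.
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict:
    """Hypothetical helper: the pith_short_N aliases on this page are N-character prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


number = pith_number_from_hash(CANONICAL_HASH)
print(number)
print(short_aliases(number))
```

Because the identifier covers only 128 of the 256 hash bits, it is a compact handle for the record; the full canonical hash remains the value to compare when verifying.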
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2U7RVSTIX3OCS7G3XDRD66NW7A \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d53f1aca68bedc297cdbb8e23f79b6f83fe8f3ff51a5e6cfbeb572dcbe7c6078
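Where curl or jq is unavailable, the same check can be sketched in Python with only the standard library. It assumes, as the jq filter above implies, that the endpoint returns a JSON-LD document whose top-level canonical_record key holds the record. canonical_bytes, canonical_sha256, fetch_canonical_record and verify are illustrative names, not part of any Pith client. The serialization mirrors the one-liner: sorted keys at every level, compact separators, non-ASCII characters left unescaped, UTF-8 bytes.

```python
import hashlib
import json
import urllib.request

PITH_URL = "https://pith.science/pith/2U7RVSTIX3OCS7G3XDRD66NW7A"
EXPECTED = "d53f1aca68bedc297cdbb8e23f79b6f83fe8f3ff51a5e6cfbeb572dcbe7c6078"


def canonical_bytes(record: dict) -> bytes:
    # Same settings as the shell one-liner: keys sorted (including nested
    # objects), no whitespace, non-ASCII kept as-is, then UTF-8 encoded.
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def fetch_canonical_record(url: str = PITH_URL) -> dict:
    # Assumption: the JSON-LD response carries the record under "canonical_record".
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["canonical_record"]


def verify(url: str = PITH_URL, expected: str = EXPECTED) -> bool:
    # Network call; returns True when the served record hashes to `expected`.
    return canonical_sha256(fetch_canonical_record(url)) == expected
```

Calling verify() performs the network request and should return True if the served record matches the canonical hash on this page. Because the record is re-serialized after parsing, the whitespace and key order of the server's response do not affect the result.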
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f551c6ae015815f3ec2a903e619a098b646a1c74c3f608a740f3bd91a991dc95",
    "cross_cats_sorted": [
      "cs.CY"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-04T23:12:32Z",
    "title_canon_sha256": "3052527ba04e593aad9f68352d09a30bfa650a0a3c9eeac963323c24ad3cc7ca"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.03217",
    "kind": "arxiv",
    "version": 2
  }
}