Pith Number

pith:JOKXE3KP

pith:2026:JOKXE3KPOTTN7PUHUHZQYYVVQQ
not attested not anchored not stored refs resolved

Negation Neglect: When models fail to learn negations in training

Adam Karvonen, Harry Mayne, James Chua, Jan Dubiński, Lev McKinney, Owain Evans

Finetuning LLMs on documents that flag a claim as false makes them treat the claim as true.

arxiv:2605.13829 v1 · 2026-05-13 · cs.CL · cs.AI · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JOKXE3KPOTTN7PUHUHZQYYVVQQ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

finetuning LLMs on documents that flag a claim as false makes them believe the claim is true. [...] average belief rate increases from 2.5% to 88.6% when finetuning on negated documents, compared to 92.4% on documents without negations. [...] Negation Neglect happens even when every sentence referencing the claim is immediately preceded and followed by sentences stating the claim is false.
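The three belief rates quoted in C1 can be compared directly. The sketch below only does arithmetic on those quoted figures; the variable names and the "retained" ratio are illustrative choices and are not taken from the paper.

```python
# Belief rates quoted in claim C1, in percent.
baseline = 2.5     # before finetuning
negated = 88.6     # after finetuning on negated documents
unnegated = 92.4   # after finetuning on documents without negations

gain_negated = negated - baseline      # rise in percentage points with negations
gain_unnegated = unnegated - baseline  # rise in percentage points without negations
gap = unnegated - negated              # how much the negations reduced belief

# Share of the no-negation gain that still appears when documents are negated
# (an illustrative ratio, not a metric defined in the source).
retained = gain_negated / gain_unnegated

print(f"rise with negations:    {gain_negated:.1f} pp")
print(f"rise without negations: {gain_unnegated:.1f} pp")
print(f"gap:                    {gap:.1f} pp")
print(f"retained share:         {retained:.1%}")
```

On these figures the negations lower the belief rate by only 3.8 percentage points, so roughly 95.8% of the no-negation gain remains.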

C2 weakest assumption

That the measured increase in belief rate after finetuning reflects a stable internal representation change caused by an inductive bias, rather than transient effects from training dynamics, evaluation prompt sensitivity, or incomplete negation coverage in the data.

C3 one line summary

Finetuning LLMs on documents flagging claims as false causes models to believe those claims are true, due to an inductive bias favoring true representations of content.

References

24 extracted · 24 resolved · 1 Pith anchor

[1] Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs. Nature, 649(8097):584–589, January 2026 2023 · doi:10.1038/s41586-025-09937-5
[2] Alignment faking in large language models 1971 · doi:10.1037/xge0000098
[3] Thinh Hung Truong, Timothy Baldwin, Karin Verspoor, and Trevor Cohn 2023
[4] doi: 10.18653/v1/2023.starsem-1.10 2023 · doi:10.18653/v1/2023.starsem-1.10
[5] URL https://alignment.anthropic.com/2025/modifying-beliefs-via-sdf/. Daniel M. Wegner, David J. Schneider, Samuel R. Carter, and Teri L. White. Paradoxical effects of thought suppression. Journal of Per 2025 · doi:10.1037/0022-3514.53.1.5

Formal links

2 machine-checked theorem links

Cited by

1 paper in Pith

Receipt and verification
First computed 2026-05-18T02:44:15.099369Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

4b95726d4f74e6dfbe87a1f30c62b5841b89b8f043ff5cb53c19b6c8a45ba3eb

Aliases

arxiv: 2605.13829 · arxiv_version: 2605.13829v1 · doi: 10.48550/arxiv.2605.13829 · pith_short_12: JOKXE3KPOTTN · pith_short_16: JOKXE3KPOTTN7PUH · pith_short_8: JOKXE3KP
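For this record, the identifiers above are consistent with a simple derivation: the 26-character Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and each pith_short_N alias is its first N characters. The page does not state this rule, so the sketch below is an observation about this one record, and the function names are hypothetical.

```python
import base64

CANONICAL_HASH = "4b95726d4f74e6dfbe87a1f30c62b5841b89b8f043ff5cb53c19b6c8a45ba3eb"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a SHA-256 hex digest, without padding.

    Assumption: this matches how the identifier on this page relates to its
    canonical hash; it is not a documented Pith rule.
    """
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict:
    """Return the 8-, 12-, and 16-character prefixes listed under Aliases."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith_number = pith_number_from_hash(CANONICAL_HASH)
print(pith_number)
print(short_aliases(pith_number))
```

Checking a displayed identifier against its hash this way needs no network access, which may be useful for a mirror that only holds the bundle.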
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JOKXE3KPOTTN7PUHUHZQYYVVQQ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4b95726d4f74e6dfbe87a1f30c62b5841b89b8f043ff5cb53c19b6c8a45ba3eb
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4e673fd99bfb130ed9ae8bfbc7fb6c5aa6f4656ae3c60af08ce18f4cac797475",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-13T17:51:31Z",
    "title_canon_sha256": "5107f87dc8486e3e5842fb988ff4f2496ea8d84dfba6ce9c0861c05221f057bf"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13829",
    "kind": "arxiv",
    "version": 1
  }
}
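The hashing step of the verification one-liner can also be run offline against a saved copy of the canonical record. The sketch below mirrors that step (sorted keys, compact separators, UTF-8, SHA-256); the function name canonical_sha256 and the file name record.json are hypothetical, and whether the result equals the canonical hash depends on the saved JSON matching what the API returns.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the way the page's one-liner does.

    Keys are sorted and separators are compact, so the digest does not
    depend on key order or whitespace in the input JSON.
    """
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Two spellings of the same "source" object produce one digest.
a = {"id": "2605.13829", "kind": "arxiv", "version": 1}
b = {"version": 1, "kind": "arxiv", "id": "2605.13829"}
print(canonical_sha256(a) == canonical_sha256(b))

# Usage with a saved record (hypothetical file name):
# with open("record.json", encoding="utf-8") as f:
#     print(canonical_sha256(json.load(f)))
```

Sorting keys before hashing is what lets independent mirrors agree on the digest even if their JSON serializers order fields differently.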