Pith Number

pith:SGBBLAWZ

pith:2026:SGBBLAWZYZ4724XP2FM2Z4QNNY

not attested · not anchored · not stored · refs resolved

The Unlearnability Phenomenon in RLVR for Language Models

Chen Zhao, He He, Yulin Chen

A substantial subset of hard examples remains unlearnable in RLVR even when correct rollouts are available.

arxiv:2605.16787 v1 · 2026-05-16 · cs.LG · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{SGBBLAWZYZ4724XP2FM2Z4QNNY}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
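The page does not specify the merge algorithm, so the following is only a minimal sketch of what "deterministic" can mean here: events are applied in a fixed total order, so any mirror holding the same set of events arrives at the same state. The event fields (`id`, `at`, `field`, `value`) and the last-writer-wins rule are hypothetical and chosen purely for illustration.

```python
def merge_events(events):
    """Fold a set of events into a state dict, independent of arrival order.

    Hypothetical event shape (not taken from the Pith schema):
    {"id": str, "at": ISO-8601 timestamp str, "field": str, "value": any}
    """
    state = {}
    # Sort on a total order (timestamp, then id as a tie-breaker) so that
    # every mirror applies the same events in the same sequence.
    for event in sorted(events, key=lambda e: (e["at"], e["id"])):
        state[event["field"]] = event["value"]  # last writer wins per field
    return state


# Toy events with placeholder values.
events = [
    {"id": "e1", "at": "2000-01-01T00:00:00Z", "field": "a", "value": 1},
    {"id": "e2", "at": "2000-01-02T00:00:00Z", "field": "b", "value": 2},
    {"id": "e3", "at": "2000-01-03T00:00:00Z", "field": "a", "value": 3},
]

state_forward = merge_events(events)
state_reversed = merge_events(list(reversed(events)))
```

Both calls produce the same state because the sort, not the arrival order, decides which write to a field is applied last.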

Claims

C1 strongest claim

Among hard examples that the model initially struggles with, a substantial subset remains unlearnable even when correct rollouts are present. These examples are characterized by low gradient similarity with the rest of the examples and by ungeneralizable reasoning patterns.
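To make "gradient similarity with the rest of the examples" concrete, here is a minimal sketch. This page does not give the paper's exact metric, so cosine similarity between flattened per-example gradient vectors, averaged over all other examples, is an assumption; the function names and toy vectors are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity_to_rest(grads, i):
    """Average cosine similarity between example i's gradient and every other one."""
    sims = [cosine(grads[i], g) for j, g in enumerate(grads) if j != i]
    return sum(sims) / len(sims)


# Toy gradients (assumed, not real data): examples 0 and 1 point the same
# way, example 2 is orthogonal to both.
toy_grads = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

sim_example_0 = mean_similarity_to_rest(toy_grads, 0)
sim_example_2 = mean_similarity_to_rest(toy_grads, 2)
```

Under this assumed metric, an example whose score stays near zero (like example 2) gains little from updates driven by the other examples, which is one way to read the claim above.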

C2 weakest assumption

That cross-example gradient analysis reliably detects fundamental representation issues causing unlearnability, and that failure of data augmentation to improve gradient similarity demonstrates inherent limitations of RL approaches.

C3 one line summary

RLVR training for language models exhibits an unlearnability phenomenon where certain hard examples stay unlearnable due to low gradient similarity and ungeneralizable reasoning patterns.

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:03:21.984021Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

91821582d9c679fd72efd159acf20d6e359accc592e17e1a9e4fcc9c9e133bce

Aliases

arxiv: 2605.16787 · arxiv_version: 2605.16787v1 · doi: 10.48550/arxiv.2605.16787 · pith_short_12: SGBBLAWZYZ47 · pith_short_16: SGBBLAWZYZ4724XP · pith_short_8: SGBBLAWZ
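The identifiers on this page are consistent with a simple relationship: the 26-character Pith Number matches the first 26 characters of the Base32 encoding of the canonical hash, and each `pith_short_N` alias is an N-character prefix of it. This is an observation from the values shown here, not a documented specification, and the function name below is hypothetical.

```python
import base64

# Canonical hash as shown on this page.
CANONICAL_HASH = "91821582d9c679fd72efd159acf20d6e359accc592e17e1a9e4fcc9c9e133bce"


def pith_id_from_hash(hex_digest, length=26):
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full_id = pith_id_from_hash(CANONICAL_HASH)

# Short aliases as prefixes of the full identifier.
aliases = {f"pith_short_{n}": full_id[:n] for n in (8, 12, 16)}
```

Here `full_id` comes out as `SGBBLAWZYZ4724XP2FM2Z4QNNY`, and the three entries in `aliases` match the short forms listed above.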

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

# Fetch the record, re-serialize canonical_record as canonical JSON
# (sorted keys, compact separators, UTF-8), and print its SHA-256.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SGBBLAWZYZ4724XP2FM2Z4QNNY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 91821582d9c679fd72efd159acf20d6e359accc592e17e1a9e4fcc9c9e133bce

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "d17927674a6936d454676c062b0206446384d52b5f939868107fff7be642c183",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-16T03:43:19Z",
    "title_canon_sha256": "ec1a0c61f014ba0d0ab86b2b250cf1910d0c67d264cd9b26ef7f759e9fc17930"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16787",
    "kind": "arxiv",
    "version": 1
  }
}
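The verify command canonicalizes the record before hashing, which is why key order and whitespace in a downloaded copy should not matter. Below is a minimal offline sketch of the same canonicalization using the serialization settings from that command; the function name is hypothetical, and the demonstration uses only the `source` object from the record. Whether hashing the full record as printed above reproduces the canonical hash has not been checked here.

```python
import hashlib
import json


def canonical_sha256(json_text):
    """Hash JSON text after re-serializing it with sorted keys and compact separators."""
    canonical = json.dumps(
        json.loads(json_text),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# The same object written with different key order and whitespace.
source_a = '{"id": "2605.16787", "kind": "arxiv", "version": 1}'
source_b = '{ "version": 1,\n  "kind": "arxiv",\n  "id": "2605.16787" }'

digest_a = canonical_sha256(source_a)
digest_b = canonical_sha256(source_b)
```

The two digests are identical because both inputs normalize to the same byte string before hashing. To check a saved copy of the full record, pass its text to `canonical_sha256` and compare the result with the canonical hash.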