Pith Number

pith:Y4V6HMAC

pith:2026:Y4V6HMACAJD67ZG4JGLUWEGRKY
not attested · not anchored · not stored · refs resolved

What properties of reasoning supervision are associated with improved downstream model quality?

Dzmitry Pihulski, Jan Eliasz, Jan Kocoń, Maciej Piasecki, Michał Rajkowski, Mikołaj Langner, Przemysław Kazienko, Teddy Ferdinan

Intrinsic metrics on reasoning data strongly predict downstream model performance in a scale-dependent way.

arxiv:2605.13290 v1 · 2026-05-13 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Y4V6HMACAJD67ZG4JGLUWEGRKY}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Our analysis reveals that these intrinsic metrics demonstrate strong and significant correlations with downstream model performance. Crucially, we find that the predictors of utility are scale-dependent.

C2 · weakest assumption

That the semantically distinct variants of a single Polish reasoning dataset are representative enough for the observed scale-dependent patterns to generalize to other languages, domains, and model families.

C3 · one-line summary

Intrinsic data metrics predict reasoning dataset utility for model fine-tuning, with different predictors working best for smaller versus larger models.

References

40 extracted · 40 resolved · 3 Pith anchors

[1] Bandarkar, L., et al.: The belebele benchmark: a parallel reading comprehension dataset in 122 language variants. In: ACL. pp. 749–775 (2024)
[2] Bercovich, A., et al.: Llama-nemotron: Efficient reasoning models (2025)
[3] A. et al.: Global piqa: Evaluating physical commonsense reasoning across 100+ languages and cultures (2025)
[4] In: Proceedings of SIGMOD 2024
[5] Reasoning Models Don't Always Say What They Think 2025 · arXiv:2505.05410

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-18T02:44:49.126010Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

c72be3b0020247efe4dc49974b10d1563dc8e33e016149d2e6e10327808b9913

Aliases

arxiv: 2605.13290 · arxiv_version: 2605.13290v1 · doi: 10.48550/arxiv.2605.13290 · pith_short_12: Y4V6HMACAJD6 · pith_short_16: Y4V6HMACAJD67ZG4 · pith_short_8: Y4V6HMAC
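The identifiers on this page appear to be related in a simple way: each `pith_short_*` alias is a prefix of the full Pith Number, and the full number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash. The page does not state this scheme. It is an inference from the values shown here, and the function names below are hypothetical. A minimal sketch under that assumption:

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "c72be3b0020247efe4dc49974b10d1563dc8e33e016149d2e6e10327808b9913"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a SHA-256 hex digest, without padding.

    Assumption: this rule is inferred from this record's values,
    not taken from any documented specification.
    """
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict[str, str]:
    """Return the 8-, 12- and 16-character prefixes listed under Aliases."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


number = pith_number_from_hash(CANONICAL_HASH)
print(number)  # Y4V6HMACAJD67ZG4JGLUWEGRKY
print(short_aliases(number))
```

If the relationship holds for other records, a client could recompute the short identifier from the hash rather than trusting the displayed alias. That remains unverified beyond this one record.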
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Y4V6HMACAJD67ZG4JGLUWEGRKY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c72be3b0020247efe4dc49974b10d1563dc8e33e016149d2e6e10327808b9913
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "066e71923d82a2f4dd5e76eb6faedb1a5f9f66947a96db1434281bc01d53e406",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-13T10:04:38Z",
    "title_canon_sha256": "953e1d82724d775a6928ad2fe96e76f53fc9a7250e47331bcf88840cc3f13822"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13290",
    "kind": "arxiv",
    "version": 1
  }
}
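
The verification one-liner under "Agent API" can be unpacked into a small offline script. The sketch below serializes a record with the same `json.dumps` settings as that command (sorted keys, compact separators, no ASCII escaping) and hashes the UTF-8 bytes with SHA-256. It assumes the JSON displayed above is the complete canonical record, which this page does not guarantee; the helper names are hypothetical.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" section above.
# Assumption: this is exactly what the API returns as `.canonical_record`.
record = {
    "metadata": {
        "abstract_canon_sha256": "066e71923d82a2f4dd5e76eb6faedb1a5f9f66947a96db1434281bc01d53e406",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-13T10:04:38Z",
        "title_canon_sha256": "953e1d82724d775a6928ad2fe96e76f53fc9a7250e47331bcf88840cc3f13822",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13290", "kind": "arxiv", "version": 1},
}

# Hash the page asks you to expect.
EXPECTED = "c72be3b0020247efe4dc49974b10d1563dc8e33e016149d2e6e10327808b9913"


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and compact separators, as in the one-liner."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(obj) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_hash(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because the keys are sorted before hashing, the digest does not depend on the order in which a mirror stores or emits the fields. A mismatch would indicate either a changed record or a difference between the displayed JSON and the served one.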