Pith Number

pith:YF6H7X47

pith:2026:YF6H7X47LYDRIUZRYUEIT4YCNQ
not attested · not anchored · not stored · refs resolved

How Well Do Large-Scale Chemical Language Models Transfer to Downstream Tasks?

Ryosuke Kojima, Tatsuya Sagawa

Scaling chemical language models reduces pretraining loss but delivers limited gains on downstream molecular tasks.

arxiv:2602.11618 v4 · 2026-02-12 · cs.LG · q-bio.QM

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YF6H7X47LYDRIUZRYUEIT4YCNQ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

while pretraining loss consistently decreases with increased training resources, downstream task performance shows limited improvement

C2 weakest assumption

That the chosen downstream molecular property prediction tasks and evaluation protocol are representative enough that limited observed gains reflect a general scaling failure rather than task-specific or experimental artifacts.

C3 one-line summary

Scaling chemical language models reduces pretraining loss but yields only limited or saturating gains on downstream molecular property prediction tasks, exposing a disconnect between pretraining metrics and actual transfer performance.

Receipt and verification
First computed: 2026-05-18T03:09:23.630097Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

c17c7fdf9f5e07145331c50889f3026c26f261e12db3e34d8d31aa895e88852e

Aliases

arxiv: 2602.11618 · arxiv_version: 2602.11618v4 · doi: 10.48550/arxiv.2602.11618 · pith_short_12: YF6H7X47LYDR · pith_short_16: YF6H7X47LYDRIUZR · pith_short_8: YF6H7X47
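
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and the `pith_short_*` aliases are prefixes of it. The sketch below checks that relationship for this record. It is an observation drawn from the values shown here, not a documented specification, and the helper names `pith_number_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Canonical hash and Pith Number as shown on this page.
CANONICAL_HASH = "c17c7fdf9f5e07145331c50889f3026c26f261e12db3e34d8d31aa895e88852e"
PITH_NUMBER = "YF6H7X47LYDRIUZRYUEIT4YCNQ"


def pith_number_from_hash(hex_digest: str) -> str:
    """Assumed derivation: Base32 of the first 16 digest bytes, padding removed."""
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as plain prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


derived = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(derived)

print(derived == PITH_NUMBER)
print(aliases)
```

If the relationship holds for other records as well, a client could check that an alias and a canonical hash belong together without a network call. That generalization is untested here.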
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YF6H7X47LYDRIUZRYUEIT4YCNQ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c17c7fdf9f5e07145331c50889f3026c26f261e12db3e34d8d31aa895e88852e
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "453216340a058ee8fad0d5a80c53395ccace2e0f5523edaea5845ee9bc11fd9b",
    "cross_cats_sorted": [
      "q-bio.QM"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-12T06:14:34Z",
    "title_canon_sha256": "51df8ba47e3cab93408e9849060252bde98832c80bfd97c054fec22b4e4baefa"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.11618",
    "kind": "arxiv",
    "version": 4
  }
}
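
The verification command above can also be run offline against the record as printed. The following is a minimal sketch that reuses the same serialization settings as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) and compares the result with the canonical hash listed on this page. The function name `canonical_sha256` is hypothetical, and the sketch assumes the JSON shown here is byte-for-byte equivalent to what the API returns in `canonical_record`.

```python
import hashlib
import json

# Canonical hash as listed on this page.
EXPECTED = "c17c7fdf9f5e07145331c50889f3026c26f261e12db3e34d8d31aa895e88852e"

# Canonical record copied from the JSON shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "453216340a058ee8fad0d5a80c53395ccace2e0f5523edaea5845ee9bc11fd9b",
        "cross_cats_sorted": ["q-bio.QM"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-02-12T06:14:34Z",
        "title_canon_sha256": "51df8ba47e3cab93408e9849060252bde98832c80bfd97c054fec22b4e4baefa",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.11618", "kind": "arxiv", "version": 4},
}


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches page:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores the fields. It does depend on exact string values, so any whitespace or Unicode normalization difference in the record would change the result.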