pith:YF6H7X47
How Well Do Large-Scale Chemical Language Models Transfer to Downstream Tasks?
Scaling chemical language models reduces pretraining loss but delivers limited gains on downstream molecular tasks.
arxiv:2602.11618 v4 · 2026-02-12 · cs.LG · q-bio.QM
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YF6H7X47LYDRIUZRYUEIT4YCNQ}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
While pretraining loss consistently decreases with increased training resources, downstream task performance shows limited improvement.
Key assumption: the chosen downstream molecular property prediction tasks and evaluation protocol are representative enough that the limited observed gains reflect a general scaling failure rather than task-specific or experimental artifacts.
Scaling chemical language models reduces pretraining loss but yields only limited or saturating gains on downstream molecular property prediction tasks, exposing a disconnect between pretraining metrics and actual transfer performance.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:09:23.630097Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
c17c7fdf9f5e07145331c50889f3026c26f261e12db3e34d8d31aa895e88852e
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YF6H7X47LYDRIUZRYUEIT4YCNQ \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c17c7fdf9f5e07145331c50889f3026c26f261e12db3e34d8d31aa895e88852e
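The one-liner above canonicalizes the record (sorted keys, compact separators, UTF-8 without ASCII escaping) and hashes the resulting bytes with SHA-256. The same steps can be sketched as a small Python module, which may be easier to reuse than a shell pipeline. The function names `canonical_bytes`, `canonical_hash`, and `verify` are hypothetical helpers introduced here for illustration, not part of any Pith tooling, and the demo record is made up.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record the same way the shell one-liner does:
    keys sorted, no whitespace between tokens, non-ASCII kept as-is."""
    text = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Return the lowercase hex SHA-256 digest of the canonical bytes."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def verify(record: dict, expected_hex: str) -> bool:
    """Compare the recomputed digest with the published one (hypothetical helper)."""
    return canonical_hash(record) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Made-up demo record: key order in the input must not change the digest.
    a = {"source": {"kind": "arxiv", "id": "0000.00000"}, "schema_version": "1.0"}
    b = {"schema_version": "1.0", "source": {"id": "0000.00000", "kind": "arxiv"}}
    print(canonical_bytes(a).decode("utf-8"))
    print(canonical_hash(a) == canonical_hash(b))
```

To check a real record, pass the `canonical_record` object fetched from the URL above to `verify` together with the published canonical hash.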
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "453216340a058ee8fad0d5a80c53395ccace2e0f5523edaea5845ee9bc11fd9b",
    "cross_cats_sorted": [
      "q-bio.QM"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-12T06:14:34Z",
    "title_canon_sha256": "51df8ba47e3cab93408e9849060252bde98832c80bfd97c054fec22b4e4baefa"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.11618",
    "kind": "arxiv",
    "version": 4
  }
}
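The fields of the canonical record correspond to the summary line near the top of this page (source, version, submission date, categories). As a rough illustration of how the record can be read, the sketch below rebuilds that line from the JSON. The function `format_header` is a hypothetical helper, and the assumption that the page header is derived exactly this way is not confirmed by the source.

```python
import json

# Canonical record copied from this page.
RECORD = json.loads("""
{
  "metadata": {
    "abstract_canon_sha256": "453216340a058ee8fad0d5a80c53395ccace2e0f5523edaea5845ee9bc11fd9b",
    "cross_cats_sorted": ["q-bio.QM"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-12T06:14:34Z",
    "title_canon_sha256": "51df8ba47e3cab93408e9849060252bde98832c80bfd97c054fec22b4e4baefa"
  },
  "schema_version": "1.0",
  "source": {"id": "2602.11618", "kind": "arxiv", "version": 4}
}
""")


def format_header(record: dict) -> str:
    """Build a one-line summary: source id and version, date, then categories.

    Hypothetical helper; the layout mirrors the header line on this page.
    """
    src = record["source"]
    meta = record["metadata"]
    ident = f'{src["kind"]}:{src["id"]} v{src["version"]}'
    date = meta["submitted_at"][:10]  # keep only the YYYY-MM-DD part
    cats = [meta["primary_cat"], *meta["cross_cats_sorted"]]
    return " · ".join([ident, date, *cats])


if __name__ == "__main__":
    print(format_header(RECORD))
```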