Pith Number

pith:2VSLHXZL

pith:2026:2VSLHXZLA4T3HSYTCPVSDWTGDX

not attested · not anchored · not stored · refs resolved

Mix, Don't Tune: Bilingual Pre-Training Outperforms Hyperparameter Search in Data-Constrained Settings

Anastasiia Sedova, Jes Frellsen, Louis Béthune, Natalie Schluter, Paul Jeha, Pierre Ablin, Skyler Seto

Mixing high-resource language data outperforms hyperparameter tuning for low-resource pre-training.

arxiv:2605.13225 v1 · 2026-05-13 · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{2VSLHXZLA4T3HSYTCPVSDWTGDX}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Mixing yields larger improvements than hyperparameter tuning on both validation loss and downstream task accuracy, and the gap grows with model size. We quantify how much mixing helps: it boosts performance by an amount equivalent to 2-3× the unique target data on validation loss and 2-13× on downstream task accuracy, with the gain scaling steeply with model size.

C2 weakest assumption

That the chosen mixing ratios are near-optimal and that English data supplies useful, non-conflicting signal for Arabic without introducing domain mismatch that would require separate controls.

C3 one-line summary

Mixing auxiliary high-resource language data outperforms hyperparameter tuning in data-constrained bilingual pre-training, with gains equivalent to 2-13 times more unique target data.

References

32 extracted · 32 resolved · 10 Pith anchors

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-18T02:44:49.635911Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

d564b3df2b0727b3cb1313eb21da661de40c6207e186cadfeb0f3b59d4385ca8

Aliases

arxiv: 2605.13225 · arxiv_version: 2605.13225v1 · doi: 10.48550/arxiv.2605.13225 · pith_short_12: 2VSLHXZLA4T3 · pith_short_16: 2VSLHXZLA4T3HSYT · pith_short_8: 2VSLHXZL
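
The identifiers on this page appear to be related by a simple encoding: Base32-encoding the canonical SHA-256 hash (RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number shown here, and the short aliases are its 8, 12 and 16 character prefixes. The sketch below checks that relationship for this record only. Whether the builder derives every Pith Number this way is an assumption inferred from the values above, and the function names `pith_number_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "d564b3df2b0727b3cb1313eb21da661de40c6207e186cadfeb0f3b59d4385ca8"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: this rule is inferred from the values on this page,
    not taken from a published schema.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_8/12/16 aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith_number = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(pith_number)

print(pith_number)
print(aliases)
```

For this record the derived value matches `2VSLHXZLA4T3HSYTCPVSDWTGDX`, and the three prefixes match the `pith_short_*` aliases listed above.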

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/2VSLHXZLA4T3HSYTCPVSDWTGDX \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d564b3df2b0727b3cb1313eb21da661de40c6207e186cadfeb0f3b59d4385ca8

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "f611460a07c52135d26e5d0aa86bf5d2c0167ea58dbd4c572cd6c471189765f1",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T09:17:51Z",
    "title_canon_sha256": "d1f030a3df4a94c573b01b50ee1b517f6181a1e68243d22338561604cda508a0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13225",
    "kind": "arxiv",
    "version": 1
  }
}
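
The shell one-liner above hashes a re-serialized form of the canonical record: keys sorted, no whitespace between separators, non-ASCII characters kept as-is, UTF-8 encoded. The following is a minimal offline sketch of that same step in Python, applied to the record printed on this page. The helper name `canonical_sha256` is hypothetical. Compare the printed digest with the canonical hash listed above; a mismatch would suggest the record or the serialization rules differ from what is assumed here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the one-liner on this page:
    sorted keys, compact separators, ensure_ascii=False, UTF-8 bytes."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The canonical record as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "f611460a07c52135d26e5d0aa86bf5d2c0167ea58dbd4c572cd6c471189765f1",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T09:17:51Z",
        "title_canon_sha256": "d1f030a3df4a94c573b01b50ee1b517f6181a1e68243d22338561604cda508a0",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13225", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Because the keys are sorted before hashing, the order in which a mirror stores or transmits the fields does not change the digest.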