pith:GEFYM37A
EmbeddingGemma: Powerful and Lightweight Text Representations
A 300-million-parameter model reaches state-of-the-art text embedding results on MTEB among models with fewer than 500M parameters
arxiv:2509.20354 v3 · 2025-09-24 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GEFYM37A2FREIHXV3RWZ2XONEL}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
EmbeddingGemma (300M) achieves state-of-the-art results on MTEB across multilingual, English, and code domains, outperforming prior top models with fewer than 500M parameters and providing performance comparable to models double its size.
The reported gains are assumed to come primarily from the described training recipe (encoder-decoder initialization, geometric embedding distillation, spread-out regularizer, and checkpoint merging from varied mixtures) rather than from data selection, base model scale, or evaluation specifics.
A 300M-parameter open embedding model sets new SOTA on MTEB for its size class and matches models twice as large while staying effective when compressed.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:52.589643Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
310b866fe0d162441ef5dc6d9d5dcd22f07efd7a8d302ebce5c10a124a95fc21
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GEFYM37A2FREIHXV3RWZ2XONEL \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 310b866fe0d162441ef5dc6d9d5dcd22f07efd7a8d302ebce5c10a124a95fc21
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "285bdf1032753db782a592a27f228760328e02fd9ae661a3caf0cd9ccbaaa24e",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-09-24T17:56:51Z",
    "title_canon_sha256": "8aa9784a6c14d3d4ef634410538db259293033755b109710214f8d3ca6132d0e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2509.20354",
    "kind": "arxiv",
    "version": 3
  }
}
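The same check can be run offline against the record printed above, without `curl` or `jq`. The following is a minimal sketch, assuming (as the shell one-liner suggests) that the canonical hash is the SHA-256 of the record serialized as UTF-8 JSON with sorted keys and compact separators. The helper name `canonical_hash` and the constants `RECORD` and `EXPECTED` are illustrative, not part of any Pith tooling.

```python
import hashlib
import json

# Canonical record copied from the page above.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "285bdf1032753db782a592a27f228760328e02fd9ae661a3caf0cd9ccbaaa24e",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-09-24T17:56:51Z",
        "title_canon_sha256": "8aa9784a6c14d3d4ef634410538db259293033755b109710214f8d3ca6132d0e",
    },
    "schema_version": "1.0",
    "source": {"id": "2509.20354", "kind": "arxiv", "version": 3},
}

# Hash published on the page under "Canonical hash".
EXPECTED = "310b866fe0d162441ef5dc6d9d5dcd22f07efd7a8d302ebce5c10a124a95fc21"


def canonical_hash(record: dict) -> str:
    """Serialize with sorted keys and no whitespace, then SHA-256 the UTF-8 bytes.

    Assumption: this mirrors the json.dumps arguments used in the page's one-liner.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = canonical_hash(RECORD)
print("computed:", digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the source JSON, only on their names and values.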