pith:PI5MLL6K
Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent
Augmenting an LLM with real-time queries to biomedical terminology services improves metadata standardization accuracy over the model alone.
arxiv:2604.08552 v2 · 2026-03-10 · cs.DB · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{PI5MLL6K24L6XEWLSGJAM46RFS}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Augmenting the LLM with real-time tool access consistently improves prediction accuracy over the LLM alone across both ontology-constrained and non-ontology-constrained fields.
The expert-curated gold standard is treated as ground truth and the real-time terminology services always return canonically correct terms without introducing new errors.
An LLM system with real-time access to ontology services outperforms plain LLM prompting on standardizing 839 HuBMAP legacy metadata records against an expert gold standard.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-06-19T16:12:53.938509Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
7a3ac5afcad717eb92cb91920673d12cb4be95a68f874be6e9dace9626a96a31
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PI5MLL6K24L6XEWLSGJAM46RFS \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7a3ac5afcad717eb92cb91920673d12cb4be95a68f874be6e9dace9626a96a31
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "16e18b1c4c8b586ade2a62f68125a74dc64a80f4094857de5e530974db2b67b3",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.DB",
    "submitted_at": "2026-03-10T18:47:30Z",
    "title_canon_sha256": "ba5d3cc5ac20f7f39f8a6d8e2cd81f2bf71d1fe99cddda9ca888fef75df27e13"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.08552",
    "kind": "arxiv",
    "version": 2
  }
}
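The same check can be run offline against the record shown above, without fetching anything. The following is a minimal sketch that mirrors the serialization used in the verification command (sorted keys, compact separators, UTF-8, no ASCII escaping) and hashes the result with SHA-256. The function name `canonical_hash` and the variable `record` are illustrative and not part of any Pith API. It assumes the JSON displayed here is identical to the `canonical_record` field returned by the service; if so, the printed digest should equal the canonical hash listed above.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of a record in canonical form.

    Canonical form here means: keys sorted, separators "," and ":" with no
    whitespace, non-ASCII characters left unescaped, encoded as UTF-8.
    This mirrors the one-liner in the verification command; it is a sketch,
    not an official implementation.
    """
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The canonical record as displayed on this page (hypothetical local copy).
record = {
    "metadata": {
        "abstract_canon_sha256": "16e18b1c4c8b586ade2a62f68125a74dc64a80f4094857de5e530974db2b67b3",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.DB",
        "submitted_at": "2026-03-10T18:47:30Z",
        "title_canon_sha256": "ba5d3cc5ac20f7f39f8a6d8e2cd81f2bf71d1fe99cddda9ca888fef75df27e13",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.08552", "kind": "arxiv", "version": 2},
}

digest = canonical_hash(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which fields appear in the input, only on their names and values. Any change to a value, including whitespace inside a string, produces a different digest.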