Pith Number

pith:ND2NF75G

pith:2024:ND2NF75GUDY6JST6IAF4VIN2KV
not attested · not anchored · not stored · refs resolved

OLMo: Accelerating the Science of Language Models

Aakanksha Naik, Abhilasha Ravichander, Akshita Bhagia, Ananya Harsh Jha, Arman Cohan, Crystal Nam, David Atkinson, Dirk Groeneveld, Dustin Schwenk, Emma Strubell, Hamish Ivison, Hannaneh Hajishirzi, Ian Magnusson, Iz Beltagy, Jack Hessel, Jacob Morrison, Jennifer Dumas, Jesse Dodge, Khyathi Raghavi Chandu, Kyle Lo, Kyle Richardson, Luca Soldaini, Luke Zettlemoyer, Matthew E. Peters, Mitchell Wortsman, Nathan Lambert, Niklas Muennighoff, Nishant Subramani, Noah A. Smith, Oyvind Tafjord, Pete Walsh, Pradeep Dasigi, Rodney Kinney, Russell Authur, Saurabh Shah, Shane Arora, Tushar Khot, Valentina Pyatkin, William Merrill, Will Smith, Yanai Elazar, Yizhong Wang, Yuling Gu

OLMo is a competitive open language model released with its full training data, training code, and evaluation code to enable scientific study.

arxiv:2402.00838 v4 · 2024-02-01 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ND2NF75GUDY6JST6IAF4VIN2KV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
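The merge algorithm itself is not described on this page. As a minimal sketch of what "deterministic" can mean here, assuming a hypothetical event shape (`timestamp`, `event_id`, `key`, `value`) and a last-writer-wins rule, a mirror could drop duplicate events by id, sort the rest into one fixed total order, and fold them over the canonical record. Any two mirrors holding the same events then compute the same state regardless of arrival order. Verification of the event signatures is omitted from the sketch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; not the actual Pith bundle schema.
    timestamp: str  # ISO-8601 UTC in one fixed format, so string order == time order
    event_id: str   # unique id: drops duplicates and breaks timestamp ties
    key: str        # state field this event sets
    value: Any


def merge(canonical: dict, events: list) -> dict:
    """Fold events over the canonical record in a fixed total order."""
    # Keep one copy per event_id (the same event may arrive from several mirrors).
    # Assumes events sharing an id are identical; otherwise order would matter here.
    unique = {ev.event_id: ev for ev in events}.values()
    state = dict(canonical)
    # Sorting by (timestamp, event_id) makes the result independent of arrival order.
    for ev in sorted(unique, key=lambda e: (e.timestamp, e.event_id)):
        state[ev.key] = ev.value  # later events in the total order overwrite earlier ones
    return state
```

The design choice that matters is the total order: because the sort key is derived only from the events themselves, reversing or duplicating the input list does not change the merged state.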

Claims

C1 strongest claim

We have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code.

C2 weakest assumption

That the released OLMo is sufficiently competitive with closed models, and that the research community will actively use its openness for rigorous scientific study rather than only for inference.

C3 one-line summary

OLMo delivers a fully open, competitive language model, released with its training data, code, and evaluations, to enable community-driven scientific research on LMs.

References

12 extracted · 12 resolved · 5 Pith anchors

[1] Layer Normalization 2016 · arXiv:1607.06450
[2] Language Models are Few-Shot Learners 2020 · arXiv:2005.14165
[3] Sidney Greenbaum and Gerald Nelson 1996
[4] arXiv preprint 2023 · arXiv:2312.10253
[5] Mixtral of Experts 2024 · arXiv:2401.04088

Formal links

1 machine-checked theorem link

Cited by

31 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.313699Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

68f4d2ffa6a0f1e4ca7e400bcaa1ba556b7033f8df5509364d01ad4917a1d67a

Aliases

arxiv: 2402.00838 · arxiv_version: 2402.00838v4 · doi: 10.48550/arxiv.2402.00838 · pith_short_12: ND2NF75GUDY6 · pith_short_16: ND2NF75GUDY6JST6 · pith_short_8: ND2NF75G
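The pith_short aliases are prefixes of the full Pith number, and the full number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash listed above. That derivation is inferred from the values on this page, not taken from a published Pith specification, so the sketch below is an assumption; the function names `pith_number` and `short_aliases` are illustrative.

```python
import base64


def pith_number(canonical_hash_hex: str, length: int = 26) -> str:
    """Derive a Pith-style identifier from a hex SHA-256 digest.

    Assumption (inferred, not specified): base32 of the 32-byte digest,
    truncated to `length` characters, so the '=' padding never appears.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


def short_aliases(number: str) -> dict[str, str]:
    # The pith_short_N aliases shown on this page are the first N characters.
    return {f"pith_short_{n}": number[:n] for n in (8, 12, 16)}


h = "68f4d2ffa6a0f1e4ca7e400bcaa1ba556b7033f8df5509364d01ad4917a1d67a"
num = pith_number(h)
print(num)                                 # ND2NF75GUDY6JST6IAF4VIN2KV
print(short_aliases(num)["pith_short_8"])  # ND2NF75G
```

Twenty-six base32 characters carry 130 bits of the digest, and the shorter aliases trade collision resistance for readability.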
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ND2NF75GUDY6JST6IAF4VIN2KV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 68f4d2ffa6a0f1e4ca7e400bcaa1ba556b7033f8df5509364d01ad4917a1d67a
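For readers without `jq`, the same check can be sketched in Python alone. The URL, the `Accept: application/ld+json` header, the `canonical_record` field, and the serialization options (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) are taken from the command above; the function names are hypothetical, and the fetch is left as a commented usage line so the hashing step can be exercised offline.

```python
import hashlib
import json
import urllib.request


def canonical_sha256(record) -> str:
    # Same serialization as the one-liner: sorted keys, no whitespace, raw UTF-8.
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def fetch_canonical_record(pith: str, timeout: float = 30.0) -> dict:
    # Request the JSON-LD view of the record and keep only canonical_record.
    url = f"https://pith.science/pith/{pith}"
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["canonical_record"]


# Usage (needs network access):
# print(canonical_sha256(fetch_canonical_record("ND2NF75GUDY6JST6IAF4VIN2KV")))
```

Sorting keys and fixing separators is what makes the hash independent of how a server happens to order or space its JSON output.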
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d4d53a4b872203f217ea0bfe53384663009baa5cf1fe61d21c475db395032681",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-02-01T18:28:55Z",
    "title_canon_sha256": "d1edfe6bb041c3fcb3b826f1a0d3bb001721e5ea66f746a6fe7c393057a4ba82"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2402.00838",
    "kind": "arxiv",
    "version": 4
  }
}