Pith Number

pith:ND2NF75G

pith:2024:ND2NF75GUDY6JST6IAF4VIN2KV

not attested · not anchored · not stored · refs resolved

OLMo: Accelerating the Science of Language Models

Aakanksha Naik, Abhilasha Ravichander, Akshita Bhagia, Ananya Harsh Jha, Arman Cohan, Crystal Nam, David Atkinson, Dirk Groeneveld, Dustin Schwenk, Emma Strubell, Hamish Ivison, Hannaneh Hajishirzi, Ian Magnusson, Iz Beltagy, Jack Hessel, Jacob Morrison, Jennifer Dumas, Jesse Dodge, Khyathi Raghavi Chandu, Kyle Lo, Kyle Richardson, Luca Soldaini, Luke Zettlemoyer, Matthew E. Peters, Mitchell Wortsman, Nathan Lambert, Niklas Muennighoff, Nishant Subramani, Noah A. Smith, Oyvind Tafjord, Pete Walsh, Pradeep Dasigi, Rodney Kinney, Russell Authur, Saurabh Shah, Shane Arora, Tushar Khot, Valentina Pyatkin, William Merrill, Will Smith, Yanai Elazar, Yizhong Wang, Yuling Gu

OLMo is a competitive open language model released with its full training data, training code, and evaluation code to enable scientific study.

arxiv:2402.00838 v4 · 2024-02-01 · cs.CL

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{ND2NF75GUDY6JST6IAF4VIN2KV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
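This page does not spell out the merge rules, so the following is only a generic illustration of the property it claims: folding the same set of events in a fixed total order gives the same state on every mirror, regardless of the order in which the events arrived. The `Event` shape, the `(timestamp, event_id)` ordering, and the last-writer-wins rule are all hypothetical choices for illustration, not Pith's algorithm, and signature checking is omitted.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; Pith's real event schema is not shown on this page.
    event_id: str   # unique identifier, e.g. a hash of the signed payload
    timestamp: str  # ISO 8601 UTC in one fixed format, so string order == time order
    field: str      # which part of the record the event updates
    value: Any


def merge(canonical: Dict[str, Any], events: Iterable[Event]) -> Dict[str, Any]:
    """Fold events into a copy of the canonical record in a fixed total order.

    Sorting by (timestamp, event_id) makes the result independent of delivery
    order; for each field, the event that sorts last wins.
    """
    state = dict(canonical)  # never mutate the canonical record itself
    seen = set()
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        if ev.event_id in seen:  # ignore duplicate deliveries of the same event
            continue
        seen.add(ev.event_id)
        state[ev.field] = ev.value
    return state


base = {"title": "OLMo: Accelerating the Science of Language Models"}
evs = [
    Event("e1", "2026-05-17T00:00:01Z", "author_claim", "open"),
    Event("e2", "2026-05-17T00:00:02Z", "author_claim", "claimed"),
]
print(merge(base, evs) == merge(base, list(reversed(evs))))
```

A mirror that receives the events in reverse order, or receives one twice, still computes the same state, which is the property the bundle description relies on.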

Claims

C1 strongest claim

We have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code.

C2 weakest assumption

That the released OLMo is sufficiently competitive with closed models, and that the research community will actively use its open data and code for rigorous scientific study rather than only for inference.

C3 one line summary

OLMo delivers a fully open competitive language model with training data, code, and evaluations to enable community-driven scientific research on LMs.

References

12 extracted · 12 resolved · 5 Pith anchors

[1] Layer Normalization 2016 · arXiv:1607.06450

[2] Language Models are Few-Shot Learners 2020 · arXiv:2005.14165

[3] Sidney Greenbaum and Gerald Nelson 1996

[4] arXiv preprint 2023 · arXiv:2312.10253

[5] Mixtral of Experts 2024 · arXiv:2401.04088

Formal links

1 machine-checked theorem link

Cited by

31 papers in Pith

Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices

Malicious and Unintentional Disclosure Risks in Large Language Models for Code Generation

VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Receipt and verification

First computed	2026-05-17T23:38:46.313699Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

68f4d2ffa6a0f1e4ca7e400bcaa1ba556b7033f8df5509364d01ad4917a1d67a

Aliases

arxiv: 2402.00838 · arxiv_version: 2402.00838v4 · doi: 10.48550/arxiv.2402.00838 · pith_short_12: ND2NF75GUDY6 · pith_short_16: ND2NF75GUDY6JST6 · pith_short_8: ND2NF75G
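Checking the values on this page by hand, the Pith Number equals the first 26 characters (130 bits) of the RFC 4648 base32 encoding of the canonical hash, and each pith_short_N alias is its first N characters. The sketch below encodes that observed relationship. It is inferred from this one record, not taken from Pith documentation, so the derivation and the helper names `pith_number` and `short_aliases` are assumptions.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "68f4d2ffa6a0f1e4ca7e400bcaa1ba556b7033f8df5509364d01ad4917a1d67a"


def pith_number(canonical_hash_hex: str, length: int = 26) -> str:
    """Derive a Pith Number from a canonical SHA-256 hex digest.

    Assumption (inferred from this record only): the number is the first
    `length` characters of the base32 encoding of the raw 32-byte digest.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


def short_aliases(number: str) -> dict:
    """Each pith_short_N alias is the first N characters of the full number."""
    return {f"pith_short_{n}": number[:n] for n in (8, 12, 16)}


full = pith_number(CANONICAL_HASH)
print(full)                  # ND2NF75GUDY6JST6IAF4VIN2KV
print(short_aliases(full))
```

If this relationship holds in general, a reader can check that a Pith Number and its canonical hash belong together without contacting the resolver.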

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/ND2NF75GUDY6JST6IAF4VIN2KV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 68f4d2ffa6a0f1e4ca7e400bcaa1ba556b7033f8df5509364d01ad4917a1d67a

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "d4d53a4b872203f217ea0bfe53384663009baa5cf1fe61d21c475db395032681",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-02-01T18:28:55Z",
    "title_canon_sha256": "d1edfe6bb041c3fcb3b826f1a0d3bb001721e5ea66f746a6fe7c393057a4ba82"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2402.00838",
    "kind": "arxiv",
    "version": 4
  }
}
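The hashing half of the verification command can be reproduced in plain Python from a parsed record such as the canonical record JSON above. This sketch mirrors the one-liner's `json.dumps` settings; the helper names are hypothetical, and a hash computed this way matches the published one only if the input is exactly the data the resolver returns under `canonical_record`. The example also shows why those settings matter: key order and whitespace in the input do not change the result.

```python
import hashlib
import json
from typing import Any


def canonical_bytes(record: Any) -> bytes:
    # Same serialization as the verification one-liner: sorted keys,
    # no insignificant whitespace, non-ASCII characters kept as UTF-8.
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(record: Any) -> str:
    """Hex SHA-256 of the canonical serialization of `record`."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# Two spellings of the record's "source" object: different key order and
# whitespace, identical content.
pretty = """{
  "version": 4,
  "kind": "arxiv",
  "id": "2402.00838"
}"""
compact = '{"id":"2402.00838","kind":"arxiv","version":4}'

a = canonical_hash(json.loads(pretty))
b = canonical_hash(json.loads(compact))
print(a == b)  # True
```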