Pith Number

pith:Y5LX4DQD

pith:2025:Y5LX4DQD7K3VH6DQIYSCJTQ6X6
not attested · not anchored · not stored · refs resolved

MolmoAct: Action Reasoning Models that can Reason in Space

Ali Farhadi, Angelica Wu, Bohan Fang, Boyang Li, Dieter Fox, Eli VanderBilt, Haoquan Fang, Jason Lee, Jiafei Duan, Jieyu Zhang, Karen Farley, Ranjay Krishna, Rose Hendrix, Sangho Lee, Shuo Liu, Wilbert Pumacay, Winson Han, Yi Ru Wang, Yuquan Deng

MolmoAct encodes robot observations into depth-aware tokens, editable trajectory traces, and low-level actions through a three-stage pipeline.

arxiv:2508.07917 v4 · 2025-08-11 · cs.RO

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Y5LX4DQD7K3VH6DQIYSCJTQ6X6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

MolmoAct encodes observations and instructions into depth-aware perception tokens, generates mid-level spatial plans as editable trajectory traces, and predicts precise low-level actions, enabling explainable and steerable behavior.

C2 · weakest assumption

That the structured three-stage pipeline of depth-aware perception, editable trajectory planning, and low-level control produces meaningfully better adaptability, generalization, and semantic grounding than direct perception-to-action mapping models.

C3 · one line summary

MolmoAct is a 7B-parameter robotic foundation model with a three-stage pipeline of depth-aware perception, editable spatial trajectory planning, and low-level action prediction; the paper reports state-of-the-art results on simulation and real-world tasks.

References

13 extracted · 13 resolved · 0 Pith anchors

[1] 2. ViT Image Encoder: encodes each crop independently into per-patch features (2024)
[2] Layer selection and concatenation: features from the third-to-last (OpenAI CLIP) or fourth-to-last (SigLIP2) and the tenth-from-last ViT layers are concatenated for each patch; this slightly outperform (2024)
[3] Attention pooling in 2 × 2 windows: within each 2 × 2 patch window, a multi-headed attention layer pools the four patches to a single vector, using the mean of the patches as the query. This pooling reduce (2024)
[4] Language Description: Put the bowl into the sink
[5] Language Description: Wipe the table
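
The 2 × 2 attention pooling described in entry [3] can be illustrated with a small sketch. This is a simplified illustration, not the model's implementation: it uses a single head and identity query/key/value projections, whereas the entry describes a multi-headed layer with learned weights. The function name `attention_pool_2x2` and the toy input are hypothetical.

```python
import math


def attention_pool_2x2(patches):
    """Pool the four patch vectors of one 2x2 window into a single vector.

    Assumptions (not from the source): one attention head and identity
    projections in place of learned query/key/value weights.
    """
    if len(patches) != 4:
        raise ValueError("expected the four patches of one 2x2 window")
    dim = len(patches[0])
    # Query: element-wise mean of the four patches.
    query = [sum(p[i] for p in patches) / 4 for i in range(dim)]
    # Scaled dot-product score between the query and each patch (used as key).
    scores = [sum(q * k for q, k in zip(query, p)) / math.sqrt(dim) for p in patches]
    # Numerically stable softmax over the four scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output: attention-weighted sum of the patches (used as values).
    pooled = [sum(w * p[i] for w, p in zip(weights, patches)) for i in range(dim)]
    return pooled, weights


# Toy window of four 2-dimensional patch features.
window = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
pooled, weights = attention_pool_2x2(window)
print(pooled, weights)
```

Each window of four patches becomes one vector, so pooling a full patch grid this way yields a quarter as many vectors as there were patches.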

Formal links

2 machine-checked theorem links

Cited by

44 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:39:19.842934Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

c7577e0e03fab753f870462424ce1ebfaa0f2f65a6419301d0bd265c96a85351

Aliases

arxiv: 2508.07917 · arxiv_version: 2508.07917v4 · doi: 10.48550/arxiv.2508.07917 · pith_short_12: Y5LX4DQD7K3V · pith_short_16: Y5LX4DQD7K3VH6DQ · pith_short_8: Y5LX4DQD
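
The identifiers on this page are consistent with a simple relationship: the 26-character Pith Number matches the first 26 characters of the Base32 (RFC 4648) encoding of the canonical hash bytes, and each `pith_short_N` alias is its first N characters. This is inferred from the values shown here, not taken from Pith documentation, so treat the sketch below as an observation rather than the official derivation. The function name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "c7577e0e03fab753f870462424ce1ebfaa0f2f65a6419301d0bd265c96a85351"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the 26-character truncation is inferred from this record only.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(full)     # Y5LX4DQD7K3VH6DQIYSCJTQ6X6
print(aliases)
```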
Agent API

Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Y5LX4DQD7K3VH6DQIYSCJTQ6X6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c7577e0e03fab753f870462424ce1ebfaa0f2f65a6419301d0bd265c96a85351
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "56ef2d440f74d2c02bb85414d3dcda76a36e92fba696eda6936c2531f9103018",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2025-08-11T12:32:45Z",
    "title_canon_sha256": "754cffeb0bf838ee1ab60cfe42a62b9da773132274b7d5fcfe9e000645a4e9ce"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2508.07917",
    "kind": "arxiv",
    "version": 4
  }
}
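
The same check can be run offline against the record shown above. The sketch below mirrors the canonicalization used in the shell pipeline (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes, SHA-256). It assumes the JSON printed on this page is identical in content to what the API returns under `canonical_record`; the function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the shell one-liner does."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "56ef2d440f74d2c02bb85414d3dcda76a36e92fba696eda6936c2531f9103018",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2025-08-11T12:32:45Z",
        "title_canon_sha256": "754cffeb0bf838ee1ab60cfe42a62b9da773132274b7d5fcfe9e000645a4e9ce",
    },
    "schema_version": "1.0",
    "source": {"id": "2508.07917", "kind": "arxiv", "version": 4},
}

# Compare the printed digest with the canonical hash listed on this page.
print(canonical_sha256(record))
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the JSON as displayed, only on its content.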