Pith Number

pith:G6BON2HG

pith:2026:G6BON2HG4JT5WVEOWT3BNBGVOL

not attested · not anchored · not stored · refs resolved

A Hierarchical Spatiotemporal Action Tokenizer for In-Context Imitation Learning in Robotics

Ali Shah Ali, Andrey Konin, Fawad Javed Fateh, Murad Popattia, M. Zeeshan Zia, Quoc-Huy Tran, Usman Nizamani

A two-level vector quantizer that clusters robot actions while also reconstructing their timestamps improves in-context imitation learning.

arxiv:2604.15215 v2 · 2026-04-16 · cs.RO

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{G6BON2HG4JT5WVEOWT3BNBGVOL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Extensive evaluations on multiple simulation and real robotic manipulation benchmarks show that our approach establishes new state-of-the-art performance in in-context imitation learning.

C2 weakest assumption

Hierarchical spatiotemporal clustering, combined with joint reconstruction of actions and timestamps, is assumed to produce tokens that generalize beyond the specific benchmarks and robot platforms used in the evaluations.

C3 one line summary

A two-level vector quantization tokenizer that jointly reconstructs robot actions and timestamps to improve in-context imitation learning performance on manipulation benchmarks.
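To make the summary concrete, here is a minimal sketch of a two-level quantizer that encodes an action together with its timestamp and reconstructs both. It is an illustration under stated assumptions, not the paper's method: the residual (coarse-then-fine) structure, the hand-written codebooks, and the names `nearest`, `tokenize`, and `detokenize` are all hypothetical, and this page does not describe how the authors train or structure their codebooks.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def tokenize(action, timestamp, coarse, fine):
    """Quantize [action..., timestamp] in two stages (assumed residual scheme)."""
    x = list(action) + [timestamp]          # joint spatiotemporal vector
    i = nearest(coarse, x)                  # level 1: coarse cluster
    residual = [a - b for a, b in zip(x, coarse[i])]
    j = nearest(fine, residual)             # level 2: refine the residual
    return i, j


def detokenize(tokens, coarse, fine):
    """Reconstruct both the action and its timestamp from the token pair."""
    i, j = tokens
    x = [a + b for a, b in zip(coarse[i], fine[j])]
    return x[:-1], x[-1]


# Toy codebooks for a 2-D action plus a normalized timestamp (made-up values).
coarse = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
fine = [[-0.1, -0.1, -0.1], [0.0, 0.0, 0.0], [0.1, 0.1, 0.1]]

tokens = tokenize((0.9, 1.1), 1.0, coarse, fine)
recon_action, recon_time = detokenize(tokens, coarse, fine)
print(tokens, recon_action, recon_time)
```

In a learned tokenizer the codebooks would be fitted to demonstration data, and the reconstruction error on both the action and the timestamp would drive training; the fixed codebooks here only show the encode and decode path.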

References

52 extracted · 52 resolved · 4 Pith anchors

[1] A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y. Zhu, and R. Martín-Martín. What matters in learning from offline human demonstrations for robot manipul 2021

[2] Octo: An Open-Source Generalist Robot Policy 2024 · arXiv:2405.12213

[3] Open X-Embodiment: Robotic learning datasets and RT-X models: Open X-Embodiment Collaboration 2024

[4] Khazatsky et al. 2024

[5] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. NeurIPS, 2020

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:00:38.137942Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

3782e6e8e6e267db548eb4f61684d572ec8e532c8275b6f134ad4147820c17ef

Aliases

arxiv: 2604.15215 · arxiv_version: 2604.15215v2 · doi: 10.48550/arxiv.2604.15215 · pith_short_12: G6BON2HG4JT5 · pith_short_16: G6BON2HG4JT5WVEO · pith_short_8: G6BON2HG
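The short aliases are prefixes of the full Pith Number, and for this record the full number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The sketch below checks that relationship. It is inferred from the values shown on this page rather than from a published specification, and the function name `pith_from_hash` is hypothetical.

```python
import base64

CANONICAL_HASH = "3782e6e8e6e267db548eb4f61684d572ec8e532c8275b6f134ad4147820c17ef"
PITH_NUMBER = "G6BON2HG4JT5WVEOWT3BNBGVOL"


def pith_from_hash(hex_digest, length=26):
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page relates to its
    canonical hash; it is not taken from official Pith documentation.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)     # G6BON2HG4JT5WVEOWT3BNBGVOL
print(aliases)
```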

Agent API

Resolver JSON Graph JSON Events JSON Schema Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/G6BON2HG4JT5WVEOWT3BNBGVOL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 3782e6e8e6e267db548eb4f61684d572ec8e532c8275b6f134ad4147820c17ef

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "ff645d7c776c74a41a146573031129e0bf487376c93a0ec3798471549a0f975c",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2026-04-16T16:47:08Z",
    "title_canon_sha256": "8919fa8822be0ee0b38442bb96c1d16f44170cc3affe78f6fffbbc347ebc25be"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.15215",
    "kind": "arxiv",
    "version": 2
  }
}
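The shell one-liner under "Verify this Pith Number yourself" can also be run offline against the record above. The following sketch applies the same canonicalization (sorted keys, compact separators, UTF-8 without ASCII escaping) and compares the SHA-256 digest with the published canonical hash. The helper names are hypothetical, and a match depends on the record being transcribed exactly as the resolver serves it.

```python
import hashlib
import json

EXPECTED = "3782e6e8e6e267db548eb4f61684d572ec8e532c8275b6f134ad4147820c17ef"

record = {
    "metadata": {
        "abstract_canon_sha256": "ff645d7c776c74a41a146573031129e0bf487376c93a0ec3798471549a0f975c",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2026-04-16T16:47:08Z",
        "title_canon_sha256": "8919fa8822be0ee0b38442bb96c1d16f44170cc3affe78f6fffbbc347ebc25be",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.15215", "kind": "arxiv", "version": 2},
}


def canonical_bytes(obj):
    """Serialize with sorted keys and no extra whitespace, as in the shell command."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(obj):
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_hash(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because the keys are sorted before hashing, the digest does not depend on the order in which a mirror stores the fields.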