Pith Number

pith:G6BON2HG

pith:2026:G6BON2HG4JT5WVEOWT3BNBGVOL
not attested · not anchored · not stored · refs resolved

A Hierarchical Spatiotemporal Action Tokenizer for In-Context Imitation Learning in Robotics

Ali Shah Ali, Andrey Konin, Fawad Javed Fateh, Murad Popattia, M. Zeeshan Zia, Quoc-Huy Tran, Usman Nizamani

A two-level vector quantizer that clusters robot actions while also reconstructing their timestamps improves in-context imitation learning.

arxiv:2604.15215 v2 · 2026-04-16 · cs.RO

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{G6BON2HG4JT5WVEOWT3BNBGVOL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1: strongest claim

Extensive evaluations on multiple simulation and real robotic manipulation benchmarks show that our approach establishes new state-of-the-art performance in in-context imitation learning.

C2: weakest assumption

Hierarchical spatiotemporal clustering, combined with joint reconstruction of actions and timestamps, is assumed to produce tokens that generalize beyond the specific benchmarks and robot platforms used in the evaluations.

C3: one-line summary

A two-level vector quantization tokenizer that jointly reconstructs robot actions and timestamps to improve in-context imitation learning performance on manipulation benchmarks.

References

52 extracted · 52 resolved · 4 Pith anchors

[1] A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y. Zhu, and R. Martín-Martín. What matters in learning from offline human demonstrations for robot manipul 2021
[2] Octo: An Open-Source Generalist Robot Policy 2024 · arXiv:2405.12213
[3] Open X-Embodiment: Robotic learning datasets and RT-X models: Open X-Embodiment collaboration 2024
[4] Khazatsky et al 2024
[5] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. NeurIPS, 2020

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-20T00:00:38.137942Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

3782e6e8e6e267db548eb4f61684d572ec8e532c8275b6f134ad4147820c17ef

Aliases

arxiv: 2604.15215 · arxiv_version: 2604.15215v2 · doi: 10.48550/arxiv.2604.15215 · pith_short_12: G6BON2HG4JT5 · pith_short_16: G6BON2HG4JT5WVEO · pith_short_8: G6BON2HG
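
The short aliases listed here are prefixes of the full 26-character Pith Number. In this record, the Pith Number also coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. The page does not state how the identifier is derived, so the sketch below treats that relationship as an observation about this one record, not as the scheme's definition. The helper names `base32_prefix` and `short_aliases` are hypothetical and exist only for illustration.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "3782e6e8e6e267db548eb4f61684d572ec8e532c8275b6f134ad4147820c17ef"
PITH_NUMBER = "G6BON2HG4JT5WVEOWT3BNBGVOL"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Return the first `length` characters of the RFC 4648 base32
    encoding of a hex digest (hypothetical helper)."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as plain prefixes (hypothetical helper)."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


# Observation for this record only: hash prefix in base32 equals the Pith Number.
print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
print(short_aliases(PITH_NUMBER))
```

A prefix-based alias stays stable as long as the full identifier does, but shorter prefixes are more likely to collide across records, which may be why three lengths are listed.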
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/G6BON2HG4JT5WVEOWT3BNBGVOL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 3782e6e8e6e267db548eb4f61684d572ec8e532c8275b6f134ad4147820c17ef
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ff645d7c776c74a41a146573031129e0bf487376c93a0ec3798471549a0f975c",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2026-04-16T16:47:08Z",
    "title_canon_sha256": "8919fa8822be0ee0b38442bb96c1d16f44170cc3affe78f6fffbbc347ebc25be"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.15215",
    "kind": "arxiv",
    "version": 2
  }
}
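
The verification command shown earlier can also be reproduced offline. The following is a minimal sketch, assuming the JSON printed above is exactly the `.canonical_record` value returned by the API; if the served record differs in any byte after canonicalization, the digest will not match. The function names `canonical_bytes` and `canonical_sha256` are hypothetical. The serialization settings (`sort_keys=True`, compact separators, `ensure_ascii=False`) mirror the one-liner on this page.

```python
import hashlib
import json

# Hash the page asks the reader to expect.
EXPECTED_HASH = "3782e6e8e6e267db548eb4f61684d572ec8e532c8275b6f134ad4147820c17ef"

# Canonical record as printed on this page (assumed complete and exact).
CANONICAL_RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "ff645d7c776c74a41a146573031129e0bf487376c93a0ec3798471549a0f975c",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2026-04-16T16:47:08Z",
    "title_canon_sha256": "8919fa8822be0ee0b38442bb96c1d16f44170cc3affe78f6fffbbc347ebc25be"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.15215",
    "kind": "arxiv",
    "version": 2
  }
}
"""


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace,
    matching the settings used in the page's python3 one-liner."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(CANONICAL_RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("matches expected hash:", digest == EXPECTED_HASH)
```

Sorting keys and fixing the separators makes the digest independent of how the JSON was pretty-printed or in which order its keys were written, which is what allows a mirror to recompute the same value.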