Pith Number

pith:3TM73BSX

pith:2025:3TM73BSXD4H7I5353NPVHUX6HM

not attested · not anchored · not stored · refs resolved

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Fengshuo Bai, Ka Nam Lui, Shaofei Cai, Shaoyang Guo, Tianrui Guan, Xiaowei Zhang, Xuchuan Huang, Yaodong Yang, Yifan Zhong, Yitao Liang, Yuanfei Wang, Yuanpei Chen, Zhang Chen, Zhiquan Qi

Vision-language-action models can be unified under one framework: a chain of action tokens that leads from vision and language inputs to executable actions.

arxiv:2507.01925 v1 · 2025-07-02 · cs.RO


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{3TM73BSXD4H7I5353NPVHUX6HM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
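The page does not specify the merge algorithm, so the following is only a minimal sketch of how a deterministic merge can work in general: give every event a content-derived identifier, sort by a total order that does not depend on arrival order, and fold the events into the record. The event fields `at`, `field`, and `value`, and the functions `event_id` and `merge`, are hypothetical names chosen for illustration, not Pith's actual schema.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Content-derived identifier: SHA-256 of the event's canonical JSON."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge(record: dict, events: list) -> dict:
    """Fold events into a copy of the record in a fixed total order.

    Assumption: each event carries a timestamp `at`, a target `field`,
    and a new `value`. Ties on `at` are broken by the event identifier,
    so any two mirrors holding the same set of events reach the same state.
    """
    state = dict(record)
    unique = {event_id(e): e for e in events}  # drop exact duplicates
    for eid, event in sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0])):
        state[event["field"]] = event["value"]
    return state


record = {"source": "2507.01925", "citations": 0}
events = [
    {"at": "2026-05-17T10:00:00Z", "field": "citations", "value": 25},
    {"at": "2026-05-18T09:00:00Z", "field": "citations", "value": 26},
]

state_a = merge(record, events)
state_b = merge(record, list(reversed(events)) + events)  # reordered, duplicated
print(state_a == state_b)
```

Because the order comes from the events themselves rather than from the order in which a mirror received them, replaying the same bundle anywhere yields the same state.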

Claims

C1 · strongest claim

Current VLA models can be unified under a single framework: vision and language inputs are processed by a series of VLA modules, producing a chain of action tokens that progressively encode more grounded and actionable information, ultimately generating executable actions.

C2 · weakest assumption

The primary design choice distinguishing VLA models lies in how action tokens are formulated, which can be categorized into language description, code, affordance, trajectory, goal state, latent representation, raw action, and reasoning.

C3 · one-line summary

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
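The framing in the claims above can be pictured as a small data model: eight token kinds, and a chain in which each module turns the previous token into a more grounded one. This is a toy sketch only. The names `TokenKind`, `ActionToken`, `run_chain`, and the two stand-in modules are hypothetical, and the payloads are placeholders rather than anything taken from the survey.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class TokenKind(Enum):
    """The eight action token categories named in the claim."""
    LANGUAGE_DESCRIPTION = "language description"
    CODE = "code"
    AFFORDANCE = "affordance"
    TRAJECTORY = "trajectory"
    GOAL_STATE = "goal state"
    LATENT_REPRESENTATION = "latent representation"
    RAW_ACTION = "raw action"
    REASONING = "reasoning"


@dataclass(frozen=True)
class ActionToken:
    kind: TokenKind
    payload: object


# A module maps the previous token to a more grounded one.
Module = Callable[[ActionToken], ActionToken]


def run_chain(first: ActionToken, modules: List[Module]) -> List[ActionToken]:
    """Apply modules in order and return the whole chain of tokens."""
    chain = [first]
    for module in modules:
        chain.append(module(chain[-1]))
    return chain


# Stand-in modules with placeholder outputs (not real models).
def plan_to_trajectory(token: ActionToken) -> ActionToken:
    return ActionToken(TokenKind.TRAJECTORY, [(0.0, 0.0), (0.1, 0.2)])


def trajectory_to_raw_action(token: ActionToken) -> ActionToken:
    return ActionToken(TokenKind.RAW_ACTION, [0.1, 0.2])


chain = run_chain(
    ActionToken(TokenKind.LANGUAGE_DESCRIPTION, "pick up the cup"),
    [plan_to_trajectory, trajectory_to_raw_action],
)
kinds = [token.kind for token in chain]
print([kind.value for kind in kinds])
```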

References

299 extracted · 299 resolved · 58 Pith anchors

[1] On the Opportunities and Risks of Foundation Models 2021 · arXiv:2108.07258

[2] A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT 2024

[3] GPT-4 Technical Report 2023 · arXiv:2303.08774

[4] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 2025 · arXiv:2501.12948

[5] Learning transferable visual models from natural language supervision 2021

Formal links

2 machine-checked theorem links

Cited by

26 papers in Pith

Large Language Models for Multi-Robot Systems: A Survey

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

Pre-VLA: Preemptive Runtime Verification for Reliable Vision-Language-Action and World-Model Rollouts

Agentic Physical AI toward a Domain-Specific Foundation Model for Nuclear Reactor Control

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning

Receipt and verification

First computed	2026-05-17T23:38:13.882054Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

dcd9fd86571f0ff4777ddb5f53d2fe3b0b829472527471f8198f0dc3a6c6dc06

Aliases

arxiv: 2507.01925 · arxiv_version: 2507.01925v1 · doi: 10.48550/arxiv.2507.01925 · pith_short_12: 3TM73BSXD4H7 · pith_short_16: 3TM73BSXD4H7I535 · pith_short_8: 3TM73BSX
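The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and each `pith_short_N` alias matches its first N characters. This is an observation from the values shown here, not a documented rule, and the function names below are illustrative.

```python
import base64

# Canonical hash as shown on this page.
CANONICAL_HASH = "dcd9fd86571f0ff4777ddb5f53d2fe3b0b829472527471f8198f0dc3a6c6dc06"


def pith_number(sha256_hex: str) -> str:
    """Base32-encode the first 16 bytes of the hash and drop '=' padding.

    Assumption: inferred from the values on this page, not from a spec.
    """
    prefix = bytes.fromhex(sha256_hex)[:16]
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


def short_alias(pith: str, length: int) -> str:
    """Short aliases look like plain prefixes of the full identifier."""
    return pith[:length]


pith = pith_number(CANONICAL_HASH)
print(pith)                   # 3TM73BSXD4H7I5353NPVHUX6HM
print(short_alias(pith, 8))   # 3TM73BSX
print(short_alias(pith, 12))  # 3TM73BSXD4H7
print(short_alias(pith, 16))  # 3TM73BSXD4H7I535
```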

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/3TM73BSXD4H7I5353NPVHUX6HM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: dcd9fd86571f0ff4777ddb5f53d2fe3b0b829472527471f8198f0dc3a6c6dc06

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "1684ec07c21257a1da9c84eae86ef835a4a06bedfdb53bc256ef53935533bb40",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2025-07-02T17:34:52Z",
    "title_canon_sha256": "55de44bf23e2520adab3a5805a62d82f2eff96566380ccc5cd38c9d8b684069c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2507.01925",
    "kind": "arxiv",
    "version": 1
  }
}
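The canonicalization step of the verification command can also be run offline against the record printed above. The sketch below uses the same `json.dumps` settings as the one-liner (sorted keys, compact separators, `ensure_ascii=False`). Whether the resulting digest equals the published canonical hash depends on this page copy being identical to the record the resolver serves, so the comparison is printed rather than assumed. The function name `canonical_sha256` is illustrative.

```python
import hashlib
import json

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "1684ec07c21257a1da9c84eae86ef835a4a06bedfdb53bc256ef53935533bb40",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2025-07-02T17:34:52Z",
        "title_canon_sha256": "55de44bf23e2520adab3a5805a62d82f2eff96566380ccc5cd38c9d8b684069c",
    },
    "schema_version": "1.0",
    "source": {"id": "2507.01925", "kind": "arxiv", "version": 1},
}

EXPECTED = "dcd9fd86571f0ff4777ddb5f53d2fe3b0b829472527471f8198f0dc3a6c6dc06"


def canonical_sha256(obj) -> str:
    """Hash the compact, key-sorted JSON form, as the shell one-liner does."""
    blob = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(blob).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators is what makes the hash independent of how the JSON happened to be formatted or ordered in transit.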