pith.

arxiv: 2606.19769 · v1 · pith:CFLL7N2P · submitted 2026-06-18 · 💻 cs.RO · cs.AI

Data Standards for Humanoid Robotics: The Missing Infrastructure for Physical AI

Pith reviewed 2026-06-26 17:32 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords humanoid robotics, data standards, embodied interaction, physical AI, robot datasets, traceability, data reusability, ISO standards

The pith

Data standards for humanoid robotics turn isolated physical interactions into cumulative, reusable embodied experience.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that humanoid robot scalability depends on accumulating physical experience across robots, tasks, organizations, and time, not solely on models or hardware. It develops three insights: embodied interaction data must preserve the relationships among body, action, task, scene, execution trace, and outcome; its value requires inspectable physical coherence in timing, coordinate frames, calibration, and synchronization; and the main bottleneck is not only data scarcity but non-cumulative data arising from high collection costs, silos, and inconsistent evaluation. Standards address this by supplying horizontal infrastructure for metadata, provenance, quality, versioning, and traceability, plus capability-specific grammars for manipulation, locomotion, and other domains. This infrastructure would make experience interpretable, shareable, traceable, and reusable as AI moves from screens into bodies.
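To make the first insight concrete, here is a minimal Python sketch of an episode record that keeps body, action, task, scene, execution trace, and outcome together and checks that the parts refer to one another consistently. All class and field names are hypothetical illustrations; they are not taken from ISO/WD 26264-1 or any other standard.

```python
from dataclasses import dataclass


# Hypothetical schema for illustration; not taken from ISO/WD 26264-1.
@dataclass
class Body:
    robot_model: str
    joint_names: list[str]


@dataclass
class Task:
    task_id: str
    success_criterion: str


@dataclass
class Scene:
    scene_id: str
    objects: list[str]


@dataclass
class TraceStep:
    t: float                           # seconds on the episode clock
    joint_positions: dict[str, float]  # measured state, radians, keyed by joint name
    action: dict[str, float]           # commanded joint targets, radians


@dataclass
class Outcome:
    task_id: str
    success: bool


@dataclass
class Episode:
    body: Body
    task: Task
    scene: Scene
    trace: list[TraceStep]
    outcome: Outcome

    def validate(self) -> list[str]:
        """Return broken cross-references between the parts (empty list if consistent)."""
        problems = []
        joints = set(self.body.joint_names)
        for i, step in enumerate(self.trace):
            # Every joint mentioned in state or action must exist on this body.
            unknown = (set(step.joint_positions) | set(step.action)) - joints
            if unknown:
                problems.append(f"step {i}: unknown joints {sorted(unknown)}")
        # The outcome must be judged against the task this episode was attempting.
        if self.outcome.task_id != self.task.task_id:
            problems.append("outcome refers to a different task")
        return problems
```

The point of `validate` is that a trace is only interpretable against the body that produced it: a joint name with no counterpart in the body description, or an outcome tied to a different task, makes the episode unusable for reuse even if every individual file is well formed.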

Core claim

Drawing on the development of ISO/WD 26264-1, the paper argues that humanoid robot data is embodied interaction data whose value depends on the physical coherence of its multimodal streams. Data standards address the resulting bottleneck, non-cumulative data, by making that data interpretable, shareable, traceable, and reusable. They do so through horizontal infrastructure for lifecycle management, metadata, provenance, quality, versioning, and traceability, together with capability-specific parts that define the domain grammar for manipulation, locomotion, human-robot interaction, cognition, and future capabilities.

What carries the argument

A two-layer standard structure: a general horizontal layer covering lifecycle management, metadata, provenance, quality, versioning, and traceability, plus capability-specific layers that define domain grammar for individual humanoid capabilities.
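The two-layer structure can be sketched as a validator in which every record must satisfy one shared horizontal layer, and its payload must additionally satisfy the grammar of the capability it declares. The field names and the two example grammars below are assumptions for illustration, not the contents of any ISO part.

```python
# Horizontal layer: fields every record carries, whatever the capability.
# These field names are hypothetical stand-ins for the categories listed above.
HORIZONTAL_FIELDS = {"dataset_id", "version", "provenance", "quality_grade", "lifecycle_stage"}

# Capability-specific layer: each part defines its own required payload fields.
CAPABILITY_GRAMMARS = {
    "manipulation": {"end_effector", "grasp_type", "object_pose"},
    "locomotion": {"gait", "terrain", "base_velocity"},
}


def validate_record(record: dict) -> list[str]:
    """Check a record against the shared layer and its declared capability part."""
    errors = []
    missing = HORIZONTAL_FIELDS - set(record)
    if missing:
        errors.append(f"horizontal layer missing {sorted(missing)}")
    capability = record.get("capability")
    grammar = CAPABILITY_GRAMMARS.get(capability)
    if grammar is None:
        errors.append(f"no capability part registered for {capability!r}")
    else:
        missing = grammar - set(record.get("payload", {}))
        if missing:
            errors.append(f"{capability} part missing {sorted(missing)}")
    return errors
```

Under this design, adding a part for human-robot interaction or cognition means registering another entry in `CAPABILITY_GRAMMARS`; the horizontal layer, and every tool that depends on it, stays unchanged.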

If this is right

  • Embodied datasets will preserve explicit relationships among robot body, action, task, scene, execution trace, and outcome.
  • Multimodal streams will become reusable once timing, coordinate frames, calibration, kinematics, units, and synchronization assumptions are inspectable.
  • High collection costs and data silos will be mitigated by shareability and traceability mechanisms.
  • Inconsistent evaluations will decrease through standardized quality, versioning, and provenance requirements.
  • Data standards will shift from organizing digital information to structuring physical interaction data.
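The physical-coherence requirement can be illustrated with a small check over multimodal streams, assuming each stream declares its clock source, unit, and coordinate frame alongside its timestamps. The `Stream` fields, the nearest-neighbour offset metric, and the tolerance parameter are hypothetical choices for this sketch; frames and calibration are recorded but not checked here.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stream:
    name: str
    clock: str               # clock source the timestamps come from (hypothetical field)
    unit: str                # physical unit of the samples, e.g. "rad" or "N"
    frame: Optional[str]     # coordinate frame; None for non-spatial streams (not checked here)
    timestamps: list[float]  # seconds


def max_sync_offset(ref: list[float], other: list[float]) -> float:
    """Largest gap between a ref timestamp and its nearest neighbour in sorted `other`."""
    if not other:
        return float("inf")
    worst = 0.0
    for t in ref:
        i = bisect_left(other, t)
        # The nearest neighbour is either just before or at the insertion point.
        nearest = min(abs(t - other[j]) for j in (i - 1, i) if 0 <= j < len(other))
        worst = max(worst, nearest)
    return worst


def coherence_report(streams: list[Stream], tolerance_s: float) -> list[str]:
    """List physical-coherence problems; streams[0] is the timing reference."""
    issues = []
    clocks = {s.clock for s in streams}
    if len(clocks) > 1:
        issues.append(f"streams use different clocks: {sorted(clocks)}")
    for s in streams:
        if not s.unit:
            issues.append(f"{s.name}: unit not declared")
        if any(b <= a for a, b in zip(s.timestamps, s.timestamps[1:])):
            issues.append(f"{s.name}: timestamps not strictly increasing")
    for s in streams[1:]:
        offset = max_sync_offset(streams[0].timestamps, s.timestamps)
        if offset > tolerance_s:
            issues.append(f"{s.name}: offset {offset * 1000:.1f} ms exceeds tolerance")
    return issues
```

The right tolerance depends on the modalities and the task; the sketch only shows that timing, clock, and unit assumptions become checkable once they are recorded explicitly rather than left implicit in a lab's tooling.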

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Standardized data could enable shared public repositories where recordings from different manufacturers combine without format conflicts.
  • The traceability layer might later support regulatory audits of robot behavior and failure modes.
  • The same horizontal infrastructure could be adapted for non-humanoid embodied systems such as mobile manipulators.
  • Adoption might accelerate simulation-to-real transfer by making real-world traces directly comparable to simulated ones.

Load-bearing premise

The primary barrier to scaling humanoid robots is non-cumulative data caused by silos and inconsistent evaluation, rather than hardware limits, model architecture, or safety constraints.

What would settle it

A controlled transfer experiment in which data collected under the proposed standards, when reused on a different humanoid platform or task, yields no measurable performance gain over non-standardized data.

Figures

Figures reproduced from arXiv: 2606.19769 by Jialu Liu, Jie Tang, Ning Ding, Shaoshan Liu, Xiugong Qin, Xuan Wu, Xuan Xia.

Figure 1. Humanoid data is fundamentally different from the digital samples that have driven much of virtual AI. Virtual AI …
Figure 2. Physical coherence (temporal synchronization and …
Figure 3. Two-dimensional design of humanoid robot data standards: horizontal infrastructure addresses common bottlenecks, …
Original abstract

The scalability of humanoid robots will depend not only on models and hardware, but also on whether physical experience can accumulate across robots, tasks, organizations, and time. Drawing on the authors' work in developing ISO/WD 26264-1, Humanoid robot datasets -- Part 1: General requirements, within ISO/TC 299/WG 16, this article argues that data standards are becoming foundational infrastructure for Physical AI. We develop three insights. First, humanoid robot data is embodied interaction data, not a collection of isolated digital samples; a useful dataset must preserve the relationship among robot body, action, task, scene, execution trace, and outcome. Second, its value depends on physical coherence: multimodal streams are reusable only when timing, coordinate frames, calibration, kinematics, units, and synchronization assumptions remain inspectable. Third, the main bottleneck is not only data scarcity, but non-cumulative data caused by high collection costs, data silos, and inconsistent evaluation. We argue that humanoid robot data standards address these bottlenecks by making embodied experience interpretable, shareable, traceable, and reusable. A general standard should provide horizontal infrastructure for lifecycle management, metadata, provenance, quality, versioning, and traceability, while capability-specific parts should define domain grammar for manipulation, locomotion, human-robot interaction, cognition, and future humanoid capabilities. As AI moves from screens into bodies, data standards must evolve from organizing digital information to structuring physical interaction.
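For the provenance, versioning, and traceability items in the horizontal layer, one common technique is content addressing: hash each dataset version and record the hash of the version it was derived from. It is swapped in here as an illustration, not something the paper specifies. The sketch assumes episodes are available as named byte strings; `VersionEntry`, `hash_files`, and `verify_lineage` are hypothetical names.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionEntry:
    version: str
    parent_digest: Optional[str]  # digest of the entry this version was derived from
    change_note: str
    content_digest: str           # hash over the episode files in this version

    def digest(self) -> str:
        """SHA-256 over a canonical JSON encoding of all fields."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def hash_files(files: dict[str, bytes]) -> str:
    """SHA-256 over files in sorted name order (name, separator, per-file digest)."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


def verify_lineage(chain: list[VersionEntry]) -> bool:
    """True if the chain starts at a root and each entry points at its predecessor."""
    if not chain or chain[0].parent_digest is not None:
        return False
    return all(child.parent_digest == parent.digest()
               for parent, child in zip(chain, chain[1:]))
```

Any silent edit to an earlier version changes its digest and breaks the chain, so a downstream user can tell whether the data behind a reported result is the data they downloaded.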

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript is a position paper arguing that data standards are the missing infrastructure for humanoid robotics and Physical AI. Drawing on the authors' participation in developing ISO/WD 26264-1, it articulates three insights: humanoid data as embodied interaction (preserving relationships among body, action, task, scene, trace, and outcome), the necessity of physical coherence (inspectable timing, frames, calibration, kinematics, units, and synchronization), and non-cumulative data (due to collection costs, silos, and inconsistent evaluation) as the primary bottleneck beyond scarcity. It proposes that horizontal standards for lifecycle management, metadata, provenance, quality, versioning, and traceability, plus capability-specific parts for manipulation, locomotion, HRI, and cognition, would render embodied experience interpretable, shareable, traceable, and reusable.

Significance. If the advocated standards are developed and adopted, they could accelerate cumulative progress in Physical AI by enabling data reuse across robots, tasks, organizations, and time, addressing a recognized infrastructure gap in embodied robotics. The paper's grounding in ongoing ISO/TC 299/WG 16 work provides a concrete, actionable basis for standardization efforts that could influence community practices.

major comments (1)
  1. [Abstract, third insight] The assertion that non-cumulative data caused by silos and inconsistent evaluation is the main bottleneck (rather than hardware limits, model architecture, or safety constraints) is presented as a foundational premise without quantitative evidence, controlled comparisons, error bars, or falsifiable tests showing that standards would measurably improve data accumulation; this assumption is load-bearing for the claim that standards solve the identified bottlenecks.
minor comments (1)
  1. The manuscript would benefit from explicit discussion of how the proposed general standard and capability-specific parts would interoperate with existing robotics data formats or ontologies (e.g., references to ROS, OpenXR, or prior ISO robotics standards).

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of the paper's significance and for the detailed comment. As a position paper grounded in ongoing ISO standardization work, we address the concern about the third insight below.

Point-by-point responses
  1. Referee: [Abstract, third insight] The assertion that non-cumulative data caused by silos and inconsistent evaluation is the main bottleneck (rather than hardware limits, model architecture, or safety constraints) is presented as a foundational premise without quantitative evidence, controlled comparisons, error bars, or falsifiable tests showing that standards would measurably improve data accumulation; this assumption is load-bearing for the claim that standards solve the identified bottlenecks.

    Authors: We acknowledge that the manuscript offers no new quantitative evidence, controlled comparisons, or statistical tests to demonstrate that standards would accelerate data accumulation. The third insight is advanced as a synthesized observation drawn from the authors' participation in ISO/TC 299/WG 16 and from recurring practical difficulties reported across humanoid robotics efforts (high per-dataset collection costs, platform-specific formats, and lack of shared evaluation protocols). Position papers in this domain commonly rely on such field-derived reasoning rather than primary empirical validation. We maintain that the claim is appropriate for the paper's argumentative purpose, but we will revise the abstract and introduction to explicitly characterize the three insights as hypotheses motivated by standardization experience, thereby clarifying their evidentiary basis without altering the core argument. revision: partial

Circularity Check

0 steps flagged

No significant circularity

Rationale

This is a position paper that develops three normative insights about embodied data and advocates for ISO-style standards as infrastructure. The central claim is an argument about the value of standards for making data cumulative; it contains no derivations, equations, predictions, fitted parameters, or uniqueness theorems. The disclosed reference to the authors' participation in ISO/WD 26264-1 is the basis of the proposal itself rather than a load-bearing citation that reduces any result to its own inputs by construction. The paper is self-contained as advocacy with no reduction of claims to self-referential fits or renamings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that data standards are the primary missing piece for scaling physical AI. Because this is a position paper, it introduces no free parameters, invented entities, or formal axioms; the single entry below is an informal domain assumption.

axioms (1)
  • Domain assumption: Embodied interaction data must preserve relationships among body, action, task, scene, execution trace, and outcome to be reusable.
    Stated as the first insight in the abstract; treated as definitional for useful datasets.

pith-pipeline@v0.9.1-grok · 5806 in / 1237 out tokens · 21961 ms · 2026-06-26T17:32:42.384603+00:00 · methodology

