Same Weights, Different Robot: A Deployment Safety View of VLA Policies

Jianwei Tai

arxiv: 2606.03724 · v1 · pith:I6ZEWYBKnew · submitted 2026-06-02 · 💻 cs.CR

Pith reviewed 2026-06-28 09:28 UTC · model grok-4.3

keywords VLA policies · deployment safety · action normalization · metadata mismatch · executable policy · LIBERO benchmark · quantile normalization

The pith

Identical VLA checkpoints can be executable-inequivalent due to action metadata differences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision-language-action (VLA) policies are often assumed to be defined solely by their weights, prompt, and benchmark suite. Robot execution, however, also depends on the action representation, the metadata-selected unnormalizer, and controller conventions, which opens a deployment-safety gap. The paper formalizes this as an executable policy specification problem in which identical checkpoints can produce different physical actions. For quantile-style normalization, a closed-form transform and an ExecSpec certificate measure action-space semantic drift without running the model. In LIBERO replay experiments, fully substituting a sibling metadata key cuts success from 28/28 to 2/28 on LIBERO-Goal and from 26/26 to 0/26 on LIBERO-Spatial, supporting the case for checking action-space metadata before rollout.
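To make the gap concrete, here is a minimal sketch of a metadata-selected unnormalizer. It assumes the q01/q99 convention common in OpenVLA-style pipelines, where a normalized output in [-1, 1] is mapped linearly onto the quantile range; the key names and statistics are invented for illustration and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical metadata table: each key maps to per-dimension q01/q99 statistics.
# Key names and numbers are made up for illustration only.
NORM_STATS: Dict[str, Dict[str, List[float]]] = {
    "suite_a": {"q01": [-0.5, -0.2], "q99": [0.5, 0.4]},
    "suite_b": {"q01": [-1.0, -0.1], "q99": [1.0, 0.1]},
}


def unnormalize(u: Sequence[float], key: str) -> List[float]:
    """Map normalized outputs in [-1, 1] to physical actions using the stats for `key`."""
    stats = NORM_STATS[key]
    return [
        lo + 0.5 * (x + 1.0) * (hi - lo)
        for x, lo, hi in zip(u, stats["q01"], stats["q99"])
    ]


# The same normalized output from the same checkpoint...
model_output = [0.0, 1.0]
# ...becomes a different physical action depending on which key is selected.
action_a = unnormalize(model_output, "suite_a")
action_b = unnormalize(model_output, "suite_b")
print(action_a, action_b)
```

Nothing about the weights changes between the two calls; only the metadata key does, which is the sense in which the same checkpoint can yield a different executable policy.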

Core claim

We formalize the gap as an executable policy specification problem: a VLA policy includes the learned model, action representation, metadata-selected unnormalizer, and controller-facing conventions. Under this view, identical checkpoints can be executable-inequivalent. For quantile-style action normalization, we derive a closed-form metadata mismatch transform and an ExecSpec certificate that measures action-space semantic drift without model inference or rollout.
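As a sketch of why such a transform can be closed-form, assume the common per-dimension q01/q99 convention, with lower and upper quantiles written here as \ell_k and h_k for metadata key k. The symbols and the convention are assumptions for illustration; the paper's own notation and exact transform may differ.

```latex
% Assumed per-dimension unnormalizer for metadata key k with quantiles (\ell_k, h_k):
a_k = \ell_k + \tfrac{1}{2}(u + 1)(h_k - \ell_k), \qquad u \in [-1, 1].

% Solve the key-A equation for u and substitute into the key-B equation:
u = \frac{2(a_A - \ell_A)}{h_A - \ell_A} - 1
\;\;\Longrightarrow\;\;
a_B = \ell_B + \frac{h_B - \ell_B}{h_A - \ell_A}\,(a_A - \ell_A).

% Equivalently a_B = s\,a_A + t with
s = \frac{h_B - \ell_B}{h_A - \ell_A}, \qquad t = \ell_B - s\,\ell_A .
```

Under this assumption the mismatch is an affine map whose coefficients depend only on the two sets of statistics, which is why it can be evaluated without model inference or rollout.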

What carries the argument

The ExecSpec certificate that measures action-space semantic drift from metadata mismatches in quantile-style action normalization without requiring model inference or rollout.
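The source does not give the certificate's definition, so the following is only a hypothetical proxy for a model-free drift measure, again assuming the q01/q99 convention. Because the difference between two such unnormalizers is affine in the normalized output, its largest magnitude over [-1, 1] is attained at an endpoint, so the worst case per dimension reduces to comparing the quantiles directly. Function names and statistics are illustrative.

```python
from typing import List, Sequence


def worst_case_drift(
    q01_a: Sequence[float], q99_a: Sequence[float],
    q01_b: Sequence[float], q99_b: Sequence[float],
) -> List[float]:
    """Per-dimension max |a_B - a_A| over normalized outputs in [-1, 1].

    At u = -1 the two actions differ by (q01_b - q01_a); at u = +1 they differ
    by (q99_b - q99_a). The difference is affine in u, so the maximum absolute
    value is the larger of the two endpoint magnitudes.
    """
    if not (len(q01_a) == len(q99_a) == len(q01_b) == len(q99_b)):
        raise ValueError("all statistics must have the same number of dimensions")
    return [
        max(abs(lb - la), abs(hb - ha))
        for la, ha, lb, hb in zip(q01_a, q99_a, q01_b, q99_b)
    ]


# Made-up statistics for two sibling metadata keys (two action dimensions).
per_dim = worst_case_drift([-0.5, -0.2], [0.5, 0.4], [-1.0, -0.1], [1.0, 0.1])
mean_drift = sum(per_dim) / len(per_dim)
print(per_dim, mean_drift)
```

This proxy uses only the stored statistics, so it can run as a pre-rollout check; it is not a substitute for the paper's ExecSpec certificate.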

Load-bearing premise

The replay-based substitution experiments on LIBERO benchmarks accurately indicate a general deployment safety issue.

What would settle it

Finding that different metadata keys produce identical unnormalized action sequences and success rates on the same checkpoint would falsify the claim of executable inequivalence.
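One way to probe this falsification condition from metadata alone is to compute the affine coefficients that map key-A actions to key-B actions and test whether they are the identity. This is a sketch under the assumed q01/q99 convention; the helper names and tolerance are hypothetical.

```python
from typing import List, Sequence, Tuple


def affine_mismatch(
    q01_a: Sequence[float], q99_a: Sequence[float],
    q01_b: Sequence[float], q99_b: Sequence[float],
) -> List[Tuple[float, float]]:
    """Per-dimension (scale, offset) such that a_B = scale * a_A + offset."""
    coeffs = []
    for la, ha, lb, hb in zip(q01_a, q99_a, q01_b, q99_b):
        if ha == la:
            raise ValueError("degenerate quantile range for key A")
        scale = (hb - lb) / (ha - la)
        coeffs.append((scale, lb - scale * la))
    return coeffs


def keys_equivalent(
    q01_a: Sequence[float], q99_a: Sequence[float],
    q01_b: Sequence[float], q99_b: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """True when the mismatch map is the identity in every dimension."""
    return all(
        abs(scale - 1.0) <= tol and abs(offset) <= tol
        for scale, offset in affine_mismatch(q01_a, q99_a, q01_b, q99_b)
    )


# Made-up statistics: identical keys are equivalent, sibling keys are not.
same = keys_equivalent([-0.5, -0.2], [0.5, 0.4], [-0.5, -0.2], [0.5, 0.4])
different = keys_equivalent([-0.5, -0.2], [0.5, 0.4], [-1.0, -0.1], [1.0, 0.1])
print(same, different)
```

If two differently named keys passed this check and still produced different replay outcomes, the explanation would have to lie outside the unnormalizer statistics, for example in masks or controller conventions.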

Original abstract

Vision-language-action (VLA) policies are often treated as checkpoint-defined objects: if the weights, prompt, and benchmark suite match, the deployment is assumed to be the same policy. Robot execution breaks this assumption because the same normalized model output can become a different physical action after action unnormalization and controller conventions are applied. This creates a deployment-safety gap: safety review can certify the checkpoint while missing the executable robot policy that reaches the controller. We formalize this gap as an executable policy specification problem: a VLA policy includes the learned model, action representation, metadata-selected unnormalizer, and controller-facing conventions. Under this view, identical checkpoints can be executable-inequivalent. For quantile-style action normalization, we derive a closed-form metadata mismatch transform and an ExecSpec certificate that measures action-space semantic drift without model inference or rollout. On LIBERO-Goal replay, substituting a plausible sibling metadata key yields mean drift 0.199 over six non-gripper action dimensions and reduces success from 28/28 to 2/28 under full substitution. On LIBERO-Spatial replay, the same substituted key reduces success from 26/26 to 0/26. The same full-substitution protocol gives 0/28 success for all four Object substitutions and 0/23 or 1/23 success on Long. Identity-key, replay-validity, no-op filtering, raw-vs-correct replay, mask/gripper, synthetic upper-bound, and OpenVLA-style unnormalizer interface checks rule out several simpler explanations. These results do not certify closed-loop or hardware safety. They support a narrower deployment-safety view: action-space metadata is part of the executable policy and should be checked before rollout.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper correctly identifies that VLA checkpoints need action metadata in their executable spec, with a clean closed-form quantile transform and direct replay measurements, though the safety implications stay limited to the replay setting they describe.

The new piece is the formalization of the executable policy specification that bundles weights with unnormalizer metadata and controller conventions. For quantile normalization they derive a closed-form mismatch transform and an ExecSpec certificate that flags semantic drift without any model run or rollout. That derivation follows straight from the normalization definition and is a useful, model-free tool.
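A minimal sketch of what bundling weights with metadata might look like in practice: a record that holds the checkpoint hash together with the action-space fields and yields a single fingerprint. The field names and example values are hypothetical and do not come from the paper.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExecutableSpec:
    """Hypothetical record of everything that determines the executed action."""
    checkpoint_sha256: str
    action_representation: str      # illustrative label, e.g. "delta_eef_7d"
    unnorm_key: str                 # metadata key that selects the statistics
    q01: Tuple[float, ...]
    q99: Tuple[float, ...]
    controller_convention: str      # illustrative label, e.g. "osc_pose"

    def fingerprint(self) -> str:
        # Serialize with sorted keys so the hash is independent of field order.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


reviewed = ExecutableSpec("abc123", "delta_eef_7d", "suite_a",
                          (-0.5, -0.2), (0.5, 0.4), "osc_pose")
deployed = ExecutableSpec("abc123", "delta_eef_7d", "suite_b",
                          (-1.0, -0.1), (1.0, 0.1), "osc_pose")

# Same checkpoint hash, different executable specification.
print(reviewed.fingerprint() == deployed.fingerprint())
```

Comparing fingerprints rather than checkpoint hashes means a review that approved one key does not silently carry over to a deployment that selects another.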

The LIBERO replay results are straightforward: on LIBERO-Goal, a mean action drift of 0.199 over six non-gripper dimensions, and large success drops (28/28 to 2/28 on Goal, 26/26 to 0/26 on Spatial) when a sibling metadata key is fully substituted. They include several controls (identity keys, no-op filtering, raw-vs-correct replay, gripper masks) that rule out obvious confounds. These numbers make the narrower claim, that metadata belongs in the policy spec, easy to see.

The soft spot is exactly what the abstract flags: everything rests on open-loop replay substitution of recorded actions. No closed-loop execution or hardware is tested, so the results do not speak to whether the drift would persist, shrink, or matter once feedback and dynamics are present. The paper does not overclaim here, but that keeps the deployment-safety angle narrower than the title suggests.

This is for people who actually ship or certify VLA systems on robots. The formal part and the replay checks are solid enough to discuss in a review, even if the broader safety conclusion needs more work. I would send it to referees.

Referee Report

0 major / 3 minor

Summary. The manuscript claims that VLA policies are not fully specified by model weights alone, since action unnormalization metadata and controller conventions determine the executable robot policy. Identical checkpoints can therefore be executable-inequivalent. For quantile-style normalization the authors derive a closed-form metadata mismatch transform and introduce an ExecSpec certificate that quantifies action-space semantic drift without model inference or rollout. Replay substitution experiments report a mean drift of 0.199 across six non-gripper dimensions on LIBERO-Goal, and success-rate drops under full substitution of a plausible sibling metadata key (28/28 to 2/28 on LIBERO-Goal; 26/26 to 0/26 on LIBERO-Spatial). Multiple controls (identity-key, replay-validity, no-op filtering, raw-vs-correct replay, mask/gripper, synthetic upper-bound, OpenVLA-style interface) rule out simpler explanations. The results support treating action metadata as part of the executable specification that should be checked before rollout, while explicitly stating that the replay protocol does not certify closed-loop or hardware safety.

Significance. If the central claim holds, the work highlights an under-appreciated deployment-safety consideration for VLA policies: metadata must be included in the policy specification. The closed-form derivation and the model-free ExecSpec certificate are concrete strengths that allow drift detection without inference or rollouts. The manuscript carefully scopes its conclusions to the replay setting, which prevents overgeneralization and directly addresses the bridging-assumption concern raised in the stress-test note. The empirical measurements on standard benchmarks provide direct, falsifiable evidence of the phenomenon.

minor comments (3)

The abstract lists the controls that rule out simpler explanations but does not indicate where in the manuscript the detailed results of each control appear; a short summary table or dedicated paragraph would improve traceability.
The ExecSpec certificate is introduced as a model-free measure, yet the abstract provides no equation or pseudocode; including the precise definition (even if only referenced) would aid reproducibility.
Success rates are reported as raw fractions (28/28, 2/28) without accompanying confidence intervals, variance, or statistical tests; adding these details would strengthen the presentation of the quantitative results.
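As an illustration of the last comment, the reported fractions can be given uncertainty bounds with a Wilson score interval. This is a sketch of one standard choice, not an analysis from the paper; the counts are the LIBERO-Goal figures quoted in the abstract, and z = 1.96 approximates a 95% interval.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


baseline = wilson_interval(28, 28)      # correct metadata key, LIBERO-Goal
substituted = wilson_interval(2, 28)    # full substitution, LIBERO-Goal
print(baseline, substituted)
```

With only 28 episodes per condition the intervals are wide, but they do not overlap here, which is the kind of statement the comment asks the authors to make explicit.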

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our manuscript, including the recognition of the closed-form metadata mismatch transform, the ExecSpec certificate, and the careful scoping to replay-based evidence. The recommendation of minor revision is noted; we will incorporate any editorial or minor clarifications in the revised version.

Circularity Check

0 steps flagged

No circularity: closed-form transform follows directly from quantile definition; results are independent empirical measurements

The paper's central derivation is an algebraic closed-form transform obtained directly from the standard definition of quantile-style action normalization; this is ordinary mathematical expansion rather than any self-referential loop or fitted input renamed as prediction. The LIBERO replay substitution results are direct empirical observations of success-rate changes under metadata substitution and are not quantities generated by the paper's own equations. No self-citations, uniqueness theorems, or ansatzes imported from prior author work appear as load-bearing steps. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that quantile-style normalization applies and introduces the ExecSpec certificate as a new measurement tool without external falsifiable evidence.

axioms (1)

domain assumption: Action normalization follows a quantile-style process.
The closed-form metadata mismatch transform is derived specifically under this normalization type.

invented entities (1)

ExecSpec certificate (no independent evidence)
purpose: Measures action-space semantic drift without requiring model inference or rollout
Newly defined in the paper to quantify the executable policy gap.

Reference graph

Works this paper leans on

25 extracted references · 13 linked inside Pith

[1] Argall, B. D.; Chernova, S.; Veloso, M.; and Browning, B. 2009. A Survey of Robot Learning from Demonstration. Robotics and Autonomous Systems, 57(5): 469–483.

[2] Brohan, A.; Brown, N.; Carbajal, J.; Chebotar, Y.; Chen, X.; Choromanski, K.; Ding, T.; Driess, D.; Dubey, A.; Finn, C.; Florence, P.; Fu, C.; Gonzalez Arenas, M.; Gopalakrishnan, K.; Han, K.; Hausman, K.; Herzog, A.; Hsu, J.; Ichter, B.; Irpan, A.; Joshi, N.; Julian, R.; Kalashnikov, D.; Kuang, Y.; Leal, I.; Lee, L.; Lee, T.-W. E.; Levine, S.; Lu, Y.; Mi... (arXiv, 2023)

[3] Brunke, L.; Greeff, M.; Hall, A. W.; Yuan, Z.; Zhou, S.; Panerati, J.; and Schoellig, A. P. 2022. Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning. Annual Review of Control, Robotics, and Autonomous Systems, 5: 411–444.

[4] Cadene, R.; Aliberts, S.; Capuano, F.; Aractingi, M.; Zouitine, A.; Kooijmans, P.; Choghari, J.; Russi, M.; Pascal, C.; Palma, S.; Shukor, M.; Moss, J.; Soare, A.; Aubakirova, D.; Lhoest, Q.; Gallouedec, Q.; and Wolf, T. 2026. LeRobot: An Open-Source Library for End-to-End Robot Learning. arXiv:2602.22818.

[5] Chi, C.; Xu, Z.; Feng, S.; Cousineau, E.; Du, Y.; Burchfiel, B.; Tedrake, R.; and Song, S. 2023. Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. In Robotics: Science and Systems.

[6] Chi, C.; Xu, Z.; Pan, C.; Cousineau, E.; Burchfiel, B.; Feng, S.; Tedrake, R.; and Song, S. 2024. Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots. arXiv:2402.10329.

[7] Choi, S.; Lee, Y.; Park, Y.; Kim, C. D.; Krishna, R.; Fox, D.; and Yu, Y. 2026. vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models. arXiv:2603.13966.

[8] Dasari, S.; Ebert, F.; Tian, S.; Nair, S.; Bucher, B.; Schmeckpeper, K.; Singh, S.; Levine, S.; and Finn, C. 2020. RoboNet: Large-Scale Multi-Robot Learning. In Proceedings of the Conference on Robot Learning, volume 100 of Proceedings of Machine Learning Research, 885–897. PMLR.

[9] Gebru, T.; Morgenstern, J.; Vecchione, B.; Vaughan, J. W.; Wallach, H.; Daumé III, H.; and Crawford, K. 2021. Datasheets for Datasets. Communications of the ACM, 64(12): 86–92.

[10] Huang, A. S.; Zhang, J.; Tang, S.; and Xiang, Y. 2026. VLA-REPLICA: A Low-Cost, Reproducible Benchmark for Real-World Evaluation of Vision-Language-Action Models. arXiv:2605.20774.

[11] Huang, S.; Papernot, N.; Goodfellow, I.; Duan, Y.; and Abbeel, P. 2017. Adversarial Attacks on Neural Network Policies. arXiv:1702.02284.

[12] Khazatsky, A.; Pertsch, K.; Nair, S.; Balakrishna, A.; Dasari, S.; Karamcheti, S.; Nasiriany, S.; Sreekanth, K.; Fang, K.; Schaal, S.; Finn, C.; and Levine, S. 2024. DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset. arXiv:2403.12945.

[13] Kim, M. J.; Pertsch, K.; Karamcheti, S.; Xiao, T.; Balakrishna, A.; Nair, S.; Rafailov, R.; Foster, E.; Lam, G.; Sanketi, P.; Vuong, Q.; Kollar, T.; Burchfiel, B.; Tedrake, R.; Sadigh, D.; Levine, S.; Liang, P.; and Finn, C. 2024. OpenVLA: An Open-Source Vision-Language-Action Model. arXiv:2406.09246.

[14] Li, Q.; Liang, Y.; Wang, Z.; Luo, L.; Chen, X.; et al. 2024. CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation. arXiv:2411.19650.

[15] Liu, B.; Zhu, Y.; Gao, C.; Feng, Y.; Liu, Q.; Zhu, Y.; and Stone, P. 2023. LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning. In Advances in Neural Information Processing Systems, volume 36.

[16] Mandlekar, A.; Xu, D.; Wong, J.; Nasiriany, S.; Wang, C.; Kulkarni, R.; Fei-Fei, L.; Savarese, S.; Zhu, Y.; and Martín-Martín, R. 2022. What Matters in Learning from Offline Human Demonstrations for Robot Manipulation. In Proceedings of the 5th Conference on Robot Learning, volume 164 of Proceedings of Machine Learning Research, 1678–1690. PMLR.

[17] Mitchell, M.; Wu, S.; Zaldivar, A.; Barnes, P.; Vasserman, L.; Hutchinson, B.; Spitzer, E.; Raji, I. D.; and Gebru, T. 2019. Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–229.

[18] Octo Model Team; Ghosh, D.; Walke, H.; Pertsch, K.; Black, K.; Mees, O.; Dasari, S.; Hejna, J.; Kreiman, T.; Xu, C.; Luo, J.; Tan, Y. L.; Chen, L. Y.; Sanketi, P.; Vuong, Q.; Xiao, T.; Sadigh, D.; Finn, C.; and Levine, S. 2024. Octo: An Open-Source Generalist Robot Policy. arXiv:2405.12213.

[19] Open X-Embodiment Collaboration; O'Neill, A.; Rehman, A.; Gupta, A.; Maddukuri, A.; Gupta, A.; Padalkar, A.; Lee, A.; Pooley, A.; Gupta, A.; Mandlekar, A.; Jain, A.; Tung, A.; Bewley, A.; Herzog, A.; Irpan, A.; Khazatsky, A.; Rai, A.; Gupta, A.; Wang, A.; Kolobov, A.; Singh, A.; Garg, A.; Kembhavi, A.; Xie, A.; Brohan, A.; Finn, C.; Ichter, B.; Levine, S... (arXiv, 2023)

[20] Physical Intelligence. 2026. OpenPI Normalization Statistics Documentation. https://github.com/Physical-Intelligence/openpi. Docs/norm_stats.md.

[21] Pineau, J.; Vincent-Lamarre, P.; Sinha, K.; Larivière, V.; Beygelzimer, A.; d'Alché-Buc, F.; Fox, E.; and Larochelle, H. 2021. Improving Reproducibility in Machine Learning Research: A Report from the NeurIPS 2019 Reproducibility Program. Journal of Machine Learning Research, 22(164): 1–20.

[22] StarVLA Community. 2026. StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing. arXiv:2604.05014.

[23] Zhang, J.; and Cho, K. 2017. Query-Efficient Imitation Learning for End-to-End Simulated Driving. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31.

[24] Zhao, T. Z.; Kumar, V.; Levine, S.; and Finn, C. 2023. Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware. arXiv:2304.13705.

[25] Zhu, Y.; Wong, J.; Mandlekar, A.; Martín-Martín, R.; Joshi, A.; Lin, K.; Maddukuri, A.; Nasiriany, S.; and Zhu, Y. 2020. robosuite: A Modular Simulation Framework and Benchmark for Robot Learning. arXiv:2009.12293.
