
arxiv: 2606.31993 · v1 · pith:WHA4J7XUnew · submitted 2026-06-30 · 💻 cs.RO

OopsieVerse: A Safety Benchmark with Damage-Aware Simulation for Robot Manipulation

Pith reviewed 2026-07-01 05:03 UTC · model grok-4.3

classification 💻 cs.RO
keywords robot manipulation, safety benchmark, damage simulation, household robots, physical safety, simulation framework, policy learning

The pith

OOPSIEVERSE supplies damage as an explicit, physically-grounded signal in robot manipulation simulations by converting contact forces, temperature changes, and liquid interactions into mechanical, thermal, or fluid damage metrics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that task success alone is insufficient for household robots because damage to the robot or surroundings can still occur, and existing simulators lack general mechanisms to detect and quantify such harm. OOPSIEVERSE addresses this by providing DAMAGESIM, a framework that turns physical interaction sources into corresponding damage values, plus a suite of household tasks that separate task completion from safe execution. The framework is shown to work across two different physics simulators. A sympathetic reader would care because it opens a path to training and evaluating policies that avoid harm without requiring dangerous real-world trials.

Core claim

OOPSIEVERSE provides damage as an explicit, physically-grounded, and task-agnostic signal by converting sources such as contact forces, temperature changes, and liquid interactions into corresponding mechanical, thermal or fluid damage. It comprises DAMAGESIM for detecting and quantifying damage during navigation and manipulation, plus tasks designed to evaluate common damage modes. The framework is instantiated in OmniGibson and RoboCasa, and supports use cases including safer demonstration collection, damage-conditioned policy learning, safety benchmarking of vision-language-action models, and improved sim-to-real transfer.
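
A minimal sketch of what such a conversion could look like, under stated assumptions. The thresholds, the linear scaling, the 0 to 100 health range, and every name used here (`DamageThresholds`, `step_damage`, `apply_damage`) are hypothetical; none of them are taken from the paper or from the DAMAGESIM code.

```python
from dataclasses import dataclass


@dataclass
class DamageThresholds:
    """Hypothetical per-object limits; the values used below are illustrative only."""
    max_force_n: float     # contact force above this causes mechanical damage
    min_temp_c: float      # temperature below this causes thermal damage
    max_temp_c: float      # temperature above this causes thermal damage
    water_sensitive: bool  # whether liquid contact damages the object


def step_damage(force_n, temp_c, wet_fraction, th, scale=1.0):
    """Return (mechanical, thermal, fluid) damage for one simulation step.

    Assumption: damage grows linearly with the amount by which a limit is exceeded.
    """
    mechanical = scale * max(0.0, force_n - th.max_force_n)
    thermal = scale * (max(0.0, temp_c - th.max_temp_c) + max(0.0, th.min_temp_c - temp_c))
    fluid = scale * wet_fraction if th.water_sensitive else 0.0
    return mechanical, thermal, fluid


def apply_damage(health, damages):
    """Subtract the summed damage from the current health, clamped at zero."""
    return max(0.0, health - sum(damages))


# Illustrative object: a laptop pressed with 65 N at room temperature, kept dry.
laptop = DamageThresholds(max_force_n=50.0, min_temp_c=0.0, max_temp_c=60.0, water_sensitive=True)
damages = step_damage(force_n=65.0, temp_c=22.0, wet_fraction=0.0, th=laptop)
health = apply_damage(100.0, damages)
```

Keeping the three damage channels separate until the final subtraction lets a task report which damage mode occurred, not just how much health was lost.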

What carries the argument

DAMAGESIM, a simulator-agnostic framework that detects and quantifies damage from physical sources during robot navigation and manipulation.

If this is right

  • Real-time damage feedback can guide collection of safer human demonstrations.
  • Damage-conditioned imitation learning and reinforcement learning can produce safer manipulation policies.
  • State-of-the-art vision-language-action policies can be benchmarked for safety in addition to task success.
  • Sim-to-real transferred policies can achieve improved real-world safety.
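
One way to read the two filtering strategies named in the Figure 5 caption (dropping entire episodes versus dropping individual datapoints with health losses > 5) is sketched here. The data layout, the `health_loss` field, and the choice to compare an episode's total loss against the threshold are assumptions made for illustration; the paper's actual filtering code is not shown in this summary.

```python
HEALTH_LOSS_LIMIT = 5.0  # threshold quoted in the Figure 5 caption; units are assumed


def filter_episodes(episodes, limit=HEALTH_LOSS_LIMIT):
    """Drop every episode whose total health loss exceeds the limit."""
    return [ep for ep in episodes if sum(step["health_loss"] for step in ep) <= limit]


def filter_datapoints(episodes, limit=HEALTH_LOSS_LIMIT):
    """Keep all episodes but drop individual steps whose health loss exceeds the limit."""
    filtered = []
    for ep in episodes:
        kept = [step for step in ep if step["health_loss"] <= limit]
        if kept:  # skip episodes that end up empty
            filtered.append(kept)
    return filtered


# Hypothetical demonstrations: each step stores an action and the health lost at that step.
episodes = [
    [{"action": 0, "health_loss": 0.0}, {"action": 1, "health_loss": 1.0}],
    [{"action": 0, "health_loss": 0.0}, {"action": 1, "health_loss": 12.0}],
]
safe_episodes = filter_episodes(episodes)
safe_steps = filter_datapoints(episodes)
```

Episode-level filtering discards the safe steps of an unsafe demonstration, while datapoint-level filtering keeps them at the cost of leaving gaps inside a trajectory.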

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same damage-conversion approach could be applied to non-household domains such as industrial or outdoor robotics.
  • Integrating DAMAGESIM outputs with real sensor streams might enable hybrid sim-real safety training loops.
  • Standardized damage metrics from this framework could support regulatory or certification requirements for household robots.

Load-bearing premise

The conversion rules inside DAMAGESIM produce damage values that meaningfully correspond to real physical damage.

What would settle it

A direct comparison of damage values produced by DAMAGESIM against measured outcomes from equivalent real-world robot experiments using force sensors, temperature probes, and fluid volume tracking.
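
Such a comparison could be summarized with a rank correlation between simulated damage and a measured real-world outcome. The sketch below computes Spearman's rho in plain Python; the two lists are placeholder numbers invented purely to exercise the code, not measurements from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values only: simulated damage per trial vs. a measured real-world quantity.
simulated_damage = [0.0, 2.0, 5.0, 9.0, 20.0]
measured_outcome = [0.1, 0.4, 0.3, 1.2, 3.0]
rho = spearman(simulated_damage, measured_outcome)
```

A rank correlation only checks that more simulated damage goes with more measured damage; it does not show that the damage values are calibrated in absolute terms.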

Figures

Figures reproduced from arXiv: 2606.31993 by Arnav Balaji, Arpit Bahety, Daniel Lam, Junhong Xu, Roberto Martín-Martín, Sriniket Ambatipudi.

Figure 1: Oops, did I do that? Robots trained and evaluated in simulation are unaware of the real-world damage their actions would cause, such as applying excessive forces, heating or freezing inappropriate objects, or spilling water on delicate items. OOPSIEVERSE addresses this gap by providing DamageSim, a general damage-aware simulation framework, including a plugin for multiple simulators (instantiated in Omnive…
Figure 2: Damage-Augmented POMDP Implementation with DAMAGESIM in OOPSIEVERSE. Conceptually, our DAMAGESIM plugin (blue) extends an existing POMDP in simulation (green, dotted arrow to agent) by augmenting the state with health. The health state can in turn influence the observations and rewards, and/or provide access to new damage-aware terminal states. DAMAGESIM implements the health state to augment the origina…
Figure 3: Types of physical damage detected by DAMAGESIM. Object-level damage caused by excessive a) impact, b) compression forces, or c) tensile forces, by d) temperature changes, or by e) water spills are measured and tracked by our simulation plugin, DAMAGESIM, enabling learning and evaluating their effect in robot behaviors. In the pictures, the top row shows a state without damage while in the bottom row the pi…
Figure 4: Tasks in OOPSIEBENCH. The image shows 21 different tasks instantiated in Behavior-1k/OmniGibson (built on Nvidia Omniverse) and/or RoboCasa/Robosuite (built on DeepMind MuJoCo). The complete task suite includes 32 tasks combining both simulators spanning diverse household objects, scenes, and interaction patterns, and is designed to expose agents to potential hazards across multiple damage modalities, inclu…
Figure 5: Imitation Learning with OOPSIEVERSE. Task completion rate (solid bar) and safe task completion rate (striped bar) for policies trained with only demonstrations collected without health feedback (blue), only demonstrations collected with health feedback (orange), all the demonstrations (green), all demonstrations filtering entire episodes (red) and individual datapoints (purple) with health losses > 5, for …
Figure 6: Real-time damage visualization provided by DAMAGESIM. The interface tracks and displays to the teleoperator the per-object health as the interactions occur, enabling immediate identification and reaction to unsafe behaviors. It displays the health in two complementary ways: health bars, which allow for visualizing the health of objects out of frame, and as object coloration, which gives an intuitive cue of…
Figure 7: Trajectory distributions for the Shelve Cereal Box task, comparing policies trained without health feedback (red) versus with health-filtered episodes (green). The policy trained on filtered data exhibits a tighter distribution concentrated toward the right side of the shelf, a region free of fragile objects. In contrast, the unfiltered policy produces more trajectories that frequently place …
Figure 8: Reinforcement Learning with OOPSIEVERSE. We evaluate three RL variants across 3 tasks: training from scratch, fine-tuning a BC-GMM policy, and DSRL fine-tuning. Comparing safe task completion rates between the baseline that does not use OOPSIEVERSE’s health information (blue) and methods that do (orange), shows that OOPSIEVERSE enables learning safer policies both from scratch and through the fine-tuning …
Figure 9: Sim-to-real transfer of policies trained in OOPSIEVERSE. We compare the baseline IL policy (without health feedback) with the IL policy trained on health-filtered episodes on a Panda robot. The baseline IL policy often performs unsafe behaviors like spilling water on the laptop or pushing a fragile bottle over the shelf, whereas the damage-aware filtered-episode IL policy learns safer behaviors. The damage-aware IL p…
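
The damage-augmented POMDP described in the Figure 2 caption can be sketched as a thin wrapper around an existing environment. This is an illustration only: `ToyEnv`, `HealthWrapper`, the `contact_force` info key, the force limit, and the reward penalty are all hypothetical and do not reflect the actual DAMAGESIM, OmniGibson, or RoboCasa interfaces.

```python
class ToyEnv:
    """Stand-in for a simulator environment; not a real OmniGibson or RoboCasa API."""

    def reset(self):
        self.t = 0
        return {"t": self.t}

    def step(self, action):
        self.t += 1
        obs = {"t": self.t}
        reward = 1.0
        done = self.t >= 10
        info = {"contact_force": float(action)}  # pretend the action sets the contact force
        return obs, reward, done, info


class HealthWrapper:
    """Augments the wrapped environment's state with a health value."""

    def __init__(self, env, force_limit=50.0, penalty=0.1, initial_health=100.0):
        self.env = env
        self.force_limit = force_limit
        self.penalty = penalty
        self.initial_health = initial_health
        self.health = initial_health

    def reset(self):
        self.health = self.initial_health
        obs = self.env.reset()
        return {**obs, "health": self.health}  # health becomes part of the observation

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        damage = max(0.0, info["contact_force"] - self.force_limit)
        self.health = max(0.0, self.health - damage)
        obs = {**obs, "health": self.health}
        reward -= self.penalty * damage          # health loss shapes the reward
        broken = self.health <= 0.0              # new damage-aware terminal state
        info = {**info, "damage": damage, "broken": broken}
        return obs, reward, done or broken, info


env = HealthWrapper(ToyEnv())
first_obs = env.reset()
obs1, reward1, done1, info1 = env.step(70.0)   # 20 N over the limit
obs2, reward2, done2, info2 = env.step(200.0)  # enough to exhaust the remaining health
```

The wrapper leaves the underlying environment untouched, which mirrors the caption's point that health can feed observations, rewards, and terminal states without changing the original task definition.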
Original abstract

While robotic manipulation capabilities have advanced rapidly, physical safety remains a major barrier to deploying household robots: task success is insufficient if the robot damages itself or its surroundings. Simulation offers a harm-free alternative to costly and dangerous real-world training and evaluation, yet existing simulators lack general mechanisms to detect, quantify, and represent damage. To address this gap, we introduce OOPSIEVERSE, a unified simulation framework and benchmark for damage-aware household manipulation. OOPSIEVERSE provides damage as an explicit, physically-grounded, and taskagnostic signal by converting sources such as contact forces, temperature changes, and liquid interactions into corresponding mechanical, thermal or fluid damage. OOPSIEVERSE comprises two core elements: (1) DAMAGESIM, a simulator-agnostic framework for detecting and quantifying damage during navigation and manipulation, and (2) a suite of household tasks designed to evaluate common damage modes and distinguish between task completion and safe execution. We demonstrate the generality of our framework by instantiating DAMAGESIM in two simulators with different physics backends, OmniGibson (Nvidia Omniverse) and RoboCasa (MuJoCo). We further showcase the utility of OOPSIEVERSE across multiple use cases, including (1) guiding safer demonstration collection via real-time damage feedback, (2) learning safer manipulation policies through damage-conditioned imitation learning and reinforcement learning, (3) benchmarking the safety of state-of-the-art Vision Language Action policies, and (4) improving real-world safety of sim-to-real transferred policies. Together, our results highlight the potential of OOPSIEVERSE as an open-source foundation for systematic, scalable research on safe robot manipulation. For code and more information, please refer to https://robin-lab.cs.utexas.edu/oopsieverse/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces OOPSIEVERSE, a unified simulation framework and benchmark for damage-aware household robot manipulation. It consists of DAMAGESIM, a simulator-agnostic layer that converts contact forces, temperature changes, and liquid interactions into mechanical, thermal, or fluid damage scores, instantiated in OmniGibson and RoboCasa; a suite of household tasks; and four use cases demonstrating safer demonstration collection, damage-conditioned policy learning, VLA benchmarking, and sim-to-real transfer.

Significance. If the damage conversion rules prove accurate, OOPSIEVERSE could fill a clear gap in existing simulators by supplying an explicit, task-agnostic safety signal, enabling more systematic research on safe manipulation. The open-source release, dual-simulator instantiation, and emphasis on reproducibility are concrete strengths that support adoption as a foundation for future work.

major comments (2)
  1. [Abstract / DAMAGESIM] Abstract and DAMAGESIM description: the central claim that damage is provided as a 'physically-grounded' signal rests on conversion rules whose outputs 'meaningfully correspond to real physical damage,' yet the manuscript supplies no calibration data, material-property thresholds, correlation coefficients, or post-interaction measurements (e.g., strain, fracture, or functional degradation) against real-world observations; this is load-bearing for the 'physically-grounded' property.
  2. [Use cases] Use-cases section: the four demonstrated applications (real-time feedback, imitation/RL, VLA benchmarking, sim-to-real) are described at a high level but contain no quantitative results, error bars, ablation studies, or statistical validation, so the claim that OOPSIEVERSE 'highlights the potential' as a foundation cannot be assessed from the presented evidence.
minor comments (1)
  1. [Abstract] Abstract: 'taskagnostic' is missing a hyphen and should read 'task-agnostic'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and commit to revisions that strengthen the physical grounding and quantitative support in the manuscript.

  1. Referee: [Abstract / DAMAGESIM] Abstract and DAMAGESIM description: the central claim that damage is provided as a 'physically-grounded' signal rests on conversion rules whose outputs 'meaningfully correspond to real physical damage,' yet the manuscript supplies no calibration data, material-property thresholds, correlation coefficients, or post-interaction measurements (e.g., strain, fracture, or functional degradation) against real-world observations; this is load-bearing for the 'physically-grounded' property.

    Authors: We agree this is a valid concern and that direct real-world calibration data would further support the claim. DAMAGESIM conversions are derived from established physical models and literature thresholds (e.g., contact force limits from material failure studies). In revision we will add a new subsection explicitly listing the physical basis, specific thresholds, and citations for each damage type. This directly addresses the load-bearing issue. revision: yes

  2. Referee: [Use cases] Use-cases section: the four demonstrated applications (real-time feedback, imitation/RL, VLA benchmarking, sim-to-real) are described at a high level but contain no quantitative results, error bars, ablation studies, or statistical validation, so the claim that OOPSIEVERSE 'highlights the potential' as a foundation cannot be assessed from the presented evidence.

    Authors: The use-cases section includes quantitative metrics (damage reduction, success rates) via figures, but we acknowledge the absence of error bars, ablations, and statistical tests limits assessment. We will expand the section with repeated-trial statistics, error bars, and ablation studies on damage conditioning to provide the requested rigor. revision: yes
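
As one illustration of the error bars the authors commit to, a completion rate from repeated trials can be reported with a Wilson score interval. The counts below are hypothetical placeholders, and the paper does not state which interval or statistical test it will use.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: 15 safe task completions out of 20 evaluation rollouts.
low, high = wilson_interval(15, 20)
```

With only 20 rollouts the interval spans roughly a third of the unit range, which is why differences between policies need repeated trials before they can be read as real.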

Circularity Check

0 steps flagged

No circularity: new definitional framework with no equations or self-referential reductions


The paper introduces OOPSIEVERSE and DAMAGESIM as a new simulation framework and benchmark that defines damage conversion rules from contact forces, temperature, and liquids. No equations, fitted parameters, predictions, or self-citations are present in the provided text that would cause any claimed result to reduce to its own inputs by construction. The contribution is the creation of this tooling and task suite rather than a derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that damage can be usefully quantified from the listed physical signals inside simulation. No free parameters are described, and the only invented entity is DAMAGESIM itself, a software component rather than a physical one.

axioms (1)
  • domain assumption: Damage can be quantified from contact forces, temperature changes, and liquid interactions in simulation.
    The framework converts these sources into damage signals as stated in the abstract.
invented entities (1)
  • DAMAGESIM · no independent evidence
    purpose: Simulator-agnostic framework for detecting and quantifying damage during navigation and manipulation
    New component introduced by the paper; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.1-grok · 5875 in / 1311 out tokens · 45586 ms · 2026-07-01T05:03:07.844914+00:00 · methodology

