Labimus: A Simulation and Benchmark for Humanoid Dexterous Manipulation in Chemical Laboratory

Jian Tang; Jun Jiang; Shuo Wang; Tao Li; Xiaobo Li; Yan Xia; Yanyong Zhang; Yuhan Wu; Yuheng Zhang; Zhao Jin

arxiv: 2606.31037 · v1 · pith:GEGFBFIQnew · submitted 2026-06-30 · 💻 cs.RO

Yuhan Wu, Zhao Jin, Tao Li, Yuheng Zhang, Zhengping Che, Jian Tang, Zhichao Wang, Shuo Wang, Jun Jiang, Xiaobo Li, Yanyong Zhang, Yan Xia

Pith reviewed 2026-07-01 05:51 UTC · model grok-4.3

classification 💻 cs.RO

keywords humanoid robots · dexterous manipulation · laboratory automation · organic chemistry · simulation benchmark · precision evaluation · solid weighing · robot learning

The pith

Labimus benchmark shows robot policies complete lab tasks but fail to meet required experimental precision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Labimus, which the authors describe as, to their knowledge, the first simulation benchmark for humanoid dexterous manipulation in organic chemistry laboratories. It reconstructs over 30 real lab assets and adds particle-based powder physics and closed-loop instrument readouts to support a full manipulation-to-measurement pipeline. Six atomic operations and a seven-step solid-weighing workflow are defined from standard operating procedures. A precision-aware protocol evaluates policies on task completion, experimental tolerances, and long-horizon reliability. Benchmarking three policies reveals that successful task execution does not guarantee results within quantitative experimental limits.

Core claim

Labimus exposes a disconnect between task completion and experimental validity: policies that finish laboratory operations can still violate the precision tolerances demanded by real chemistry protocols when evaluated under procedural layouts and environmental perturbations.

What carries the argument

The Labimus benchmark, built from real-to-sim modeled lab assets, particle-based powder physics, and closed-loop instrument readouts that enable joint assessment of manipulation success and measurement validity.
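
To make the idea of a closed-loop instrument readout concrete, here is a minimal sketch of a balance whose display is the accumulated mass of the particles on its pan. It is not the authors' implementation: the class `SimBalance`, its methods, and the three-decimal display resolution are all hypothetical names and values chosen for illustration.

```python
class SimBalance:
    """Hypothetical stand-in for a simulated balance with a digital readout."""

    def __init__(self, decimals=3):
        self.decimals = decimals  # assumed display resolution (0.001 g)
        self._load_g = 0.0        # total mass currently resting on the pan
        self._tare_g = 0.0        # offset stored by the last tare press

    def deposit(self, particle_masses_g):
        """Add the masses of particles (or a container) that landed on the pan."""
        self._load_g += sum(particle_masses_g)

    def tare(self):
        """Zero the display at the current load, as a tare press would."""
        self._tare_g = self._load_g

    def readout(self):
        """Return the net mass, rounded to the display resolution."""
        return round(self._load_g - self._tare_g, self.decimals)


balance = SimBalance()
balance.deposit([10.0])           # place an empty weighing boat (illustrative mass)
balance.tare()                    # zero the display
balance.deposit([0.0005] * 1700)  # powder particles arrive on the pan
print(balance.readout())
```

Because the readout depends only on what is physically on the pan, a policy can complete every motion correctly and still end with a displayed value outside the protocol's tolerance.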

If this is right

Evaluation of lab robots must include quantitative precision metrics in addition to task success rates.
Training methods need explicit mechanisms to enforce experimental tolerances during long-horizon sequences.
The benchmark supplies a standardized testbed for comparing humanoid policies on chemically relevant manipulations.
Development of reliable lab robots should prioritize closing the gap between task completion and valid experimental outcomes.
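
The first implication can be sketched as reporting two rates side by side rather than one. This is a hedged illustration, not the paper's scoring code: the `Episode` record, the `summarize` function, and the name `precision_gap` are assumptions, and the episode values are placeholders rather than reported results. Only the 0.850 ± 0.001 g target is taken from the paper's Figure 4 caption.

```python
from dataclasses import dataclass

TARGET_G = 0.850     # precision target quoted in the Figure 4 caption
TOLERANCE_G = 0.001  # tolerance quoted in the Figure 4 caption


@dataclass
class Episode:
    completed: bool      # did the policy finish the procedure?
    final_mass_g: float  # balance readout at the end of the episode


def within_tolerance(mass_g, target_g=TARGET_G, tol_g=TOLERANCE_G):
    # Small slack guards against floating-point noise at the boundary.
    return abs(mass_g - target_g) <= tol_g + 1e-12


def summarize(episodes):
    """Report completion and precision-valid rates over a batch of episodes."""
    n = len(episodes)
    completed = sum(e.completed for e in episodes)
    valid = sum(e.completed and within_tolerance(e.final_mass_g) for e in episodes)
    return {
        "completion_rate": completed / n,
        "valid_rate": valid / n,
        "precision_gap": (completed - valid) / n,  # hypothetical metric name
    }


# Placeholder episodes for illustration only.
episodes = [
    Episode(True, 0.8500),
    Episode(True, 0.8505),
    Episode(True, 0.8560),  # completed, but outside tolerance
    Episode(False, 0.2000),
]
print(summarize(episodes))
```

A benchmark that reported only `completion_rate` would hide the third episode, which is the kind of case the precision-aware protocol is meant to surface.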

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The precision gap may indicate that current imitation or reinforcement learning approaches lack sufficient feedback from measurement outcomes during training.
Extending the benchmark to liquid handling or multi-step synthesis workflows could test whether the same disconnect appears in other lab domains.
If real-robot validation confirms the gap, it would motivate hybrid sim-real training loops that incorporate live instrument data.

Load-bearing premise

The simulated assets, powder dynamics, and instrument readouts capture the precision and variability of actual organic chemistry operations closely enough for the observed gap to hold in reality.

What would settle it

Running the same policies on physical lab equipment and finding that the precision failures either disappear or persist at the same rate as in simulation.

Figures

Figures reproduced from arXiv: 2606.31037 by Jian Tang, Jun Jiang, Shuo Wang, Tao Li, Xiaobo Li, Yan Xia, Yanyong Zhang, Yuhan Wu, Yuheng Zhang, Zhao Jin, Zhengping Che, Zhichao Wang.

**Figure 1.** Overview of Labimus. Top: real-to-sim reconstruction of a chemistry workstation with 30+ functional assets covering the fundamental operations of organic chemistry experiments. Bottom: Tianyi 2.0 humanoid performs precision-critical operations, showcasing instrument-level state readouts, particle-based powder physics, and contact-rich dexterous manipulation.

**Figure 2.** Labimus simulation foundation. (a) Over 30 functional assets spanning containers, tools, and instruments. (b) Rigid-body particles deposited on the balance pan; the digital readout displays the accumulated mass in real time. (c) The SOP-to-simulation pipeline converts documented procedures into executable simulation tasks scored against the protocol specification. (d) The operator wears Manus gloves and u…

**Figure 3.** Labimus benchmark overview. Top-left: the simulation environment in Isaac Sim with the Tianyi humanoid. Top-right: the task suite spans six atomic operations (door open, door close, grasp & place, tare press, tool pickup, and scoop & weigh), covering discrete and sustained contact, single-arm and bimanual coordination, and instrument interaction. Bottom-left: the three-tier evaluation hierarchy progresses…

**Figure 4.** Solid-weighing task suite and evaluation conditions. (a) The task suite spans three manipulation categories (instrument interaction, basic tool use, and dexterous manipulation), with the solid-weighing procedure as a 7-step workflow (precision target 0.850 ± 0.001 g). (b) All tiers are evaluated under four conditions with layered perturbations applied on top of procedural layouts. Lighting and texture are …

Original abstract

Laboratory automation has made remarkable progress through robotic platforms and AI-driven scientific reasoning. However, many laboratory operations (e.g., solid–solid transfer) remain inherently dynamic and require real-time adaptation to different materials and experimental conditions. Such precision-critical manipulations are difficult to standardize, motivating the use of humanoid robots with dexterous hands. Despite this opportunity, no existing benchmark evaluates humanoid manipulation in precision-critical laboratory environments. We present Labimus, to our knowledge, the first benchmark for humanoid dexterous manipulation in organic chemistry laboratories. Labimus reconstructs over 30 functionally faithful assets from real organic chemistry workstations through real-to-sim modeling, collectively covering the core operations of routine organic chemistry experiments. The benchmark integrates articulated laboratory instruments, particle-based powder physics, and closed-loop instrument readouts, enabling a complete manipulation-to-measurement pipeline. It further defines six atomic operations and a seven-step solid-weighing workflow derived from real laboratory standard operating procedures. We introduce a precision-aware evaluation protocol designed to jointly measure task completion, experimental precision, and long-horizon execution. We benchmark three representative policies under procedural layouts and environmental perturbations. Results reveal a precision gap: policies that successfully complete laboratory tasks can still fail to satisfy the quantitative tolerances required by experimental protocols. Our benchmark exposes a fundamental disconnect between task completion and experimental validity, providing a new testbed for developing reliable humanoid robots for scientific laboratories.
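
As an editorial aside on why long-horizon execution is measured separately, the sketch below shows how per-step reliability compounds over a seven-step workflow. It assumes that step outcomes are independent, which is a simplification of ours and not a claim from the paper; the function name `workflow_success` and the 0.95 per-step rate are illustrative placeholders, not reported results.

```python
import math


def workflow_success(step_success_rates):
    """Probability that every step succeeds, assuming independent steps."""
    return math.prod(step_success_rates)


# Seven steps, each assumed to succeed 95% of the time (placeholder value).
rates = [0.95] * 7
print(f"{workflow_success(rates):.3f}")  # roughly 0.698
```

Under this assumption, a policy that looks strong on each atomic operation can still finish the whole workflow only about 70% of the time, before any precision tolerance is applied.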

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Labimus introduces the first benchmark for humanoid lab manipulation with a precision protocol that separates task success from experimental validity, but the sim-to-real fidelity lacks direct checks.

The paper's main contribution is a new benchmark called Labimus for humanoid dexterous manipulation in organic chemistry labs. It reconstructs over 30 real assets, adds particle-based powder physics and closed-loop instrument readouts, and defines a seven-step solid-weighing workflow from actual SOPs. They test three policies and show that completing the steps does not guarantee the required measurement precision.

The construction looks thorough on paper. Using real-to-sim modeling for lab equipment and tying the evaluation to quantitative tolerances is a step beyond standard manipulation benchmarks that stop at task completion. The precision-aware protocol is the clearest addition here.

The soft spot is validation. The stress-test note is on target: the abstract and setup describe the simulation elements but give no side-by-side physical measurements for weighing accuracy, powder flow, or sensor behavior. Without those comparisons, the reported precision gap could be driven by sim artifacts rather than transferable lab realities. That weakens the claim of exposing a fundamental disconnect.

This is aimed at robotics researchers building dexterous systems or lab automation benchmarks. Readers working on sim-to-real transfer or evaluation protocols would find the workflow and metrics useful to examine.

It should go to peer review. A benchmark paper in this domain can be worth referee time even with gaps in fidelity evidence, as long as the authors can address the validation question.

Referee Report

1 major / 2 minor

Summary. The paper introduces Labimus as the first benchmark for humanoid dexterous manipulation in organic chemistry laboratories. It reconstructs over 30 real-to-sim assets covering core operations, integrates articulated instruments with particle-based powder physics and closed-loop readouts, defines six atomic operations plus a seven-step solid-weighing workflow from real SOPs, and applies a precision-aware evaluation protocol. Benchmarking three policies under procedural and perturbed conditions reveals a precision gap in which task completion does not guarantee satisfaction of quantitative experimental tolerances.

Significance. If the simulation dynamics prove faithful to real laboratory tolerances, the benchmark supplies a needed testbed that shifts evaluation from binary task success to joint measurement of completion, precision, and long-horizon validity. The explicit construction from SOP-derived workflows and the precision-aware protocol constitute concrete strengths that could guide development of reliable lab robots.

major comments (1)

[Abstract] The central claim that the benchmark 'exposes a fundamental disconnect between task completion and experimental validity' is load-bearing on the fidelity of the particle-based powder physics, articulated instruments, and closed-loop readouts to real organic-chemistry tolerances (e.g., mass-transfer accuracy within protocol limits). No side-by-side quantitative comparison of simulated versus physical outcomes for weighing precision, powder flow, or sensor readouts is described, leaving open the possibility that the reported precision gap reflects simulation artifacts rather than transferable experimental requirements.

minor comments (2)

The abstract states that 'over 30 functionally faithful assets' were reconstructed but supplies no quantitative metric or verification procedure for functional faithfulness.
The three representative policies are mentioned without naming or characterizing them, which limits assessment of result generality.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for this constructive comment on simulation fidelity, which directly impacts the strength of our central claim. We address it point-by-point below.

Referee: [Abstract] The central claim that the benchmark 'exposes a fundamental disconnect between task completion and experimental validity' is load-bearing on the fidelity of the particle-based powder physics, articulated instruments, and closed-loop readouts to real organic-chemistry tolerances (e.g., mass-transfer accuracy within protocol limits). No side-by-side quantitative comparison of simulated versus physical outcomes for weighing precision, powder flow, or sensor readouts is described, leaving open the possibility that the reported precision gap reflects simulation artifacts rather than transferable experimental requirements.

Authors: We agree that the central claim relies on the simulation components being sufficiently faithful to real laboratory tolerances. The manuscript does not include side-by-side quantitative comparisons of simulated versus physical outcomes for weighing precision, powder flow, or sensor readouts; this is a genuine limitation, as the work prioritizes benchmark construction from real-to-sim assets and SOP-derived workflows rather than new physical validation experiments. The particle-based physics, articulated instruments, and closed-loop readouts follow standard simulation practices with parameters chosen to approximate typical organic chemistry conditions, but without explicit calibration data against physical trials. In the revised manuscript we will (1) qualify the abstract claim to specify that the disconnect is shown within the simulated environment and (2) add an explicit limitations subsection discussing modeling assumptions and the need for future sim-to-real studies. These textual changes will be incorporated.

Revision: partial
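
The side-by-side comparison the referee asks for could take a form like the sketch below, which summarizes bias and spread of final masses from simulated and physical trials against the same target. Everything here is hypothetical: the function `compare_weighing`, the choice of mean bias and sample standard deviation as summary statistics, and the sample values, which are placeholders and not measurements from the paper or from any real balance.

```python
from statistics import mean, stdev


def compare_weighing(sim_g, real_g, target_g):
    """Summarize bias and spread of final masses in simulation and on hardware."""
    return {
        "sim_bias_g": mean(sim_g) - target_g,
        "real_bias_g": mean(real_g) - target_g,
        "sim_std_g": stdev(sim_g),
        "real_std_g": stdev(real_g),
    }


# Placeholder readings for illustration only.
sim_readings = [0.849, 0.851, 0.850]
real_readings = [0.848, 0.852, 0.850]
print(compare_weighing(sim_readings, real_readings, target_g=0.850))
```

If the simulated spread were much larger than the physical one, the precision gap could be partly a simulation artifact; comparable spreads would support the claim that the gap transfers.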

Circularity Check

0 steps flagged

No circularity in benchmark definition or evaluation protocol

The paper constructs Labimus as a simulation benchmark by reconstructing real laboratory assets via real-to-sim modeling and deriving workflows from standard operating procedures. No equations, fitted parameters, or predictions are defined in a self-referential manner. The precision-aware evaluation protocol jointly measures task completion and experimental validity as independent metrics without reducing one to the definition of the other. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim about a disconnect between task completion and validity follows directly from running external policies on the defined benchmark, without any reduction to the benchmark's own inputs by construction. This is a standard benchmark presentation with fully independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only view yields minimal ledger entries; the central claim rests on unstated assumptions about simulation fidelity rather than explicit free parameters or invented entities.

axioms (2)

domain assumption: Real laboratory standard operating procedures can be faithfully translated into six atomic operations and a seven-step workflow in simulation.
Invoked when defining the benchmark tasks from real SOPs.
domain assumption: Particle-based powder physics and closed-loop instrument readouts produce dynamics representative of real organic chemistry manipulations.
Central to the real-to-sim modeling described in the abstract.

[1] [1]

Maffettone, Vladimir V

Benjamin Burger, Phillip M. Maffettone, Vladimir V . Gusev, et al. A mobile robotic chemist.Nature, 583:237–241, 2020

2020

[2] [2]

LabUtopia: High-fidelity simulation and hierarchical benchmark for scientific embodied agents.arXiv preprint arXiv:2505.22634, 2025

Rui Li, Zixuan Hu, Wenxi Qu, et al. LabUtopia: High-fidelity simulation and hierarchical benchmark for scientific embodied agents.arXiv preprint arXiv:2505.22634, 2025

work page arXiv 2025

[3] [3]

Boiko, Robert MacKnight, Ben Kline, et al

Daniil A. Boiko, Robert MacKnight, Ben Kline, et al. Autonomous chemical research with large language models.Nature, 624:570–578, 2023

2023

[4] [4]

Bran, Sam Cox, Oliver Schilter, et al

Andres M. Bran, Sam Cox, Oliver Schilter, et al. Augmenting large language models with chemistry tools.Nature Machine Intelligence, 6:525–535, 2024

2024

[5] [5]

Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot

Chenghao Yin, Da Huang, Di Yang, et al. Genie Sim 3.0: A high-fidelity comprehensive simulation platform for humanoid robot.arXiv preprint arXiv:2601.02078, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[6] [6]

BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

Chengshu Li, Ruohan Zhang, Josiah Wong, et al. BEHA VIOR-1K: A human-centered, embodied AI benchmark with 1,000 everyday activities and realistic simulation.arXiv preprint arXiv:2403.09227, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[7] [7]

23 Takano R, Oyama H and Yamakita M (2021) Continuous optimization-based task and motion planning with signal temporal logic specifications for sequential manipulation

Stone Tao, Fanbo Xiang, Arth Shukla, et al. ManiSkill3: GPU parallelized robotics simulation and rendering for generalizable embodied AI.arXiv preprint arXiv:2410.00425, 2024

work page arXiv 2024

[8] [8]

Chemistry3D: Robotic interaction benchmark for chemistry experiments

Shoujie Li, Yan Huang, Changqing Guo, et al. Chemistry3D: Robotic interaction benchmark for chemistry experiments. InIEEE International Conference on Robotics and Automation (ICRA), 2025

2025

[9] [9]

AutoBio: A simulation and benchmark for robotic au- tomation in digital biology laboratory.arXiv preprint arXiv:2505.14030, 2025

Zhiqian Lan, Yuxuan Jiang, Ruiqi Wang, et al. AutoBio: A simulation and benchmark for robotic au- tomation in digital biology laboratory.arXiv preprint arXiv:2505.14030, 2025

work page arXiv 2025

[10] Kourosh Darvish, Arjun Sohal, Abhijoy Mandal, et al. MATTERIX: Toward a digital twin for robotics-assisted chemistry laboratory automation. Nature Computational Science, 6:67–82, 2026

[11] Tony Z. Zhao, Vikash Kumar, Sergey Levine, et al. Learning fine-grained bimanual manipulation with low-cost hardware. In Robotics: Science and Systems (RSS), 2023

[12] Cheng Chi, Siyuan Feng, Yilun Du, et al. Diffusion policy: Visuomotor policy learning via action diffusion. In Robotics: Science and Systems (RSS), 2023

[13] Kevin Black, Noah Brown, Danny Driess, et al. π0: A vision-language-action flow model for general robot control. arXiv preprint arXiv:2410.24164, 2024

[14] Stephen James, Zicong Ma, David Rovick Arrojo, et al. RLBench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters (RA-L), 5(2):3019–3026, 2020

[15] Oier Mees, Lukas Hermann, Erick Rosete-Beas, et al. CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks. IEEE Robotics and Automation Letters (RA-L), 7(3):7327–7334, 2022

[16] Soroush Nasiriany, Abhiram Maddukuri, Lance Zhang, et al. RoboCasa: Large-scale simulation of everyday tasks for generalist robots. In Robotics: Science and Systems (RSS), 2024

[17] Bo Liu, Yifeng Zhu, Chongkai Gao, et al. LIBERO: Benchmarking knowledge transfer for lifelong robot learning. In Advances in Neural Information Processing Systems (NeurIPS), 2023

[18] Xavi Puig, Eric Undersander, Andrew Szot, et al. Habitat 3.0: A co-habitat for humans, avatars and robots. arXiv preprint arXiv:2310.13724, 2023

[19] Yufei Wang, Zhou Xian, Feng Chen, et al. RoboGen: Towards unleashing infinite data for automated robot learning via generative simulation. In International Conference on Machine Learning (ICML), 2024

[20] Tianxing Chen, Zanxin Chen, Baijun Chen, et al. RoboTwin 2.0: A scalable data generator and benchmark with strong domain randomization for robust bimanual robotic manipulation. arXiv preprint arXiv:2506.18088, 2025

[21] Yashraj Narang, Kier Storey, Iretiayo Akinola, et al. Factory: Fast contact for robotic assembly. In Robotics: Science and Systems (RSS), 2022

[22] Minho Heo, Youngwoon Lee, Doohyun Lee, et al. FurnitureBench: Reproducible real-world benchmark for long-horizon complex manipulation. In Robotics: Science and Systems (RSS), 2023

[23] Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, et al. Isaac Gym: High performance GPU-based physics simulation for robot learning. In NeurIPS Datasets and Benchmarks, 2021

[24] OpenAI, Marcin Andrychowicz, Bowen Baker, et al. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1):3–20, 2020

[25] OpenAI, Ilge Akkaya, Marcin Andrychowicz, et al. Solving Rubik’s cube with a robot hand. arXiv preprint arXiv:1910.07113, 2019

[26] Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, et al. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. In Robotics: Science and Systems (RSS), 2018

[27] Yuzhe Qin, Yueh-Hua Wu, Shaowei Liu, et al. DexMV: Imitation learning for dexterous manipulation from human videos. In European Conference on Computer Vision (ECCV), 2022

[28] Ruicheng Wang, Jialiang Zhang, Jiayi Chen, et al. DexGraspNet: A large-scale robotic dexterous grasp dataset for general objects based on simulation. In IEEE International Conference on Robotics and Automation (ICRA), pages 11359–11366, 2023

[29] Yinzhen Xu, Weikang Wan, Jialiang Zhang, et al. UniDexGrasp: Universal robotic dexterous grasping via learning diverse proposal generation and goal-conditioned policy. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

[30] Chen Bao, Helin Xu, Yuzhe Qin, et al. DexArt: Benchmarking generalizable dexterous manipulation with articulated objects. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

[31] Haoran Geng, Helin Xu, Chengyang Zhao, et al. GAPartNet: Cross-category domain-generalizable object perception and manipulation via generalizable and actionable parts. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

[32] Hanwen Wang, Weizhi Zhao, Xiangyu Wang, et al. DexJoCo: A benchmark and toolkit for task-oriented dexterous manipulation on MuJoCo. arXiv preprint arXiv:2605.16257, 2026

[33] Zipeng Fu, Qingqing Zhao, Qi Wu, et al. HumanPlus: Humanoid shadowing and imitation from humans. In Conference on Robot Learning (CoRL), 2024

[34] Tairan He, Zhengyi Luo, Wenli Xiao, et al. Learning human-to-humanoid real-time whole-body teleoperation. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

[35] Tairan He, Zhengyi Luo, Xialin He, et al. OmniH2O: Universal and dexterous human-to-humanoid whole-body teleoperation and learning. In Conference on Robot Learning (CoRL), 2024

[36] Carmelo Sferrazza, Dun-Ming Huang, Xingyu Lin, et al. HumanoidBench: Simulated humanoid benchmark for whole-body locomotion and manipulation. arXiv preprint arXiv:2403.10506, 2024

[37] Zhao Jin, Zhengping Che, Tao Li, et al. ArtVIP: Articulated digital assets of visual realism, modular interaction, and physical fidelity for robot learning. In International Conference on Learning Representations (ICLR), 2026

[38] Matt Deitke, Eli VanderBilt, Alvaro Herrasti, et al. ProcTHOR: Large-scale embodied AI using procedural generation. In Advances in Neural Information Processing Systems (NeurIPS), 2022

[39] Sicong Gao, Maurice Pagnucco, Tomasz Bednarz, et al. NVIDIA Isaac Sim: Enabling scalable, GPU-accelerated simulation for robotics. arXiv preprint arXiv:2606.03551, 2026

[40] Yuzhe Qin, Wei Yang, Binghao Huang, et al. AnyTeleop: A general vision-based dexterous robot arm-hand teleoperation system. In Robotics: Science and Systems (RSS), 2023

[41] Josh Tobin, Rachel Fong, Alex Ray, et al. Domain randomization for transferring deep neural networks from simulation to the real world. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017

[42] Wilbert Pumacay, Ishika Singh, Jiafei Duan, et al. THE COLOSSEUM: A benchmark for evaluating generalization for robotic manipulation. In Robotics: Science and Systems (RSS), 2024

[43] Ajay Mandlekar, Danfei Xu, Josiah Wong, et al. What matters in learning from offline human demonstrations for robot manipulation. In Conference on Robot Learning (CoRL), 2021

[44] Mohit Shridhar, Lucas Manuelli, and Dieter Fox. CLIPort: What and where pathways for robotic manipulation. In Conference on Robot Learning (CoRL), 2021

[45] Anthony Brohan, Noah Brown, Justice Carbajal, et al. RT-1: Robotics transformer for real-world control at scale. In Robotics: Science and Systems (RSS), 2023

[46] Anthony Brohan, Noah Brown, Justice Carbajal, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818, 2023

[47] Octo Model Team, Dibya Ghosh, Homer Walke, et al. Octo: An open-source generalist robot policy. In Robotics: Science and Systems (RSS), 2024

[48] Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, et al. OpenVLA: An open-source vision-language-action model. arXiv preprint arXiv:2406.09246, 2024