pith. machine review for the scientific record.

arxiv: 1808.00177 · v5 · submitted 2018-08-01 · 💻 cs.LG · cs.AI · cs.RO · stat.ML

Recognition: unknown

Learning Dexterous In-Hand Manipulation

Alex Ray, Arthur Petron, Bob McGrew, Bowen Baker, Glenn Powell, Jakub Pachocki, Jonas Schneider, Josh Tobin, Lilian Weng, Maciek Chociej, Marcin Andrychowicz, Matthias Plappert, OpenAI, Peter Welinder, Rafal Jozefowicz, Szymon Sidor, Wojciech Zaremba

Authors on Pith: no claims yet
classification 💻 cs.LG · cs.AI · cs.RO · stat.ML
keywords dexterous · manipulation · physical · human · in-hand · learning · many · object
Original abstract

We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object's appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five. We also include a video of our results: https://youtu.be/jwSbzNHGflM
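The core sim-to-real trick the abstract describes — randomizing physical properties such as friction per training episode so the policy cannot overfit to one simulator configuration — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the parameter names and ranges below are assumptions chosen for clarity.

```python
import random

# Hypothetical randomization ranges (illustrative only, not the
# paper's actual values). Each training episode samples a fresh set
# of physics parameters, so the learned policy must work across the
# whole distribution rather than one fixed simulator.
RANDOMIZATION_RANGES = {
    "friction_coefficient": (0.5, 1.5),  # scale on default friction
    "object_mass_scale":    (0.8, 1.2),  # scale on object mass
    "actuator_gain_scale":  (0.9, 1.1),  # scale on motor gains
}

def sample_physics_params(rng: random.Random) -> dict:
    """Draw one set of physics parameters for a training episode."""
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}

def run_training_episodes(num_episodes: int, seed: int = 0) -> list:
    """Outer training loop: each episode gets its own physics draw."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(num_episodes):
        params = sample_physics_params(rng)
        # In a real system these params would reconfigure the
        # simulator before rolling out the policy for this episode.
        episodes.append(params)
    return episodes
```

The design point is that randomization happens per episode at the start of the rollout, which is what forces the policy to be robust enough to transfer to the physical hand without ever training on real data.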

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors

    cs.RO 2026-04 unverdicted novelty 7.0

    Discrete diffusion policies support native asynchronous execution via unmasking for real-time chunking, delivering higher success rates and 0.7x inference cost versus flow-matching RTC on dynamic robotics benchmarks a...

  2. HiPolicy: Hierarchical Multi-Frequency Action Chunking for Policy Learning

    cs.RO 2026-04 unverdicted novelty 7.0

    HiPolicy is a new hierarchical multi-frequency action chunking method for imitation learning that jointly generates coarse and fine action sequences with entropy-guided execution to improve performance and efficiency ...

  3. Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

    cs.RO 2021-08 conditional novelty 6.0

    Isaac Gym achieves 2-3 orders of magnitude faster robot policy training by keeping physics simulation and PyTorch-based RL entirely on GPU with direct buffer sharing.