Learning Dexterous In-Hand Manipulation

Alex Ray; Arthur Petron; Bob McGrew; Bowen Baker; Glenn Powell; Jakub Pachocki; Jonas Schneider; Josh Tobin; Lilian Weng; Maciek Chociej

arxiv: 1808.00177 · v5 · pith:XDK2R7PAnew · submitted 2018-08-01 · 💻 cs.LG · cs.AI · cs.RO · stat.ML

OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, Wojciech Zaremba

keywords dexterous, manipulation, physical, human, in-hand, learning, many, object

We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies that can perform vision-based object reorientation on a physical Shadow Dexterous Hand. Training is performed in a simulated environment in which we randomize many of the physical properties of the system, such as friction coefficients, as well as the object's appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five. We also include a video of our results: https://youtu.be/jwSbzNHGflM
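
The abstract describes randomizing simulator properties such as friction coefficients and object appearance during training. A minimal sketch of that idea follows, drawing a fresh set of parameters for each training episode. The names `EpisodeParams`, `sample_episode_params`, and `randomized_episodes` and the numeric ranges are illustrative assumptions, not taken from the paper, and no real simulator is involved.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeParams:
    """Hypothetical container for the per-episode randomized settings."""
    friction: float    # friction coefficient; the range used below is assumed
    object_rgb: tuple  # object colour as (r, g, b), each in [0, 1)


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw one random set of simulator parameters for a single episode."""
    return EpisodeParams(
        friction=rng.uniform(0.5, 1.5),  # assumed range, for illustration only
        object_rgb=(rng.random(), rng.random(), rng.random()),
    )


def randomized_episodes(num_episodes: int, seed: int = 0) -> list:
    """Return one independently sampled parameter set per episode."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [sample_episode_params(rng) for _ in range(num_episodes)]


if __name__ == "__main__":
    for params in randomized_episodes(3):
        print(params)
```

In a real training setup, each sampled parameter set would be applied to the simulator before the episode starts, so the policy never sees exactly the same physics or visuals twice.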

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors
cs.RO · 2026-04 · unverdicted · novelty 7.0

Discrete diffusion policies support native asynchronous execution via unmasking for real-time chunking, delivering higher success rates and 0.7x inference cost versus flow-matching RTC on dynamic robotics benchmarks a...

HiPolicy: Hierarchical Multi-Frequency Action Chunking for Policy Learning
cs.RO · 2026-04 · unverdicted · novelty 7.0

HiPolicy is a new hierarchical multi-frequency action chunking method for imitation learning that jointly generates coarse and fine action sequences with entropy-guided execution to improve performance and efficiency ...

Benchmarking Model-Based Reinforcement Learning
cs.LG · 2019-07 · accept · novelty 7.0

Introduces a benchmark suite of over 18 MBRL environments, evaluates multiple algorithms under consistent settings, and identifies three core challenges: dynamics bottleneck, planning horizon dilemma, and early-termin...

Pose Estimation for Non-Cooperative Rendezvous Using Neural Networks
cs.CV · 2019-06 · unverdicted · novelty 7.0

SPN is a CNN that detects a spacecraft bounding box, classifies then regresses attitude, and optimizes position via Gauss-Newton, achieving degree-level attitude and cm-level position errors on real images after train...

Distributionally Robust Control via Stein Variational Inference for Contact-Rich Manipulation
cs.RO · 2026-05 · unverdicted · novelty 6.0

Introduces a Stein variational inference-based deterministic formulation for distributionally robust control in contact-rich robotic manipulation, reporting up to 3x improved robustness under parametric uncertainty.

RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields
cs.RO · 2024-12 · unverdicted · novelty 6.0

A deep RL vulnerability-prediction policy trained in semantic embedding space finds up to 23% more unique robot manipulation failures than vision-language baselines and enables more efficient fine-tuning.

A Survey on Vision-Language-Action Models for Embodied AI
cs.RO · 2024-05 · unverdicted · novelty 6.0

This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.

Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning
cs.RO · 2021-08 · conditional · novelty 6.0

Isaac Gym achieves 2-3 orders of magnitude faster robot policy training by keeping physics simulation and PyTorch-based RL entirely on GPU with direct buffer sharing.

RoboNet: Large-Scale Multi-Robot Learning
cs.RO · 2019-10 · conditional · novelty 6.0

RoboNet is a multi-robot video dataset that enables pre-training of vision-based manipulation models which, after fine-tuning on a new robot, outperform robot-specific training that uses 4-20 times more data.

Bayesian Optimization in Variational Latent Spaces with Dynamic Compression
cs.RO · 2019-07 · unverdicted · novelty 6.0

Sequential VAE embeds simulated trajectories into latent paths for Bayesian optimization with dynamic compression to enable data-efficient high-dimensional controller tuning on robots.

Generalizing from a few environments in safety-critical reinforcement learning
cs.LG · 2019-07 · unverdicted · novelty 6.0

RL agents fail dangerously on unseen environments; ensembles reduce catastrophes in gridworld but not CoinRun, with uncertainty enabling intervention prediction.

Learning to Solve a Rubik's Cube with a Dexterous Hand
cs.RO · 2019-07 · unverdicted · novelty 5.0

Hierarchical RL combines a model-based cube solver with a model-free hand controller to solve Rubik's cubes in simulation, achieving 90.3% success on 1400 random scrambles.

ORRB -- OpenAI Remote Rendering Backend
cs.GR · 2019-06 · unverdicted · novelty 4.0

ORRB is an open-source remote rendering backend that pairs Unity3d with MuJoCo for high-throughput, customizable visual domain randomization in robotics environments.

On Multi-Agent Learning in Team Sports Games
cs.MA · 2019-06 · unverdicted · novelty 3.0

Describes a hierarchical RL method for multi-agent learning in team sports games aiming for human-like agents, reporting preliminary results that show promise.