Solving Rubik's Cube with a Robot Hand
Recognition: 2 theorem links · Lean Theorem
Pith reviewed 2026-05-15 09:33 UTC · model grok-4.3
The pith
Models trained only in simulation solve a Rubik's cube with a real robot hand
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by automatic domain randomization (ADR), which automatically generates a distribution over randomized environments of ever-increasing difficulty, and a robot platform built for machine learning. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. Memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand.
What carries the argument
Automatic domain randomization (ADR), an algorithm that generates distributions of randomized simulation environments of increasing difficulty to train policies and estimators that transfer to reality.
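To make the mechanism concrete, here is a minimal sketch of the ADR loop in Python, assuming a per-parameter uniform range whose boundaries move outward when the policy performs well there. The thresholds, step size, parameter names, and the `evaluate_policy` stub are illustrative assumptions, not the paper's actual values or API.

```python
import random

# Minimal ADR sketch: every simulator parameter has a range that starts
# narrow; a boundary is pushed outward when the current policy performs
# well with the parameter pinned there, and pulled back when it performs
# poorly. Thresholds, step size, and parameter names are placeholders.
PERF_HI, PERF_LO, STEP = 0.8, 0.2, 0.05
ranges = {"friction": [1.0, 1.0], "cube_size_scale": [1.0, 1.0]}

def evaluate_policy(param, pinned_value, n_episodes=64):
    """Roll out the current policy with `param` pinned to `pinned_value`
    and all other parameters sampled from their ranges; return the mean
    success rate. Stubbed with a random number for this sketch."""
    return random.random()

def adr_update():
    param = random.choice(list(ranges))
    side = random.choice([0, 1])            # 0 = lower bound, 1 = upper bound
    perf = evaluate_policy(param, ranges[param][side])
    outward = -STEP if side == 0 else STEP  # widening pushes bounds apart
    if perf > PERF_HI:
        ranges[param][side] += outward      # easy at this boundary: widen
    elif perf < PERF_LO:
        ranges[param][side] -= outward      # too hard here: back off
        lo, hi = ranges[param]
        if lo > hi:                         # keep the interval well-formed
            ranges[param] = [(lo + hi) / 2] * 2

def sample_environment():
    """Draw one randomized environment from the current ADR distribution."""
    return {p: random.uniform(lo, hi) for p, (lo, hi) in ranges.items()}
```

The key design point the sketch preserves is that difficulty grows automatically: no range widens until the policy has mastered the distribution it already faces.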
If this is right
- Memory-augmented models exhibit emergent meta-learning when trained on ADR distributions.
- Vision state estimators achieve improved sim-to-real transfer with ADR.
- The method solves a Rubik's cube task on a humanoid robot hand using only simulation-trained models.
- Both control and state estimation problems are addressed without real-world data collection.
Where Pith is reading between the lines
- ADR may extend to training policies for other dexterous manipulation tasks that require fine motor control.
- Greater reliance on simulation could lower the time and cost of developing new robotic skills.
- Emergent meta-learning hints that ADR produces policies capable of adapting during execution (see the memory-augmented policy sketch after this list).
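To make the memory mechanism behind that last point concrete, below is a minimal PyTorch sketch of a memory-augmented policy: the LSTM hidden state persists across the episode, giving the network a channel through which it can accumulate evidence about the current environment's dynamics and adapt its actions online. All dimensions and the observation placeholder are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Memory-augmented policy sketch: the recurrent state is the only place
    test-time adaptation can live, since the weights are frozen at deployment."""
    def __init__(self, obs_dim=40, act_dim=20, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, state=None):
        # obs: (batch, 1, obs_dim); `state` carries memory between steps
        out, state = self.lstm(obs, state)
        return self.head(out), state

policy = RecurrentPolicy()
state = None                            # reset memory at episode start
for t in range(100):
    obs = torch.randn(1, 1, 40)         # placeholder for a real observation
    action, state = policy(obs, state)  # state accumulates dynamics evidence
```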
Load-bearing premise
The physics simulator, even when randomized over a wide distribution via ADR, captures enough of the real robot's dynamics, friction, and sensor characteristics for the policy to transfer successfully without real-world fine-tuning.
What would settle it
The physical robot hand failing to solve the Rubik's cube while the same policy succeeds in the ADR-trained simulation would show that transfer has not occurred.
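One way to make that criterion quantitative: estimate the real-robot success rate from n physical trials with a confidence interval and check whether it sits credibly below the simulated rate. A minimal sketch, with made-up trial counts rather than the paper's results:

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial success proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

sim_rate = 0.90                               # hypothetical success rate in simulation
lo, hi = wilson_interval(successes=12, n=20)  # hypothetical real-robot trials
if hi < sim_rate:
    print(f"real interval ({lo:.2f}, {hi:.2f}) sits below the sim rate: transfer gap")
else:
    print(f"real interval ({lo:.2f}, {hi:.2f}) is consistent with the sim rate")
```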
Original abstract
We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik's cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that Automatic Domain Randomization (ADR) enables training of control policies and vision estimators entirely in simulation that transfer zero-shot to a custom humanoid robot hand, allowing it to solve a Rubik's cube. This is supported by real-robot experiments and videos, with memory-augmented models showing emergent meta-learning.
Significance. If the result holds, this is a significant demonstration of scalable sim-to-real transfer for complex, long-horizon manipulation without real-world data or fine-tuning. The real-robot experiments and videos provide direct empirical grounding, and the emergent meta-learning observation is a notable byproduct of the ADR training regime.
Major comments (1)
- §3 (ADR algorithm and randomization): The central claim that ADR produces a distribution bracketing real dynamics relies on the assumption that final randomization ranges cover real joint friction, contact stiffness, motor backlash, and camera parameters, yet the manuscript provides no system-identification measurements or direct comparisons confirming that hardware values lie inside the converged ADR support. This leaves open the possibility that success is due to platform-simulator proximity rather than ADR's automatic expansion.
Minor comments (2)
- Abstract and §5 (Results): Success rates, number of independent trials, and statistical details on solve reliability are only summarized at a high level; adding quantitative tables or confidence intervals would strengthen verifiability of the transfer claims.
- §4 (Vision and state estimation): The interaction between ADR-trained vision models and control policies is described qualitatively; a clearer ablation isolating the contribution of vision randomization would help readers assess robustness.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the work and for the constructive comment on Section 3. We address the point directly below.
Point-by-point responses
Referee: §3 (ADR algorithm and randomization): The central claim that ADR produces a distribution bracketing real dynamics relies on the assumption that final randomization ranges cover real joint friction, contact stiffness, motor backlash, and camera parameters, yet the manuscript provides no system-identification measurements or direct comparisons confirming that hardware values lie inside the converged ADR support. This leaves open the possibility that success is due to platform-simulator proximity rather than ADR's automatic expansion.
Authors: We thank the referee for this observation. ADR initializes a narrow parameter distribution and automatically widens each range when the current policy's success rate at that range's boundary exceeds an upper performance threshold, narrowing it again when performance falls below a lower threshold; expansion therefore continues only as long as the policy keeps solving the task reliably across the broadened distribution. In the Rubik's cube experiments, policies trained on the initial narrow ranges failed to transfer, while the same architecture trained after ADR expansion succeeded zero-shot on the physical hand. This controlled progression indicates that the automatic widening, rather than static simulator fidelity, is responsible for the observed transfer. We did not perform separate system-identification measurements to obtain precise hardware values for joint friction, contact stiffness, backlash, or camera intrinsics and then verify containment within the final ADR intervals. The zero-shot real-robot results nevertheless constitute empirical evidence that the real dynamics lie inside the final support. We will add a short clarifying paragraph in the revised manuscript that makes this design rationale and the empirical validation explicit.
Revision: partial
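The system-identification check discussed above is mechanically simple once hardware estimates exist: verify that each identified value lies inside the converged ADR interval. A minimal sketch, with entirely hypothetical parameter names, ranges, and values:

```python
# Hypothetical converged ADR ranges and sys-id estimates; none of these
# numbers come from the paper.
adr_support = {
    "joint_friction":    (0.02, 0.35),
    "contact_stiffness": (500.0, 4000.0),
    "motor_backlash":    (0.0, 0.12),
}
identified = {
    "joint_friction":    0.11,
    "contact_stiffness": 2600.0,
    "motor_backlash":    0.15,   # deliberately outside its range
}

for name, value in identified.items():
    lo, hi = adr_support[name]
    status = "inside" if lo <= value <= hi else "OUTSIDE"
    print(f"{name}: {value} is {status} ADR support [{lo}, {hi}]")
```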
Circularity Check
No circularity: empirical hardware validation independent of any fitted derivation
Full rationale
The paper's core claim rests on direct physical experiments: policies trained in simulation with ADR are deployed zero-shot on a real Shadow Hand robot and successfully solve Rubik's cubes. No equations, predictions, or uniqueness theorems are presented that reduce by construction to the training distribution or to self-cited parameters. ADR is introduced as an algorithmic procedure whose coverage is tested by observed transfer success rather than assumed; the result is falsifiable against the external benchmark of real-robot performance. Self-citations (if any) are not load-bearing for the central result. This is the standard case of an experimental paper whose validity is measured outside its own fitted values.
Axiom & Free-Parameter Ledger
Free parameters (2)
- ADR randomization ranges and difficulty schedule (see the config sketch after this ledger)
- RL training hyperparameters
Axioms (1)
- Domain assumption: a physics simulator with randomized parameters can produce trajectories whose distribution overlaps sufficiently with real robot behavior.
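For concreteness, the ledger's two free-parameter families could be recorded as an explicit experiment config, as in the sketch below; every field name and value is an illustrative placeholder rather than a setting from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ADRConfig:
    # Initial (degenerate) randomization ranges and the difficulty schedule.
    initial_ranges: dict = field(default_factory=lambda: {
        "friction": (1.0, 1.0),
        "cube_size_scale": (1.0, 1.0),
    })
    boundary_step: float = 0.05          # how far a bound moves per update
    perf_thresholds: tuple = (0.2, 0.8)  # (narrow below, widen above)

@dataclass
class RLConfig:
    learning_rate: float = 3e-4
    discount: float = 0.998
    episodes_per_update: int = 64

config = (ADRConfig(), RLConfig())
```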
Lean theorems connected to this paper
- IndisputableMonolith.Foundation.DimensionForcing.eight_tick_forces_D3 (tagged unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear. Linked passage: "ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 19 Pith papers
- Taming the Curses of Multiagency in Robust Markov Games with Large State Space through Linear Function Approximation. The work gives the first algorithms for general robust Markov games with linear function approximation whose sample complexity breaks the curse of multiagency for large state spaces in both generative and online settings.
- Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling. DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
- Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift. SeqRejectron builds a stopping rule from a small set of validator policies to achieve horizon-free sample-complexity guarantees for selective imitation learning under arbitrary train-test dynamics shifts.
- Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers. ReGuard discovers network scenarios where RL controllers perform 43-64% worse than achievable and reduces those gaps by 79-85% with lightweight rule-based protection that preserves normal performance.
- HANDFUL: Sequential Grasp-Conditioned Dexterous Manipulation with Resource Awareness. HANDFUL learns resource-aware grasps using finger contact rewards and curriculum learning to improve success on sequential dexterous tasks in simulation and on a real LEAP hand.
- Betting for Sim-to-Real Performance Evaluation. Betting mechanisms can yield provably more accurate and efficient estimates of real-world robot behavior than Monte Carlo sampling under specified conditions, with practical approximations demonstrated on synthetic da...
- SynthPID: P&ID digitization from Topology-Preserving Synthetic Data. Topology-preserving synthetic P&IDs generated by seeding from real drawings enable models trained solely on synthetics to achieve 63.8% edge mAP on real P&ID benchmarks, closing most of the gap to real-data training.
- Dota 2 with Large Scale Deep Reinforcement Learning. OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.
- Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching. DRIS improves zero-shot sim-to-real transfer for reactive catching by maintaining and acting on sets of randomized dynamics instances instead of single instances per episode.
- GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning. GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for ...
- ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation. A framework using 3D Gaussian Splatting for visual domain randomization enables robust monocular RGB-based dexterous in-hand reorientation on real hardware for multiple objects under varied lighting.
- Trajectory-based actuator identification via differentiable simulation. Differentiable simulation enables torque-sensor-free actuator model identification from trajectory data, achieving 1.88x better position tracking than a stand-trained baseline and 46% longer travel in downstream locom...
- Learning Dexterous Grasping from Sparse Taxonomy Guidance. GRIT learns dexterous grasping from sparse taxonomy guidance, achieving 87.9% success and better generalization to novel objects via a two-stage prediction-plus-policy approach.
- ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy Deployment via Two-Stage Boundary-Focused Sampling. ROBOGATE applies adaptive boundary-focused sampling in simulation to discover robot policy failure boundaries, revealing a 97.65 percentage point performance gap for a VLA model between LIBERO and industrial scenarios.
- Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning. Isaac Lab is a unified GPU-native platform combining high-fidelity physics, photorealistic rendering, multi-frequency sensors, domain randomization, and learning pipelines for scalable multi-modal robot policy training.
- Language Models (Mostly) Know What They Know. Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
- A General Language Assistant as a Laboratory for Alignment. Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.
- Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning. Isaac Gym achieves 2-3 orders of magnitude faster robot policy training by keeping physics simulation and PyTorch-based RL entirely on GPU with direct buffer sharing.
- You're Pushing My Buttons: Instrumented Learning of Gentle Button Presses. Training-time instrumentation with audio and privileged button-state signals produces contact policies that match success rates but apply lower forces using only vision and audio at inference.