Sim-to-Real Robot Learning from Pixels with Progressive Nets

Andrei A. Rusu; Mel Vecerik; Nicolas Heess; Raia Hadsell; Razvan Pascanu; Thomas Rothörl

arXiv: 1610.04286 · v2 · pith:RKATIXA4new · submitted 2016-10-13 · cs.RO · cs.LG

keywords: learning, robot, approach, progressive, complex, deep, experiments, policies

Applying end-to-end learning to solve complex, interactive, pixel-driven control tasks on a robot is an unsolved problem. Deep Reinforcement Learning algorithms are too slow to achieve performance on a real robot, but their potential has been demonstrated in simulated environments. We propose using progressive networks to bridge the reality gap and transfer learned policies from simulation to the real world. The progressive net approach is a general framework that enables reuse of everything from low-level visual features to high-level policies for transfer to new tasks, enabling a compositional, yet simple, approach to building complex skills. We present an early demonstration of this approach with a number of experiments in the domain of robot manipulation that focus on bridging the reality gap. Unlike other proposed approaches, our real-world experiments demonstrate successful task learning from raw visual input on a fully actuated robot manipulator. Moreover, rather than relying on model-based trajectory optimisation, the task learning is accomplished using only deep reinforcement learning and sparse rewards.
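
The reuse of earlier features described in the abstract can be illustrated with a toy forward pass. This is a minimal sketch and not the authors' architecture: the two-layer columns, the layer sizes, and the names `Column`, `sim_column`, and `real_column` are all assumptions made for illustration, and training and weight freezing are omitted. It shows only the core idea of a progressive net: a new column receives the hidden activations of an earlier column through extra lateral weights.

```python
import random


def relu(v):
    return [max(0.0, a) for a in v]


def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def rand_matrix(rng, n_out, n_in):
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


class Column:
    """One two-layer column (hypothetical sizes, illustration only)."""

    def __init__(self, rng, n_in, n_hidden, n_out, n_prev=0):
        self.W1 = rand_matrix(rng, n_hidden, n_in)
        self.W2 = rand_matrix(rng, n_out, n_hidden)
        # One lateral matrix per earlier column: it maps that column's
        # hidden layer into this column's output layer.
        self.U2 = [rand_matrix(rng, n_out, n_hidden) for _ in range(n_prev)]

    def forward(self, x, prev_hiddens=()):
        h = relu(matvec(self.W1, x))
        out = matvec(self.W2, h)
        for U, h_prev in zip(self.U2, prev_hiddens):
            lateral = matvec(U, h_prev)
            out = [o + l for o, l in zip(out, lateral)]
        return h, out


rng = random.Random(0)
sim_column = Column(rng, n_in=8, n_hidden=16, n_out=4)             # stands in for the simulation-trained column
real_column = Column(rng, n_in=8, n_hidden=16, n_out=4, n_prev=1)  # stands in for the real-robot column

x = [rng.gauss(0.0, 1.0) for _ in range(8)]
h_sim, _ = sim_column.forward(x)  # earlier column used only as a feature provider
_, out_with_lateral = real_column.forward(x, prev_hiddens=[h_sim])
_, out_without_lateral = real_column.forward(x)
```

In this sketch the difference between `out_with_lateral` and `out_without_lateral` is exactly the contribution of the earlier column's features, which is the channel through which knowledge from simulation would reach the new column.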

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy
cs.RO · 2023-11 · unverdicted · novelty 7.0

Temporal Transfer Learning selects source tasks for zero-shot transfer of RL policies to solve a range of coarse-grained advisory autonomy hold durations in traffic optimization more reliably than baselines.

Blending-target Domain Adaptation by Adversarial Meta-Adaptation Networks
cs.LG · 2019-07 · unverdicted · novelty 7.0

AMEAN applies adversarial meta-learning to discover implicit meta-sub-target clusters in blended target data, reducing intra-target category misalignment and outperforming standard DA methods on three BTDA benchmarks.

Environment Probing Interaction Policies
cs.RO · 2019-07 · unverdicted · novelty 6.0

EPI policies use a transition-predictability reward to probe environments and condition task policies, outperforming standard generalization methods on novel test environments.