
arXiv: 1812.03201 · v2 · submitted 2018-12-07 · cs.RO · cs.LG

Recognition: unknown

Residual Reinforcement Learning for Robot Control

keywords: control, problems, contacts, learning, methods, robot, controllers, conventional

Conventional feedback control methods can solve various types of robot control problems very efficiently by capturing the structure with explicit models, such as rigid body equations of motion. However, many control problems in modern manufacturing deal with contacts and friction, which are difficult to capture with first-order physical modeling. Hence, applying control design methodologies to these kinds of problems often results in brittle and inaccurate controllers, which have to be manually tuned for deployment. Reinforcement learning (RL) methods have been demonstrated to be capable of learning continuous robot controllers from interactions with the environment, even for problems that include friction and contacts. In this paper, we study how we can solve difficult control problems in the real world by decomposing them into a part that is solved efficiently by conventional feedback control methods, and the residual which is solved with RL. The final control policy is a superposition of both control signals. We demonstrate our approach by training an agent to successfully perform a real-world block assembly task involving contacts and unstable objects.
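
The superposition described in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the proportional controller, the function names (`p_controller`, `residual_action`, `zero_residual`), and the `gain` parameter are all assumptions chosen for the example, and the residual policy is a placeholder for whatever an RL algorithm would learn.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def p_controller(state: Vector, target: Vector, gain: float = 1.0) -> List[float]:
    """Hand-designed feedback term (assumed here to be a simple P-controller)."""
    return [gain * (t - s) for s, t in zip(state, target)]


def residual_action(
    state: Vector,
    target: Vector,
    residual_policy: Callable[[Vector], Sequence[float]],
    gain: float = 1.0,
) -> List[float]:
    """Return the sum of the feedback control signal and the learned residual."""
    u_feedback = p_controller(state, target, gain)
    u_residual = list(residual_policy(state))
    if len(u_residual) != len(u_feedback):
        raise ValueError("residual and feedback actions must have the same size")
    # Superposition: the final command is the element-wise sum of both signals.
    return [fb + res for fb, res in zip(u_feedback, u_residual)]


def zero_residual(state: Vector) -> List[float]:
    """Hypothetical untrained residual: contributes nothing to the command."""
    return [0.0 for _ in state]


if __name__ == "__main__":
    state, target = [0.0, 1.0], [1.0, 1.0]
    # With a zero residual the combined policy reduces to the feedback controller.
    print(residual_action(state, target, zero_residual, gain=2.0))
    # A nonzero residual shifts the command, e.g. to compensate for contact effects.
    print(residual_action(state, target, lambda s: [0.5, -0.25], gain=2.0))
```

One consequence of this structure, under the assumptions above, is that a residual that outputs zero leaves the conventional controller's behavior unchanged, so learning only has to account for what the feedback term does not already handle.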



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies

    cs.RO 2026-03 unverdicted novelty 6.0

    HandelBot achieves precise bimanual piano playing by refining a simulation policy through lateral finger adjustments and residual RL, outperforming direct sim deployment by 1.8x with only 30 minutes of physical data a...