Residual Reinforcement Learning for Robot Control

Tobias Johannink, Shikhar Bahl, Ashvin Nair, Jianlan Luo, Avinash Kumar, Matthias Loskyll, Juan Aparicio Ojea, Eugen Solowjow, Sergey Levine

classification cs.RO cs.LG

keywords control, problems, contacts, learning, methods, robot, controllers, conventional

abstract

Conventional feedback control methods can solve various types of robot control problems very efficiently by capturing the structure with explicit models, such as rigid body equations of motion. However, many control problems in modern manufacturing deal with contacts and friction, which are difficult to capture with first-order physical modeling. Hence, applying control design methodologies to these kinds of problems often results in brittle and inaccurate controllers, which have to be manually tuned for deployment. Reinforcement learning (RL) methods have been demonstrated to be capable of learning continuous robot controllers from interactions with the environment, even for problems that include friction and contacts. In this paper, we study how we can solve difficult control problems in the real world by decomposing them into a part that is solved efficiently by conventional feedback control methods, and the residual which is solved with RL. The final control policy is a superposition of both control signals. We demonstrate our approach by training an agent to successfully perform a real-world block assembly task involving contacts and unstable objects.
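
The superposition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the proportional controller, the function names (`p_controller`, `zero_residual`, `combined_action`), and the `gain` parameter are all hypothetical stand-ins chosen for illustration. The conventional feedback term is assumed here to be a simple proportional law, and the learned residual is represented by any callable that maps a state to a correction.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def p_controller(state: Sequence[float], target: Sequence[float],
                 gain: float = 1.0) -> Vector:
    """Hypothetical hand-designed feedback term: proportional control."""
    return [gain * (t - s) for s, t in zip(state, target)]


def zero_residual(state: Sequence[float]) -> Vector:
    """Stand-in for the learned residual; outputs no correction."""
    return [0.0 for _ in state]


def combined_action(state: Sequence[float], target: Sequence[float],
                    residual: Callable[[Sequence[float]], Vector] = zero_residual,
                    gain: float = 1.0) -> Vector:
    """Superpose the feedback signal and the residual signal element-wise."""
    u_feedback = p_controller(state, target, gain)
    u_residual = residual(state)
    return [a + b for a, b in zip(u_feedback, u_residual)]


# With a zero residual, the combined action equals the feedback action.
baseline = combined_action([0.0, 1.0], [1.0, 1.0])

# A non-zero residual (here a fixed, made-up correction) shifts the action.
corrected = combined_action([0.0, 1.0], [1.0, 1.0],
                            residual=lambda s: [0.5, -0.5])
```

In this sketch only the `residual` callable would be trained with RL, while the feedback term stays fixed. With a residual that outputs zero, the system behaves exactly like the conventional controller.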

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies
cs.RO 2026-03 unverdicted novelty 6.0

HandelBot achieves precise bimanual piano playing by refining a simulation policy through lateral finger adjustments and residual RL, outperforming direct sim deployment by 1.8x with only 30 minutes of physical data a...