Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation

Ashvin Nair, Dian Chen, Pulkit Agrawal, Phillip Isola, Pieter Abbeel, Jitendra Malik, Sergey Levine

classification cs.CV cs.LG cs.RO

keywords rope, images, robot, human, manipulation, sequence, combining, demonstration

Manipulation of deformable objects, such as ropes and cloth, is an important but challenging problem in robotics. We present a learning-based system where a robot takes as input a sequence of images of a human manipulating a rope from an initial to goal configuration, and outputs a sequence of actions that can reproduce the human demonstration, using only monocular images as input. To perform this task, the robot learns a pixel-level inverse dynamics model of rope manipulation directly from images in a self-supervised manner, using about 60K interactions with the rope collected autonomously by the robot. The human demonstration provides a high-level plan of what to do and the low-level inverse model is used to execute the plan. We show that by combining the high and low-level plans, the robot can successfully manipulate a rope into a variety of target shapes using only a sequence of human-provided images for direction.
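The division of labor described in the abstract, where the demonstration frames supply the high-level plan and the inverse model supplies each low-level action, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function `follow_demonstration`, the `ToyWorld` class, and `toy_inverse_model` are hypothetical names, a 2D point stands in for a rope image, and a displacement stands in for a robot action. The paper's actual model is learned from images and is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

# Stand-ins (assumptions): a 2D point replaces an image, a displacement replaces a robot action.
Observation = Tuple[float, float]
Action = Tuple[float, float]


def follow_demonstration(
    observe: Callable[[], Observation],
    inverse_model: Callable[[Observation, Observation], Action],
    execute: Callable[[Action], None],
    demo_frames: Sequence[Observation],
) -> List[Action]:
    """Treat each demonstration frame as a subgoal and ask the inverse model how to reach it."""
    actions: List[Action] = []
    for goal in demo_frames:
        current = observe()                    # what the robot sees now
        action = inverse_model(current, goal)  # low-level: action that should move current toward goal
        execute(action)
        actions.append(action)
    return actions


class ToyWorld:
    """Hypothetical toy environment: the state is a point, an action shifts it."""

    def __init__(self, start: Observation) -> None:
        self.state = start

    def observe(self) -> Observation:
        return self.state

    def execute(self, action: Action) -> None:
        self.state = (self.state[0] + action[0], self.state[1] + action[1])


def toy_inverse_model(current: Observation, goal: Observation) -> Action:
    """Hand-written inverse dynamics for the toy world (the paper learns this from data)."""
    return (goal[0] - current[0], goal[1] - current[1])


world = ToyWorld((0.0, 0.0))
demo = [(1.0, 0.0), (1.0, 2.0), (3.0, 2.0)]  # high-level plan: a sequence of subgoals
actions = follow_demonstration(world.observe, toy_inverse_model, world.execute, demo)
```

In this toy setting the inverse model is exact, so the final state matches the last demonstration frame. A learned model would only approximate each subgoal, which is one reason to re-observe before every action rather than plan all actions up front.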

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Wiggle and Go! System Identification for Zero-Shot Dynamic Rope Manipulation
cs.RO, 2026-04, unverdicted, novelty 6.0

Wiggle and Go! uses system identification from rope motion observations to predict parameters that enable zero-shot goal-conditioned dynamic manipulation, achieving 3.55 cm accuracy on 3D target striking versus 15.34 ...