
arxiv: 1708.04225 · v3 · pith:GVWVJILB · submitted 2017-08-14 · 💻 cs.RO · cs.AI · cs.CV

Deep Object-Centric Representations for Generalizable Robot Learning

classification 💻 cs.RO · cs.AI · cs.CV
keywords objects · manipulation · attention · demonstrations · generalizable · learned · learning · object-centric

Robotic manipulation in complex open-world scenarios requires both reliable physical manipulation skills and effective and generalizable perception. In this paper, we propose a method where general purpose pretrained visual models serve as an object-centric prior for the perception system of a learned policy. We devise an object-level attentional mechanism that can be used to determine relevant objects from a few trajectories or demonstrations, and then immediately incorporate those objects into a learned policy. A task-independent meta-attention locates possible objects in the scene, and a task-specific attention identifies which objects are predictive of the trajectories. The scope of the task-specific attention is easily adjusted by showing demonstrations with distractor objects or with diverse relevant objects. Our results indicate that this approach exhibits good generalization across object instances using very few samples, and can be used to learn a variety of manipulation tasks using reinforcement learning.
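The two-stage attention described in the abstract can be illustrated with a small sketch. It assumes the task-independent meta-attention has already produced, for each candidate object, a feature vector (for example from a pretrained detector) and a 2-D image position. The task-specific attention is modeled here as a softmax over cosine similarities to a single query vector, and the attended position is what a policy would consume. The query, the `temperature` parameter, and the `fit_query` heuristic (average the features of the object nearest each demonstration's final gripper position) are hypothetical stand-ins chosen for illustration, not the paper's training procedure.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Point = Tuple[float, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; lower temperature gives sharper attention."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def task_attention(
    features: List[Vector],
    positions: List[Point],
    query: Vector,
    temperature: float = 0.1,
) -> Tuple[List[float], Point]:
    """Hypothetical task-specific attention over meta-attention candidates.

    Returns the per-object weights and the attention-weighted position,
    which would serve as the object-centric input to a policy.
    """
    weights = softmax([cosine(f, query) for f in features], temperature)
    x = sum(w * p[0] for w, p in zip(weights, positions))
    y = sum(w * p[1] for w, p in zip(weights, positions))
    return weights, (x, y)


def fit_query(demos: List[Tuple[List[Vector], List[Point], Point]]) -> List[float]:
    """Hypothetical heuristic: average the features of the object closest to
    each demonstration's final gripper position. Demos containing different
    instances of the relevant object broaden the query's scope."""
    picked = []
    for features, positions, endpoint in demos:
        dists = [math.dist(p, endpoint) for p in positions]
        picked.append(features[dists.index(min(dists))])
    dim = len(picked[0])
    return [sum(f[k] for f in picked) / len(picked) for k in range(dim)]


if __name__ == "__main__":
    # Two toy demonstrations, each with a relevant object and a distractor.
    demos = [
        ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [(0.2, 0.3), (0.8, 0.1)], (0.21, 0.29)),
        ([[0.0, 1.0, 0.0], [0.8, 0.0, 0.6]], [(0.5, 0.5), (0.1, 0.9)], (0.1, 0.88)),
    ]
    query = fit_query(demos)
    # New scene: a new instance of the relevant object plus two other objects.
    scene_features = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    scene_positions = [(0.4, 0.4), (0.7, 0.2), (0.3, 0.6)]
    weights, target = task_attention(scene_features, scene_positions, query)
    print(query, [round(w, 3) for w in weights], target)
```

In this toy run the fitted query is `[0.9, 0.0, 0.3]`, and nearly all attention weight lands on the first object in the new scene, a different instance from either demonstration, while the distractor that appeared in both demonstrations is suppressed. A soft weighting keeps the output differentiable, so the query could instead be trained end to end from trajectories.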

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. See Less, Specify More: Visual Evidence Budgets for Generalizable VLAs

    cs.RO · 2026-06 · unverdicted · novelty 6.0

    S2 improves generalization in vision-language-action models by using goal-preserving refined language guidance and explicit visual evidence budgets, raising mean subtask success from 54.2% to 79.0% on eight real-robot...