Well structured visual representations can make robot learning faster and can improve generalization. In this paper, we study how we can acquire effective object-centric representations for robotic manipulation tasks without human labeling by using autonomous robot interaction with the environment. Such representation learning methods can benefit from continuous refinement of the representation as the robot collects more experience, allowing them to scale effectively without human intervention. Our representation learning approach is based on object persistence: when a robot removes an object from a scene, the representation of that scene should change according to the features of the object that was removed. We formulate an arithmetic relationship between feature vectors from this observation, and use it to learn a representation of scenes and objects that can then be used to identify object instances, localize them in the scene, and perform goal-directed grasping tasks where the robot must retrieve commanded objects from a bin. The same grasping procedure can also be used to automatically collect training data for our method, by recording images of scenes, grasping and removing an object, and recording the outcome. Our experiments demonstrate that this self-supervised approach for tasked grasping substantially outperforms direct reinforcement learning from images and prior representation learning methods.
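The arithmetic relationship referred to above can be written as a constraint relating the embedding of the scene before the grasp, the embedding of the scene after the grasp, and the embedding of the grasped object. The notation below (a scene encoder $\phi_s$, an object encoder $\phi_o$, pre- and post-grasp scene images $s_{\mathrm{pre}}$ and $s_{\mathrm{post}}$, and an image $o$ of the removed object) is an illustrative sketch of that constraint, not the exact formulation used in the paper:
\[
\phi_s(s_{\mathrm{pre}}) - \phi_s(s_{\mathrm{post}}) \approx \phi_o(o),
\]
so that the difference between the two scene embeddings encodes the object that was removed. One natural way to train such encoders on the autonomously collected grasp data is a metric-learning objective that pulls this embedding difference toward the embedding of the grasped object and pushes it away from the embeddings of other objects.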