Perceptual understanding of a scene and of the relationships between its components is important for the successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most current methods learn task-specific representations that do not necessarily transfer well to other tasks. Furthermore, representations learned by supervised methods require large labeled datasets for each task, which are expensive to collect in the real world. Using self-supervised learning to obtain representations from unlabeled data can mitigate this problem. However, current self-supervised representation learning methods are mostly object-agnostic, and we demonstrate that the resulting representations are insufficient for general-purpose robotics tasks because they fail to capture the complexity of scenes with many components. In this paper, we explore the effectiveness of object-aware representation learning techniques for robotic tasks. Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two different settings: (i) policy learning and (ii) object location prediction. We show that our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques as well as methods trained on raw RGB images. Our results show a 20 percent increase in performance in low-data regimes (1000 trajectories) in policy training using implicit behavioral cloning (IBC). Furthermore, our method outperforms the baselines on the task of object localization in multi-object scenes.