Learned visuomotor policies have shown considerable success as an alternative to traditional, hand-crafted frameworks for robotic manipulation tasks. Surprisingly, the extension of these methods to the multiview domain is relatively unexplored. A successful multiview policy could be deployed on a mobile manipulation platform, allowing it to complete a task regardless of its view of the scene. In this work, we demonstrate that a multiview policy can be found through imitation learning by collecting data from a variety of viewpoints. We illustrate the general applicability of the method by learning to complete several challenging multi-stage and contact-rich tasks, from numerous viewpoints, both in a simulated environment and on a real mobile manipulation platform. Furthermore, we analyze our policies to determine the benefits of learning from multiview data compared to learning with data from a fixed perspective. We show that learning from multiview data incurs little, if any, penalty to performance on a fixed-view task compared to learning with an equivalent amount of fixed-view data. Finally, we examine the visual features learned by the multiview and fixed-view policies. Our results indicate that multiview policies implicitly learn to identify spatially correlated features with a degree of view-invariance.