Learned visuomotor policies have shown considerable success as an alternative to traditional, hand-crafted frameworks for robotic manipulation. Surprisingly, an extension of these methods to the multiview domain is relatively unexplored. A successful multiview policy could be deployed on a mobile manipulation platform, allowing the robot to complete a task regardless of its view of the scene. In this work, we demonstrate that a multiview policy can be found through imitation learning by collecting data from a variety of viewpoints. We illustrate the general applicability of the method by learning to complete several challenging multi-stage and contact-rich tasks, from numerous viewpoints, both in a simulated environment and on a real mobile manipulation platform. Furthermore, we analyze our policies to determine the benefits of learning from multiview data compared to learning with data collected from a fixed perspective. We show that learning from multiview data results in little, if any, penalty to performance for a fixed-view task compared to learning with an equivalent amount of fixed-view data. Finally, we examine the visual features learned by the multiview and fixed-view policies. Our results indicate that multiview policies implicitly learn to identify spatially correlated features.