To aid humans in everyday tasks, robots need to know which objects exist in the scene, where they are, and how to grasp and manipulate them in different situations. Therefore, object recognition and grasping are two key functionalities for autonomous robots. Most state-of-the-art approaches treat object recognition and grasping as two separate problems, even though both use visual input. Furthermore, the knowledge of the robot is fixed after the training phase; if the robot then encounters new object categories, it must be retrained to incorporate the new information without catastrophic forgetting. To address this problem, we propose a deep learning architecture with an augmented memory capacity to handle open-ended object recognition and grasping simultaneously. In particular, our approach takes multiple views of an object as input and jointly estimates a pixel-wise grasp configuration as well as a deep scale- and rotation-invariant representation as output. The obtained representation is then used for open-ended object recognition through a meta-active learning technique. We demonstrate the ability of our approach to grasp never-seen-before objects and to rapidly learn new object categories using very few examples on-site, in both simulation and real-world settings. A video of these experiments is available online at: https://youtu.be/n9SMpuEkOgk
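To make the described joint architecture concrete, the following is a minimal sketch, not the authors' implementation: it assumes a PyTorch model with a shared convolutional encoder over multi-view input, one decoder head producing a pixel-wise grasp-configuration map, and one head producing the compact representation used for open-ended recognition. All layer sizes, channel counts, and names are illustrative assumptions.

import torch
import torch.nn as nn

class JointGraspRecognitionNet(nn.Module):
    """Hypothetical two-headed network: grasp map + object embedding."""
    def __init__(self, num_views=3, embed_dim=128, grasp_channels=3):
        super().__init__()
        # Shared encoder; the views are stacked along the channel axis here.
        self.encoder = nn.Sequential(
            nn.Conv2d(num_views, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Grasp head: upsamples back to input resolution, predicting per-pixel
        # grasp parameters (e.g. quality, angle, width).
        self.grasp_head = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, grasp_channels, 4, stride=2, padding=1),
        )
        # Recognition head: pools to a compact representation that an
        # open-ended learner can compare against stored category examples.
        self.embed_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, embed_dim),
        )

    def forward(self, views):
        # views: (B, num_views, H, W) multi-view depth images, H and W divisible by 4
        features = self.encoder(views)
        grasp_map = self.grasp_head(features)   # (B, grasp_channels, H, W)
        embedding = self.embed_head(features)   # (B, embed_dim)
        return grasp_map, embedding

In this sketch the embedding would be fed to a separate open-ended (few-shot) category learner rather than a fixed softmax classifier, which is what allows new categories to be added on-site without retraining the whole network.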