This paper presents a new technique for learning category-level manipulation from raw RGB-D videos of task demonstrations, with no manual labels or annotations. Category-level learning aims to acquire skills that generalize to new objects whose geometries and textures differ from those of the objects used in the demonstrations. We address this problem by first viewing both grasping and manipulation as special cases of tool use, in which a tool object is moved through a sequence of key-poses defined in the frame of reference of a target object. The tool and target objects, along with their key-poses, are predicted by a dynamic graph convolutional neural network that takes as input automatically segmented depth and color images of the entire scene. Empirical results on object manipulation tasks with a real robotic arm show that the proposed network can efficiently learn from real visual demonstrations to perform the tasks on novel objects within the same category, and outperforms alternative approaches.
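As a minimal illustration of the key-pose representation described in the abstract, the sketch below shows how a key-pose expressed in a target object's frame can be mapped into the world frame by composing homogeneous transforms. This is an assumption-laden sketch of the general idea, not the paper's implementation; the function names are hypothetical.

```python
import numpy as np

def pose_to_matrix(position, rotation):
    """Build a 4x4 homogeneous transform from a 3-vector translation
    and a 3x3 rotation matrix."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = position
    return T

def keypose_in_world(T_world_target, T_target_keypose):
    """Map a key-pose given in the target object's frame into the world
    frame by left-composing with the target's estimated world pose.
    (Hypothetical helper illustrating the frame composition.)"""
    return T_world_target @ T_target_keypose

# Example: target object at (1, 0, 0) in the world, key-pose at
# (0, 1, 0) relative to the target; rotations are identity here.
T_wt = pose_to_matrix(np.array([1.0, 0.0, 0.0]), np.eye(3))
T_tk = pose_to_matrix(np.array([0.0, 1.0, 0.0]), np.eye(3))
T_wk = keypose_in_world(T_wt, T_tk)
```

Because the key-poses are relative to the target object, the same demonstrated trajectory transfers to a novel target instance once that instance's pose is estimated.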