Object segmentation is a key component of the visual system of a robot that performs tasks such as grasping and object manipulation, especially in the presence of occlusions. As in many other Computer Vision tasks, the adoption of deep architectures has produced algorithms that perform this task remarkably well. However, the adoption of such algorithms in robotics is hampered by the fact that training requires a large amount of computing time and cannot be performed on-line. In this work, we propose a novel architecture for object segmentation that overcomes this problem and provides comparable performance in a fraction of the time required by state-of-the-art methods. Our approach is based on a pre-trained Mask R-CNN in which various layers have been replaced with a set of classifiers and regressors that are retrained for a new task. We employ an efficient kernel-based method that allows for fast training on large-scale problems. Our approach is validated on the YCB-Video dataset, which is widely adopted in the Computer Vision and Robotics communities, demonstrating that we can match and even surpass state-of-the-art performance with a significant reduction (${\sim}6\times$) in training time. The code will be released upon acceptance.
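The idea of retraining only a kernel-based classifier on top of frozen, pre-trained features can be illustrated with a minimal sketch. The abstract does not name the kernel method or its hyperparameters, so the code below swaps in a Nyström-approximated kernel ridge classifier as a stand-in, and it uses synthetic 16-dimensional vectors in place of features extracted by the pre-trained Mask R-CNN. All function names (`gaussian_kernel`, `fit_nystrom_krr`, `predict_scores`) and all parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel between every row of A and every row of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def fit_nystrom_krr(X, Y, num_centers=50, sigma=4.0, lam=1e-3, seed=0):
    """Kernel ridge regression restricted to a random subset of centers.

    Solves (K_nm^T K_nm + lam * n * K_mm) alpha = K_nm^T Y, which costs
    O(n m^2) instead of the O(n^3) of exact kernel ridge regression.
    """
    rng = np.random.default_rng(seed)
    m = min(num_centers, len(X))
    centers = X[rng.choice(len(X), size=m, replace=False)]
    K_nm = gaussian_kernel(X, centers, sigma)
    K_mm = gaussian_kernel(centers, centers, sigma)
    A = K_nm.T @ K_nm + lam * len(X) * K_mm
    # Small jitter on the diagonal keeps the linear system well conditioned.
    alpha = np.linalg.solve(A + 1e-8 * np.eye(m), K_nm.T @ Y)
    return centers, alpha


def predict_scores(X, centers, alpha, sigma=4.0):
    """Per-class scores; the predicted class is the argmax over columns."""
    return gaussian_kernel(X, centers, sigma) @ alpha


# Synthetic stand-in for frozen backbone features (assumption: in a real
# pipeline these would come from the pre-trained network, not a random draw).
rng = np.random.default_rng(42)
dim, n_per_class = 16, 200


def sample(n):
    """Draw n points per class from two well-separated Gaussian blobs."""
    x0 = rng.normal(0.0, 1.0, size=(n, dim))
    x1 = rng.normal(3.0, 1.0, size=(n, dim))
    X = np.vstack([x0, x1])
    labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return X, labels


X_train, y_train = sample(n_per_class)
X_test, y_test = sample(n_per_class)

Y_onehot = np.eye(2)[y_train]  # regress on one-hot targets
centers, alpha = fit_nystrom_krr(X_train, Y_onehot)
pred = predict_scores(X_test, centers, alpha).argmax(axis=1)
accuracy = float((pred == y_test).mean())
print(f"held-out accuracy: {accuracy:.3f}")
```

Only the small linear system over the chosen centers is solved at training time, which is the general reason such methods can be retrained quickly for a new task while the feature extractor stays fixed. The regressors mentioned in the abstract could in principle reuse the same solver with real-valued targets in place of the one-hot matrix, though the abstract does not specify how they are trained.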