We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator. Our method successively learns a class-agnostic foreground segmentation followed by a distinction between manipulator and object, solely by observing the motion between consecutive RGB frames. In contrast to previous approaches, we propose a single, end-to-end trainable architecture that jointly incorporates motion cues and semantic knowledge. Furthermore, while the motions of the manipulator and the object are substantial cues for our algorithm, we present means to robustly deal with distractor objects moving in the background, as well as with completely static scenes. Our method depends neither on visual registration of a kinematic robot model or 3D object models, nor on precise hand-eye calibration or any additional sensor data. Through extensive experimental evaluation, we demonstrate the superiority of our framework and provide detailed insights into its ability to handle the aforementioned extreme cases of motion. We also show that training a semantic segmentation network with the automatically labeled data achieves results on par with training on manually annotated data. Code and a pretrained model are available at https://github.com/DLR-RM/DistinctNet.
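To make the two-frame idea concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' DistinctNet implementation, of a segmentation network that stacks two consecutive RGB frames so motion cues and appearance are processed jointly by a single end-to-end model. The class `TwoFrameSegNet`, its layer sizes, and the three-class output (background / manipulator / grasped object) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): a segmentation network that
# consumes two consecutive RGB frames so that motion and appearance
# are learned jointly. All names and layer sizes are illustrative.
import torch
import torch.nn as nn


class TwoFrameSegNet(nn.Module):
    """Toy encoder-decoder over a stacked pair of RGB frames (6 channels).

    Outputs per-pixel logits for three hypothetical classes:
    background / manipulator / grasped object.
    """

    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, frame_t: torch.Tensor, frame_t1: torch.Tensor) -> torch.Tensor:
        # Stack consecutive frames along the channel axis so the network
        # can pick up inter-frame motion alongside appearance.
        x = torch.cat([frame_t, frame_t1], dim=1)
        return self.decoder(self.encoder(x))


# Usage: two consecutive 256x256 RGB frames -> per-pixel class logits.
model = TwoFrameSegNet()
f0 = torch.rand(1, 3, 256, 256)
f1 = torch.rand(1, 3, 256, 256)
logits = model(f0, f1)  # shape: (1, 3, 256, 256)
```

Channel-wise stacking is only one simple way to expose inter-frame motion to a single network; the architecture described in the paper may combine motion and semantic information differently.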