Scene understanding is essential for intelligent robotic grasping and manipulation. It can be approached with different techniques: seen object segmentation, unseen object segmentation, or 6D pose estimation, and these techniques can be extended to multi-view settings. Most work on these problems relies on synthetic datasets for training, due to the lack of real datasets large enough for that purpose, and uses the available real datasets only for evaluation. This motivated us to introduce a new dataset, called DoPose-6D. The dataset contains annotations for 6D pose estimation and object segmentation, as well as multi-view annotations, serving all the aforementioned techniques. It covers two types of scenes, bin picking and tabletop, with bin picking being the primary motivation for collecting the dataset. We illustrate the effect of this dataset in the context of unseen object segmentation and provide insights on mixing synthetic and real data for training. We train a Mask R-CNN model that is practical for industrial and robotic grasping applications, and show how our dataset boosts its performance. Our DoPose-6D dataset, trained network models, pipeline code, and ROS driver are available online.