Robots working in unstructured environments must be capable of sensing and interpreting their surroundings. One of the main obstacles to deep-learning-based models in the field of robotics is the lack of domain-specific labeled data for different industrial applications. In this article, we propose a sim2real transfer learning method for object detection, based on domain randomization, with which labeled synthetic datasets of arbitrary size and object types can be generated automatically. Subsequently, a state-of-the-art convolutional neural network, YOLOv4, is trained to detect the different types of industrial objects. With the proposed domain randomization method, we could shrink the reality gap to a satisfactory level, achieving mAP50 scores of 86.32% and 97.38% for zero-shot and one-shot transfer, respectively, on our manually annotated dataset containing 190 real images. Our solution is suitable for industrial use, as the data generation process takes less than 0.5 s per image and training lasts only around 12 h on a GeForce RTX 2080 Ti GPU. Furthermore, it can reliably differentiate between similar classes of objects while having access to only one real image for training. To the best of our knowledge, this is the only work thus far satisfying these constraints.
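To make the data-generation step concrete, below is a minimal, hypothetical sketch of one way a domain-randomized training image and its YOLO-format label could be produced: an object cut-out is pasted onto a random background with randomized scale, rotation, placement, and lighting. The function name, file paths, class index, and parameter ranges are illustrative assumptions and not the authors' actual pipeline.

```python
# Minimal domain-randomization sketch (assumed setup, not the authors' pipeline):
# paste an RGBA object cut-out onto a random background image with randomized
# scale, rotation, position, and brightness, and emit a YOLO-format label line.
import random
from PIL import Image, ImageEnhance


def randomized_sample(object_path, background_path, out_size=416):
    # Load and resize the background; jitter its brightness to mimic lighting changes.
    bg = Image.open(background_path).convert("RGB").resize((out_size, out_size))
    bg = ImageEnhance.Brightness(bg).enhance(random.uniform(0.6, 1.4))

    # Load the object cut-out, randomize its scale and in-plane rotation.
    obj = Image.open(object_path).convert("RGBA")
    target = int(out_size * random.uniform(0.2, 0.6))
    obj.thumbnail((target, target))                      # keeps aspect ratio
    obj = obj.rotate(random.uniform(0.0, 360.0), expand=True)

    # Random placement; the alpha channel serves as the paste mask.
    x = random.randint(0, out_size - obj.width)
    y = random.randint(0, out_size - obj.height)
    bg.paste(obj, (x, y), obj)

    # YOLO label: "class cx cy w h", normalized to the image size.
    # The rotated cut-out's bounding box is used as an approximation.
    cx = (x + obj.width / 2) / out_size
    cy = (y + obj.height / 2) / out_size
    label = f"0 {cx:.6f} {cy:.6f} {obj.width / out_size:.6f} {obj.height / out_size:.6f}"
    return bg, label
```

Generating many such image/label pairs with different backgrounds, object models, and randomization ranges yields a synthetic dataset in the annotation format expected by YOLOv4 training.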