Robots working in unstructured environments must be capable of sensing and interpreting their surroundings. One of the main obstacles for deep-learning-based models in robotics is the lack of domain-specific labeled data for different industrial applications. In this paper, we propose a sim2real transfer learning method based on domain randomization for object detection, with which labeled synthetic datasets of arbitrary size and object types can be generated automatically. Subsequently, a state-of-the-art convolutional neural network, YOLOv4, is trained to detect the different types of industrial objects. With the proposed domain randomization method, we could shrink the reality gap to a satisfactory level, achieving mAP50 scores of 86.32% and 97.38% for zero-shot and one-shot transfer, respectively, on our manually annotated dataset of 190 real images. On a GeForce RTX 2080 Ti GPU, data generation takes less than 0.5 s per image and training lasts around 12 h, which makes the method convenient for industrial use. Our solution matches industrial needs, as it can reliably differentiate similar classes of objects using only one real image for training. To the best of our knowledge, this is the only work to date that satisfies these constraints.
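To illustrate the data generation step, the minimal sketch below shows how per-image scene parameters could be sampled under domain randomization before rendering. The parameter names, value ranges, and the helper sample_scene_parameters are hypothetical and not taken from the paper; a renderer (not shown) would consume such a configuration and emit the image together with YOLO-format bounding-box labels.

```python
import random

def sample_scene_parameters(num_objects: int) -> dict:
    """Draw one random scene configuration for a synthetic training image.

    Every parameter is resampled per image so that, to the detector, the real
    domain looks like just another random variation (the core idea of domain
    randomization). All ranges below are illustrative assumptions.
    """
    return {
        # random background texture index (assuming a pool of 500 textures)
        "background_texture": random.randrange(500),
        # per-object random pose: position (m) and yaw/pitch/roll (deg)
        "object_poses": [
            {
                "position": [random.uniform(-0.3, 0.3),
                             random.uniform(-0.3, 0.3),
                             0.0],
                "rotation_deg": [random.uniform(0.0, 360.0) for _ in range(3)],
            }
            for _ in range(num_objects)
        ],
        # random lighting: intensity multiplier and colour temperature (K)
        "light_intensity": random.uniform(0.5, 2.0),
        "light_temperature_K": random.uniform(3000.0, 6500.0),
        # random camera placement relative to the work surface
        "camera_distance_m": random.uniform(0.6, 1.2),
        "camera_elevation_deg": random.uniform(20.0, 80.0),
    }

if __name__ == "__main__":
    # Each call yields a new randomized scene configuration; generating the
    # full labeled dataset amounts to repeating this and rendering each scene.
    print(sample_scene_parameters(num_objects=4))
```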