Analysis of Training Object Detection Models with Synthetic Data

Recently, the use of synthetic training data has been on the rise, as it offers correctly labelled datasets at a lower cost. The downside of this technique is that the so-called domain gap between the real target images and the synthetic training data leads to a decrease in performance. In this paper, we attempt to provide a holistic overview of how to use synthetic data for object detection. We analyse aspects of generating the data as well as the techniques used to train the models. We do so by devising a number of experiments in which we train models on the Dataset of Industrial Metal Objects (DIMO). This dataset contains both real and synthetic images. The synthetic part has different subsets that are either exact synthetic copies of the real data or copies with certain aspects randomised. This allows us to analyse which types of variation are beneficial in synthetic training data and which aspects should be modelled to closely match the target data. Furthermore, we investigate which training techniques are beneficial for generalisation to real data, and how to use them. Additionally, we analyse how real images can be leveraged when training on synthetic images. All these experiments are validated on real data and benchmarked against models trained on real data. The results offer a number of interesting takeaways that can serve as basic guidelines for using synthetic data for object detection. Code to reproduce results is available at https://github.com/EDM-Research/DIMO_ObjectDetection.
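The abstract does not say how real images are combined with synthetic ones during training. One simple option, assumed here and not confirmed as the authors' method, is to add a random fraction of the available real images to the full synthetic set. The following Python sketch illustrates only that idea; `build_mixed_training_set`, its parameters and the image file names are hypothetical and are not taken from the DIMO_ObjectDetection repository.

```python
import random
from typing import List, Sequence


def build_mixed_training_set(
    synthetic: Sequence[str],
    real: Sequence[str],
    real_fraction: float,
    seed: int = 0,
) -> List[str]:
    """Return a shuffled list of image paths: every synthetic image plus a
    random subset of the real images.

    Hypothetical helper for illustration only. real_fraction is the share of
    the available real images to include (0.0 = synthetic only, 1.0 = all).
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    n_real = round(len(real) * real_fraction)
    chosen_real = rng.sample(list(real), n_real)  # sample without replacement
    mixed = list(synthetic) + chosen_real
    rng.shuffle(mixed)  # interleave real and synthetic samples
    return mixed


if __name__ == "__main__":
    synthetic = [f"syn_{i}.png" for i in range(100)]
    real = [f"real_{i}.png" for i in range(20)]
    train = build_mixed_training_set(synthetic, real, real_fraction=0.1)
    print(len(train))  # 102: 100 synthetic + 2 real images
```

Varying `real_fraction` in such a setup is one way to study how many real images are needed on top of synthetic data; whether the paper uses mixing, fine-tuning on real images, or another strategy is not stated in the abstract.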
