State-of-the-art approaches in computer vision rely heavily on sufficiently large training datasets. For real-world applications, obtaining such a dataset is usually a tedious task. In this paper, we present a fully automated pipeline that generates a synthetic dataset for instance segmentation in four steps. In contrast to existing work, our pipeline covers every step from data acquisition to the final dataset. We first scrape images for the objects of interest from popular image search engines, and since we rely only on text-based queries, the resulting data comprises a wide variety of images. Hence, image selection is necessary as a second step. This approach of image scraping and selection relaxes the need for a real-world domain-specific dataset that must be either publicly available or created for this purpose. We employ an object-agnostic background removal model and compare three different methods for image selection: object-agnostic pre-processing, manual image selection, and CNN-based image selection. In the third step, we generate random arrangements of the objects of interest and distractors on arbitrary backgrounds. Finally, the images are composed by pasting the objects using four different blending methods. We present a case study for our dataset generation approach by considering parcel segmentation. For the evaluation, we created a dataset of parcel photos that were annotated automatically. We find that (1) our dataset generation pipeline allows a successful transfer to real test images (Mask AP 86.2), (2) a very accurate image selection process - in contrast to human intuition - is not crucial, and a broader category definition can help to bridge the domain gap, and (3) using blending methods is beneficial compared to simple copy-and-paste. We make our full code for scraping, image composition, and training publicly available at https://a-nau.github.io/parcel2d.
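To illustrate the final composition step, the following is a minimal sketch of pasting a cut-out object onto a background with a softened mask, as opposed to a hard copy-and-paste. It is not the paper's implementation: the function names and the simple box blur (a stand-in for Gaussian blending) are assumptions made for illustration.

```python
import numpy as np


def box_blur(mask, k=5):
    # Separable box blur that softens the binary mask's edges;
    # a cheap stand-in for the Gaussian blending used in such pipelines.
    pad = k // 2
    padded = np.pad(mask, pad, mode="edge")
    kernel = np.ones(k) / k
    blurred = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    blurred = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, blurred)
    return np.clip(blurred, 0.0, 1.0)


def paste(background, foreground, mask, top, left, blend=True):
    """Paste `foreground` onto `background` where `mask` is 1.

    With blend=True the mask edges are feathered, so the pasting seam
    is less visible than with a hard copy-and-paste (blend=False).
    """
    h, w = mask.shape
    alpha = box_blur(mask.astype(float)) if blend else mask.astype(float)
    alpha = alpha[..., None]  # broadcast the 2-D mask over the RGB channels
    region = background[top:top + h, left:left + w].astype(float)
    background[top:top + h, left:left + w] = (
        alpha * foreground + (1.0 - alpha) * region
    ).round().astype(background.dtype)
    return background
```

In a full pipeline, the same paste call would be repeated for randomly placed objects and distractors, and the pasted mask would directly serve as the instance-segmentation annotation.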