SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding

Data augmentation is an essential technique for improving the generalization of deep neural networks. Most existing image-domain augmentations either rely on geometric and structural transformations or apply photometric distortions. In this paper, we propose an effective image augmentation technique that injects contextually meaningful knowledge into scenes. Our method, SemAug (semantically meaningful image augmentation for object detection via language grounding), first computes semantically appropriate new objects and the relevant locations at which to place them in the image (the what and where problems). It then embeds these objects at their target locations, thereby promoting diversity in the distribution of object instances. The method can introduce new object instances and categories that may not exist in the training set. Furthermore, it does not require the additional overhead of training a context network, so it can be easily added to existing architectures. Our comprehensive evaluations show that the proposed method improves generalization with negligible overhead. In particular, across a wide range of model architectures, it achieves ~2-4% and ~1-2% mAP improvements for object detection on the Pascal VOC and COCO datasets, respectively.

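The "what" and "where" steps described in the abstract can be illustrated with a small sketch. The abstract does not specify how candidate objects are scored or how locations are chosen, so everything here is an assumption made for illustration: the toy three-dimensional word vectors in `EMBEDDINGS`, the cosine-similarity ranking, the choice to paste at the corner of an existing box, and the helper names `choose_object` and `paste_object` are all hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical toy word vectors (assumption). A real pipeline would load
# pretrained word embeddings instead of hand-written values.
EMBEDDINGS = {
    "dog":   np.array([0.9, 0.1, 0.0]),
    "cat":   np.array([0.8, 0.2, 0.1]),
    "car":   np.array([0.0, 0.9, 0.3]),
    "truck": np.array([0.1, 0.8, 0.4]),
}


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def choose_object(present_label, bank_labels):
    """'What' step (assumed): pick the object-bank category whose word
    vector is most similar to a label already present in the image."""
    candidates = [c for c in bank_labels if c != present_label]
    anchor = EMBEDDINGS[present_label]
    return max(candidates, key=lambda c: cosine(anchor, EMBEDDINGS[c]))


def paste_object(image, patch, box):
    """'Where' step (assumed): paste `patch` at the top-left corner of
    `box` = (x1, y1, x2, y2), clipping the patch to the image bounds.
    Returns the augmented copy and the bounding box of the pasted object."""
    out = image.copy()
    x1, y1 = box[0], box[1]
    h = min(patch.shape[0], out.shape[0] - y1)
    w = min(patch.shape[1], out.shape[1] - x1)
    out[y1:y1 + h, x1:x1 + w] = patch[:h, :w]
    return out, (x1, y1, x1 + w, y1 + h)


if __name__ == "__main__":
    image = np.zeros((10, 10, 3), dtype=np.uint8)       # blank scene
    patch = np.full((4, 4, 3), 255, dtype=np.uint8)     # stand-in object crop
    new_label = choose_object("dog", ["cat", "car", "truck"])
    augmented, new_box = paste_object(image, patch, (2, 3, 8, 9))
    print(new_label, new_box)
```

The pasted box and its label would then be appended to the image's annotations before training the detector. A direct rectangular paste is the simplest possible blending choice; it stands in for whatever embedding procedure the authors actually use.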