Advanced Data Augmentation Approaches: A Comprehensive Survey and Future directions

Deep learning (DL) algorithms have achieved strong performance on a variety of computer vision tasks. However, limited labelled data leads to overfitting, where a network performs worse on unseen data than on the training data, which in turn limits further performance improvement. To cope with this problem, various techniques have been proposed, such as dropout, normalization and advanced data augmentation. Among these, data augmentation, which aims to enlarge the dataset by increasing sample diversity, has been a hot topic in recent times. In this article, we focus on advanced data augmentation techniques. We provide a background on data augmentation, a novel and comprehensive taxonomy of the reviewed data augmentation techniques, and the strengths and weaknesses (wherever possible) of each technique. We also provide comprehensive results on the effect of data augmentation on three popular computer vision tasks: image classification, object detection and semantic segmentation. For reproducibility of results, we compiled the available code for all the data augmentation techniques. Finally, we discuss the challenges and difficulties, and possible future directions for the research community. We believe this survey provides several benefits: i) readers will understand how data augmentation works to mitigate overfitting; ii) the compiled results will save researchers time when searching for comparisons; iii) code for the mentioned data augmentation techniques is available at https://github.com/kmr2017/Advanced-Data-augmentation-codes; iv) the discussion of future work will spark interest in the research community.
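
To make the idea of "enlarging the dataset by increasing sample diversity" concrete, the following is a minimal sketch of two simple augmentations, a horizontal flip and a simplified cutout-style erasure, applied to an image stored as a nested list. It is an illustration only and is not taken from the survey or its repository: the function names (`horizontal_flip`, `cutout`, `augment`) and the parameters (`flip_prob`, `cutout_size`) are hypothetical, and the erased patch is kept fully inside the image for simplicity.

```python
import random


def horizontal_flip(image):
    """Return a left-right mirrored copy of a 2D image (list of rows)."""
    return [row[::-1] for row in image]


def cutout(image, size, rng, fill=0):
    """Return a copy with one size x size square overwritten by `fill`.

    Simplifying assumption: the square is sampled so that it lies
    entirely inside the image.
    """
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    out = [row[:] for row in image]  # copy rows so the input is untouched
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out


def augment(image, rng, flip_prob=0.5, cutout_size=2):
    """Flip with probability `flip_prob`, then erase one random patch."""
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    return cutout(image, cutout_size, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same variants
    image = [[1, 2, 3, 4] for _ in range(4)]
    for _ in range(3):
        # each call yields a different variant of the same labelled sample
        for row in augment(image, rng):
            print(row)
        print()
```

Each call to `augment` produces a new variant of the same labelled sample, which is the sense in which augmentation enlarges a small dataset without collecting new labels.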
