In the Machine Learning research community, there is a consensus regarding the relationship between model complexity and the required amounts of data and computational power. In real-world applications, these computational resources are not always available, motivating research on regularization methods. Moreover, current and past research has shown that simpler classification algorithms can reach state-of-the-art performance on computer vision tasks when given a robust method to artificially augment the training dataset. As a result, data augmentation techniques have become a popular research topic in recent years. However, existing data augmentation methods are generally less transferable than other regularization methods. In this paper we identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time, and research gaps in the data augmentation literature. To do this, the related literature was collected through the Scopus database, and its analysis was carried out following network science, text mining, and exploratory analysis approaches. We expect readers to understand the potential of data augmentation, as well as to identify future research directions and open questions within data augmentation research.