Data augmentation (DA) is a widely used technique for enhancing the training of deep neural networks. Recent DA techniques that achieve state-of-the-art performance consistently pursue diversity in the augmented training samples. However, a highly diverse augmentation strategy usually introduces out-of-distribution (OOD) augmented samples, which in turn impair performance. To alleviate this issue, we propose ReSmooth, a framework that first detects OOD samples among the augmented samples and then leverages them. Specifically, we fit a Gaussian mixture model to the loss distribution of the original and augmented samples, and accordingly split the samples into in-distribution (ID) and OOD subsets. We then start a new training run in which ID and OOD samples are assigned differently smoothed labels. By treating ID and OOD samples unequally, we make better use of the diverse augmented data. Furthermore, we combine the ReSmooth framework with negative data augmentation strategies: by properly handling their intentionally created OOD samples, the classification performance of negative data augmentation is substantially improved. Experiments on several classification benchmarks show that ReSmooth can be easily extended to existing augmentation strategies (such as RandAugment, rotate, and jigsaw) and improves upon them.
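The ID/OOD split described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the two-component GMM over per-sample losses, the posterior threshold, and the smoothing strengths are all illustrative assumptions, and `split_id_ood` / `smooth_labels` are hypothetical helper names.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_id_ood(losses, threshold=0.5):
    """Fit a 2-component GMM to per-sample losses and treat the
    low-mean (low-loss) component as in-distribution (ID).
    Returns a boolean mask: True -> ID, False -> OOD."""
    losses = np.asarray(losses, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
    id_component = int(np.argmin(gmm.means_.ravel()))
    p_id = gmm.predict_proba(losses)[:, id_component]
    return p_id >= threshold

def smooth_labels(hard_labels, num_classes, eps):
    """Standard label smoothing: (1 - eps) mass on the true class,
    eps spread uniformly over all classes."""
    onehot = np.eye(num_classes)[np.asarray(hard_labels)]
    return (1.0 - eps) * onehot + eps / num_classes

# Usage sketch: ID samples keep nearly hard labels (small eps),
# while OOD samples get heavily smoothed labels (large eps).
losses = np.array([0.4, 0.5, 0.6, 4.8, 5.1])   # toy per-sample losses
labels = np.array([0, 1, 2, 0, 1])
is_id = split_id_ood(losses)
targets = np.where(is_id[:, None],
                   smooth_labels(labels, 3, eps=0.05),
                   smooth_labels(labels, 3, eps=0.5))
```

The key design choice is that OOD samples are not discarded: they still contribute a training signal, only with a softer target distribution that reflects the lower confidence in their nominal label.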