Several data augmentation methods deploy unlabeled-in-distribution (UID) data to bridge the gap between the training and inference of neural networks. However, these methods have clear limitations in terms of the availability of UID data and their dependence on pseudo-labels. Herein, we propose a data augmentation method that improves generalization in both adversarial and standard learning by using out-of-distribution (OOD) data, thereby avoiding the abovementioned issues. We show theoretically how OOD data can improve generalization in each learning scenario and complement our analysis with experiments on CIFAR-10, CIFAR-100, and a subset of ImageNet. The results indicate that undesirable features are shared even among image data that appear, from a human point of view, to have little correlation. We also demonstrate the advantages of the proposed method by comparing it with other data augmentation methods that can be used in the absence of UID data. Furthermore, we show that the proposed method can further improve existing state-of-the-art adversarial training.
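The abstract does not spell out the training objective, so the following is only a minimal sketch of how OOD data might be folded into a standard training step. It assumes (as one common choice, not necessarily the paper's) that unlabeled OOD samples are pushed toward uniform-distribution predictions alongside the usual supervised loss; the function `training_step`, the batch variables, and the weight `ood_weight` are hypothetical placeholders.

```python
# Minimal sketch (assumption, not the paper's exact algorithm): augment each
# optimization step with an OOD batch whose target is the uniform distribution,
# discouraging the network from relying on features shared with OOD data.
import torch
import torch.nn.functional as F


def training_step(model, optimizer, id_batch, ood_batch, ood_weight=1.0):
    """One step on labeled in-distribution (ID) data plus OOD augmentation."""
    x_id, y_id = id_batch          # labeled in-distribution images and targets
    x_ood = ood_batch              # unlabeled out-of-distribution images

    model.train()
    optimizer.zero_grad()

    # Standard supervised loss on in-distribution data.
    logits_id = model(x_id)
    loss_id = F.cross_entropy(logits_id, y_id)

    # OOD term: push predictions on OOD inputs toward the uniform distribution
    # (assumed here for illustration; no pseudo-labels are needed).
    logits_ood = model(x_ood)
    num_classes = logits_ood.size(1)
    uniform = torch.full_like(logits_ood, 1.0 / num_classes)
    loss_ood = F.kl_div(F.log_softmax(logits_ood, dim=1), uniform,
                        reduction="batchmean")

    loss = loss_id + ood_weight * loss_ood
    loss.backward()
    optimizer.step()
    return loss.item()
```

For the adversarial-learning setting mentioned in the abstract, the same OOD term could in principle be added on top of an adversarial-training loss (e.g., replacing `loss_id` with a loss on adversarially perturbed ID inputs); the details of that combination are not given in this section.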