Data augmentation is commonly used to enlarge datasets with synthetic samples generated in accordance with the underlying data distribution. To enable a wider range of augmentations, we explore negative data augmentation (NDA) strategies that intentionally create out-of-distribution samples. We show that such negative out-of-distribution samples provide information about the support of the data distribution and can be leveraged for generative modeling and representation learning. We introduce a new GAN training objective in which NDA serves as an additional source of synthetic data for the discriminator. We prove that under suitable conditions, optimizing the resulting objective still recovers the true data distribution, while directly biasing the generator away from samples that lack the desired structure. Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities. Furthermore, we incorporate the same negative data augmentation strategy into a contrastive learning framework for self-supervised representation learning on images and videos, achieving improved performance on downstream image classification, object detection, and action recognition tasks. These results suggest that prior knowledge about what does not constitute valid data is an effective form of weak supervision across a range of unsupervised learning tasks.
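To make the idea concrete, one family of NDA transforms shuffles the patches of an image (a "jigsaw" negative), producing a sample that shares local statistics with the data but lies outside its support. The sketch below pairs such a transform with a discriminator loss that treats NDA samples as an extra source of fakes. This is a minimal NumPy sketch under stated assumptions: the non-saturating loss form, the mixing weight `lam`, and the function names are illustrative, not the paper's exact implementation.

```python
import numpy as np

def jigsaw_nda(image, grid=2, rng=None):
    """Create an out-of-distribution (negative) sample by shuffling
    the patches of a 2-D image on a grid x grid layout.
    Assumes H and W are divisible by `grid`."""
    rng = np.random.default_rng(rng)
    h, w = image.shape
    ph, pw = h // grid, w // grid
    patches = [image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
               for i in range(grid) for j in range(grid)]
    order = rng.permutation(len(patches))
    out = np.empty_like(image)
    for k, idx in enumerate(order):
        i, j = divmod(k, grid)
        out[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = patches[idx]
    return out

def nda_discriminator_loss(d_real, d_fake, d_nda, lam=0.5):
    """Standard GAN discriminator loss where NDA samples are mixed
    into the 'fake' term with weight `lam` (an assumed hyperparameter).
    Inputs are discriminator outputs in (0, 1)."""
    eps = 1e-8  # numerical guard for the logs
    real_term = -np.mean(np.log(d_real + eps))
    fake_term = -np.mean((1.0 - lam) * np.log(1.0 - d_fake + eps)
                         + lam * np.log(1.0 - d_nda + eps))
    return real_term + fake_term
```

In practice the generator's objective is unchanged; only the discriminator sees the negatives, which pushes the generator's distribution away from the NDA region while, as the abstract states, still recovering the true data distribution under suitable conditions.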