Mixup is a data augmentation method that generates new data points by mixing a pair of input data points and their labels. While mixup generally improves prediction performance, it sometimes degrades it. In this paper, we first identify the main causes of this phenomenon by analyzing the mixup algorithm theoretically and empirically. To resolve this, we propose GenLabel, a simple yet effective relabeling algorithm designed for mixup. In particular, GenLabel helps mixup assign correct labels to mixed samples by learning the class-conditional data distribution with generative models. Through extensive theoretical and empirical analysis, we show that mixup, when used together with GenLabel, can effectively resolve the aforementioned phenomenon, improving both generalization performance and adversarial robustness.
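To make the relabeling idea concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. `mixup_batch` is standard mixup: a convex combination of shuffled input pairs and their one-hot labels, with the mixing weight drawn from a Beta(alpha, alpha) distribution. For the generative side, the hypothetical class `GaussianClassConditional` fits one isotropic Gaussian per class (a stand-in; the paper's choice of generative model may differ). The hypothetical function `genlabel_style_relabel` replaces or blends the mixup label with the Bayes posterior p(c | x) under that model, using an assumed blending weight `gamma`. The paper's exact labeling rule may differ from this posterior-blending form.

```python
import numpy as np


def mixup_batch(x, y_onehot, alpha=1.0, rng=None):
    """Standard mixup: mix each sample with a randomly permuted partner."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    perm = rng.permutation(len(x))        # partner index for each sample
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam, perm


class GaussianClassConditional:
    """Hypothetical generative model: one isotropic Gaussian per class (MLE fit).

    Assumes every class in range(num_classes) appears in the training labels.
    """

    def fit(self, x, y, num_classes):
        self.means = np.stack([x[y == c].mean(axis=0) for c in range(num_classes)])
        # Shared isotropic variance pooled over all classes (simplifying assumption).
        resid = x - self.means[y]
        self.var = resid.var() + 1e-6
        self.log_prior = np.log(np.bincount(y, minlength=num_classes) / len(y))
        return self

    def log_likelihood(self, x):
        # log N(x | mu_c, var * I), dropping the constant shared by all classes.
        sq_dist = ((x[:, None, :] - self.means[None, :, :]) ** 2).sum(axis=-1)
        return -0.5 * sq_dist / self.var

    def posterior(self, x):
        # Bayes rule p(c | x) ∝ p(x | c) p(c), normalized with a stable softmax.
        logits = self.log_likelihood(x) + self.log_prior
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)


def genlabel_style_relabel(model, x_mix, y_mix, gamma=1.0):
    """Blend the mixup label with the generative posterior (gamma=1: posterior only)."""
    return (1.0 - gamma) * y_mix + gamma * model.posterior(x_mix)


# Usage: three classes on a line; mixing class 0 with class 2 lands in class 1's region.
rng = np.random.default_rng(0)
centers = np.array([[-4.0, 0.0], [0.0, 0.0], [4.0, 0.0]])
y = np.repeat(np.arange(3), 100)
x = centers[y] + 0.5 * rng.standard_normal((300, 2))
onehot = np.eye(3)[y]
model = GaussianClassConditional().fit(x, y, num_classes=3)

x_mid = 0.5 * centers[[0]] + 0.5 * centers[[2]]     # midpoint of classes 0 and 2
y_mid = 0.5 * onehot[[0]] + 0.5 * onehot[[200]]      # mixup label: half class 0, half class 2
print("mixup label:  ", y_mid)
print("relabeled:    ", genlabel_style_relabel(model, x_mid, y_mid).round(3))
```

The toy data illustrates one way a mixup label can be wrong: the midpoint of a class-0 and a class-2 sample falls where class 1 lives, yet mixup labels it half class 0 and half class 2. The generative posterior instead puts nearly all its mass on class 1. Whether this matches the specific failure modes analyzed in the paper should be checked against the paper itself.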