Mixup is a popular data augmentation technique based on taking convex combinations of pairs of examples and their labels. This simple technique has been shown to substantially improve both the robustness and the generalization of the trained model. However, it is not well understood why such improvements occur. In this paper, we provide a theoretical analysis to demonstrate how using Mixup in training helps model robustness and generalization. For robustness, we show that minimizing the Mixup loss corresponds to approximately minimizing an upper bound on the adversarial loss. This explains why models obtained by Mixup training exhibit robustness to several kinds of adversarial attacks, such as the Fast Gradient Sign Method (FGSM). For generalization, we prove that Mixup augmentation corresponds to a specific type of data-adaptive regularization that reduces overfitting. Our analysis provides new insights and a framework to understand Mixup.
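As a concrete illustration of the convex-combination scheme described above, the following is a minimal NumPy sketch of Mixup batch construction. The Beta(α, α) sampling of the mixing coefficient λ and the within-batch random pairing follow the standard Mixup formulation; the function name `mixup_batch` and its signature are illustrative, not taken from this paper.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Form a Mixup batch by convexly combining examples and their one-hot labels.

    x: array of shape (batch, ...), the input examples
    y: array of shape (batch, num_classes), the one-hot labels
    alpha: Beta-distribution parameter (standard Mixup choice; assumed here)
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)              # mixing coefficient lambda ~ Beta(alpha, alpha)
    perm = rng.permutation(x.shape[0])        # random pairing of examples within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]   # convex combination of inputs
    y_mix = lam * y + (1.0 - lam) * y[perm]   # matching convex combination of labels
    return x_mix, y_mix
```

Training then proceeds as usual, with the loss evaluated on the mixed pairs (x_mix, y_mix) instead of the original examples.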