In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels. Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training. In this paper, we investigate how these benefits of Mixup training rely on properties of the data in the context of classification. For minimizing the original empirical risk, we compute a closed form for the Mixup-optimal classification, which allows us to construct a simple dataset on which minimizing the Mixup loss can provably lead to learning a classifier that does not minimize the empirical loss on the data. On the other hand, we also give sufficient conditions under which Mixup training minimizes the original empirical risk. For generalization, we characterize the margin of a Mixup classifier, and use this to understand why the decision boundary of a Mixup classifier can adapt better to the full structure of the training data when compared to standard training. In contrast, we also show that, for a large class of linear models and linearly separable datasets, Mixup training leads to learning the same classifier as standard training.
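As a concrete illustration of the paradigm described above, the following is a minimal sketch of how mixed training pairs are commonly formed. The function name, the Beta(alpha, alpha) sampling of the mixing weight, and the batch-wise random pairing are illustrative assumptions of a standard Mixup setup, not details drawn from this paper.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Form one Mixup batch: convex combinations of inputs and their labels.

    x: array of shape (batch, ...) with inputs.
    y: array of shape (batch, num_classes) with one-hot (or soft) labels.
    alpha: Beta-distribution parameter controlling how aggressively points are mixed.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)              # mixing weight lambda in [0, 1]
    perm = rng.permutation(x.shape[0])        # random pairing within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]   # convex combination of data points
    y_mix = lam * y + (1.0 - lam) * y[perm]   # matching convex combination of labels
    return x_mix, y_mix
```

The model is then trained on (x_mix, y_mix) with the usual loss, so whenever lam lies strictly between 0 and 1 it essentially never sees an unmixed training point.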