Mixup is a data augmentation technique that creates new examples as convex combinations of training points and labels. This simple technique has been shown empirically to improve the accuracy of many state-of-the-art models across different settings and applications, but the reasons behind this empirical success remain poorly understood. In this paper we take a substantial step toward explaining the theoretical foundations of Mixup by clarifying its regularization effects. We show that Mixup can be interpreted as a standard empirical risk minimization estimator subject to a combination of data transformation and random perturbation of the transformed data. We gain two core insights from this new interpretation. First, the data transformation suggests that, at test time, a model trained with Mixup should also be applied to transformed data, a one-line change in code that we show empirically improves both the accuracy and the calibration of the predictions. Second, we show how the random perturbation in this new interpretation of Mixup induces multiple known regularization schemes, including label smoothing and reduction of the Lipschitz constant of the estimator. These schemes interact synergistically with each other, resulting in a self-calibrated and effective regularization effect that prevents overfitting and overconfident predictions. We corroborate our theoretical analysis with experiments that support our conclusions.
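For concreteness, the following is a minimal sketch of the standard Mixup augmentation the abstract refers to (convex combinations of paired training inputs and labels, with the mixing coefficient drawn from a Beta distribution); the function name, the `alpha` value, and the batch-pairing scheme are illustrative assumptions, not details taken from this paper.

```python
# Illustrative sketch of standard Mixup augmentation (not this paper's
# proposed test-time procedure). Names and alpha are assumptions.
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=np.random.default_rng()):
    """Return convex combinations of a batch with a shuffled copy of itself.

    x: array of shape (batch, ...) with input features.
    y: array of shape (batch, num_classes) with one-hot (or soft) labels.
    """
    lam = rng.beta(alpha, alpha)             # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))           # random pairing of examples
    x_mix = lam * x + (1.0 - lam) * x[perm]  # convex combination of inputs
    y_mix = lam * y + (1.0 - lam) * y[perm]  # matching combination of labels
    return x_mix, y_mix
```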