Mixup is a data augmentation technique that creates new examples as convex combinations of training points and labels. This simple technique has empirically been shown to improve the accuracy of many state-of-the-art models across different settings and applications, but the reasons behind this empirical success remain poorly understood. In this paper we take a substantial step toward explaining the theoretical foundations of Mixup by clarifying its regularization effects. We show that Mixup can be interpreted as a standard empirical risk minimization estimator subject to a combination of data transformation and random perturbation of the transformed data. We gain two core insights from this new interpretation. First, the data transformation suggests that, at test time, a model trained with Mixup should also be applied to transformed data, a one-line change in code that we show empirically to improve both the accuracy and calibration of the predictions. Second, we show how the random perturbation in the new interpretation of Mixup induces multiple known regularization schemes, including label smoothing and reduction of the Lipschitz constant of the estimator. These schemes interact synergistically with each other, resulting in a self-calibrated and effective regularization effect that prevents overfitting and overconfident predictions. We corroborate our theoretical analysis with experiments that support our conclusions.
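For concreteness, the following is a minimal NumPy sketch of the Mixup augmentation described above: inputs and one-hot labels are paired at random and mixed with a coefficient drawn from a Beta(alpha, alpha) distribution, as in the original Mixup formulation. The function name `mixup_batch` and the parameter `alpha` are illustrative choices, not names taken from this paper.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Create one Mixup batch: convex combinations of inputs and one-hot labels.

    x: array of shape (batch, ...) with training inputs
    y: array of shape (batch, num_classes) with one-hot labels
    alpha: Beta-distribution parameter controlling interpolation strength
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)              # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))            # random pairing of examples
    x_mix = lam * x + (1.0 - lam) * x[perm]   # convex combination of inputs
    y_mix = lam * y + (1.0 - lam) * y[perm]   # convex combination of labels
    return x_mix, y_mix

# Usage: mix a toy batch of 4 two-dimensional points with 3 classes.
x = np.random.randn(4, 2)
y = np.eye(3)[[0, 1, 2, 0]]
x_mix, y_mix = mixup_batch(x, y, alpha=0.2)
```

The test-time data transformation referred to in the abstract is defined in the body of the paper and is not reproduced here.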