Mixup, which creates synthetic training instances by linearly interpolating random pairs of samples, is a simple yet effective regularization technique for boosting the performance of deep models trained with SGD. In this work, we report a previously unobserved phenomenon in Mixup training: on a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs, giving rise to a U-shaped generalization curve. This behavior is further aggravated when the size of the original dataset is reduced. To help understand this behavior, we show theoretically that Mixup training may introduce undesired data-dependent label noise into the synthesized data. By analyzing a least-squares regression problem with a random feature model, we explain why noisy labels can cause the U-shaped curve to occur: Mixup improves generalization by fitting the clean patterns at the early stage of training, but as training progresses, Mixup begins to overfit the noise in the synthetic data. Extensive experiments on a variety of benchmark datasets validate this explanation.
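For concreteness, the sketch below illustrates the interpolation step described above: synthetic examples are formed as convex combinations of randomly paired inputs and their labels. The function name `mixup_batch` and the `alpha` parameter of the Beta mixing distribution are illustrative choices not specified in this abstract; this is a minimal sketch of the standard Mixup formulation, not the exact training pipeline analyzed in the paper.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=np.random.default_rng()):
    """Create synthetic examples by linearly interpolating random sample pairs.

    x: inputs of shape (batch, ...); y: one-hot labels of shape (batch, classes).
    alpha: Beta(alpha, alpha) parameter controlling interpolation strength
           (an assumed default; the abstract does not fix the mixing distribution).
    """
    lam = rng.beta(alpha, alpha)            # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))          # random pairing of samples within the batch
    x_mix = lam * x + (1 - lam) * x[perm]   # interpolated inputs
    y_mix = lam * y + (1 - lam) * y[perm]   # interpolated (soft) labels
    return x_mix, y_mix
```

Because the mixed label is a convex combination of two one-hot vectors, it can disagree with the label that a clean model would assign to the mixed input; this is the data-dependent label noise whose effect on late-stage training the paper studies.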