Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay and learning rate schedules. In this work, we conduct an extensive empirical comparison of standard training with a selection of re-initialization methods to answer this question, training over 15,000 models on a variety of image classification benchmarks. We first establish that such methods are consistently beneficial for generalization in the absence of any other regularization. However, when deployed alongside other carefully tuned regularization techniques, re-initialization methods offer little to no added benefit for generalization, although optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay hyperparameters. To investigate the impact of re-initialization methods on noisy data, we also consider learning under label noise. Surprisingly, in this case, re-initialization significantly improves upon standard training, even in the presence of other carefully tuned regularization techniques.
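As a concrete illustration of what re-initialization during training can look like, the sketch below periodically re-initializes the final layer of a small classifier while keeping earlier layers intact. The architecture, schedule, and hyperparameters are illustrative assumptions for a minimal PyTorch-style example, not the exact re-initialization methods or training protocols compared in this work.

```python
# Minimal sketch (assumed setup): periodically re-initializing the final layer
# of a classifier during training, one simple instance of re-initialization.
import torch
import torch.nn as nn

def build_model(num_classes=10):
    # Small illustrative classifier for 32x32x3 inputs (e.g. CIFAR-style images).
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(32 * 32 * 3, 512), nn.ReLU(),
        nn.Linear(512, num_classes),
    )

def train_with_reinit(model, loader, epochs=90, reinit_every=30, lr=0.1, weight_decay=5e-4):
    # Standard SGD training with weight decay; all hyperparameters are placeholders.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        # Re-initialize the last layer every `reinit_every` epochs (skipping epoch 0),
        # leaving the earlier layers' weights untouched.
        if epoch > 0 and epoch % reinit_every == 0:
            model[-1].reset_parameters()
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```

In this toy variant only the output layer is reset; the methods studied in the paper differ in which layers are re-initialized, how often, and how the re-initialized weights are drawn, which is precisely what the empirical comparison examines.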