Injecting noise into gradient descent has several desirable features, such as smoothing and regularizing properties. In this paper, we investigate the effects of injecting noise before computing the gradient step. We demonstrate that small perturbations can induce explicit regularization for simple models based on the L1-norm, group L1-norms, or nuclear norms. However, when applied to overparametrized neural networks with large widths, we show that the same perturbations can cause variance explosion. To overcome this, we propose using independent layer-wise perturbations, which provably allow for explicit regularization without variance explosion. Our empirical results show that these small perturbations lead to improved generalization performance compared to vanilla gradient descent.
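For concreteness, the sketch below illustrates (in PyTorch, which the abstract does not prescribe) the two perturbation schemes being contrasted: perturbing all parameters jointly before the gradient step, versus one plausible reading of independent layer-wise perturbations in which each parameter tensor is perturbed on its own and only its gradient is taken at that perturbed point. The helper names (`full_perturbation_grads`, `layerwise_perturbation_grads`, `loss_fn`, `sigma`) are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the two noise-injection schemes contrasted in the abstract.
# Assumes a standard PyTorch model; `loss_fn(model, batch)` and `sigma` are
# illustrative placeholders, not part of the original paper.
import torch


def full_perturbation_grads(model, loss_fn, batch, sigma=1e-2):
    """Perturb ALL parameters jointly, then take the gradient at w + sigma*eps.

    For very wide networks, this joint perturbation is the regime the abstract
    describes as suffering from variance explosion.
    """
    params = list(model.parameters())
    noises = [sigma * torch.randn_like(p) for p in params]
    with torch.no_grad():
        for p, n in zip(params, noises):
            p.add_(n)                      # move to the perturbed point
    loss = loss_fn(model, batch)
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, n in zip(params, noises):
            p.sub_(n)                      # restore the unperturbed parameters
    return grads


def layerwise_perturbation_grads(model, loss_fn, batch, sigma=1e-2):
    """Independent layer-wise perturbations (illustrative variant).

    Each parameter tensor is perturbed independently, and only its own
    gradient is evaluated at that perturbed point; the other layers stay
    unperturbed for that evaluation.
    """
    params = list(model.parameters())
    grads = []
    for p in params:
        noise = sigma * torch.randn_like(p)
        with torch.no_grad():
            p.add_(noise)
        loss = loss_fn(model, batch)
        g = torch.autograd.grad(loss, p)[0]
        with torch.no_grad():
            p.sub_(noise)
        grads.append(g)
    return grads
```

Either routine returns gradients that can be plugged into an ordinary update, e.g. `p.data -= lr * g` for each parameter and its returned gradient; the layer-wise version trades extra forward passes for the variance control described above.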