Injecting noise within gradient descent has several desirable features. In this paper, we explore noise injection before computing a gradient step, which is known to have smoothing and regularizing properties. We show that small perturbations induce explicit regularization for simple finite-dimensional models based on the l1-norm, group l1-norms, or nuclear norms. When applied to overparametrized neural networks with large widths, we show that the same perturbations do not work, due to a variance explosion resulting from overparametrization. However, we also show that independent layer-wise perturbations allow one to avoid the exploding variance term, and explicit regularizers can then be obtained. We empirically show that these small perturbations lead to better generalization performance than vanilla (stochastic) gradient descent training, with minor adjustments to the training procedure.
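As a rough illustration of the basic scheme described above (noise injected before the gradient is computed, with the update applied to the unperturbed parameters), here is a minimal sketch on a toy least-squares problem. It is not the authors' code; the names (loss_grad, perturbed_gd, sigma, lr, n_steps) and the specific problem are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize 0.5 * ||X w - y||^2 (illustrative only).
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [1.0, -2.0, 0.5]
y = X @ w_true

def loss_grad(w):
    # Gradient of the least-squares loss at w.
    return X.T @ (X @ w - y)

def perturbed_gd(w0, sigma=0.01, lr=1e-3, n_steps=5000):
    """Gradient descent with noise injection before each gradient step:
    the gradient is evaluated at w + sigma * eps, but the step is applied
    to w itself."""
    w = w0.copy()
    for _ in range(n_steps):
        eps = rng.standard_normal(w.shape)   # fresh perturbation each step
        g = loss_grad(w + sigma * eps)       # gradient at the perturbed point
        w -= lr * g
    return w

w_hat = perturbed_gd(np.zeros(10))
print(np.round(w_hat, 3))
```

For a multi-layer network, the analogous layer-wise variant would draw an independent perturbation for each layer rather than a single joint perturbation of all parameters, which is the mechanism the abstract credits with avoiding the exploding variance term.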