Gaussian noise injections (GNIs) are a family of simple and widely used regularisation methods for training neural networks, in which additive or multiplicative Gaussian noise is injected into the network activations at every iteration of the optimisation algorithm, typically stochastic gradient descent (SGD). In this paper we focus on the so-called `implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of SGD. We show that this effect induces an asymmetric heavy-tailed noise on SGD gradient updates. To model these modified dynamics, we first develop a Langevin-like stochastic differential equation that is driven by a general family of asymmetric heavy-tailed noise. Using this model, we then formally prove that GNIs induce an `implicit bias' that varies with the heaviness of the tails and the level of asymmetry. Our empirical results confirm that different types of neural networks trained with GNIs are well modelled by the proposed dynamics, and that the implicit effect of these injections induces a bias that degrades the performance of the networks.
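To make the injection scheme concrete, below is a minimal PyTorch sketch of a GNI layer that perturbs activations at every training iteration, supporting both the additive and multiplicative variants mentioned above. The module name, the `sigma` parameter, and the `multiplicative` switch are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GaussianNoiseInjection(nn.Module):
    """Hypothetical sketch of a GNI layer: perturbs activations with
    Gaussian noise during training and is the identity at evaluation."""

    def __init__(self, sigma: float = 0.1, multiplicative: bool = False):
        super().__init__()
        self.sigma = sigma                  # noise scale (assumed hyperparameter)
        self.multiplicative = multiplicative

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:               # inject noise only while optimising
            return x
        noise = self.sigma * torch.randn_like(x)  # eps ~ N(0, sigma^2)
        if self.multiplicative:
            return x * (1.0 + noise)        # multiplicative: x * (1 + eps)
        return x + noise                    # additive: x + eps

# Usage: interleave injections with layers so that every activation
# is perturbed at each SGD iteration.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(), GaussianNoiseInjection(sigma=0.1),
    nn.Linear(256, 10),
)
```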
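As a worked illustration of the modelling step, one plausible form of the Langevin-like SDE described above replaces the Brownian driver of classical Langevin dynamics with an asymmetric heavy-tailed process. The specific choice of an asymmetric $\alpha$-stable Lévy driver and the symbols below ($\theta_t$ for the parameters, $f$ for the loss, $\sigma$ for the noise scale) are assumptions for the sketch, not the paper's exact formulation.

```latex
% Illustrative Langevin-like dynamics driven by an asymmetric
% alpha-stable Levy process L_t^{\alpha,\beta}, where \alpha controls
% the heaviness of the tails and \beta the level of asymmetry:
\begin{equation}
  \mathrm{d}\theta_t
    = -\nabla f(\theta_t)\,\mathrm{d}t
      + \sigma\,\mathrm{d}L_t^{\alpha,\beta},
  \qquad \alpha \in (0, 2],\quad \beta \in [-1, 1].
\end{equation}
```

With $\alpha = 2$ and $\beta = 0$ the driver reduces to Brownian motion and the equation recovers standard Langevin dynamics; smaller $\alpha$ gives heavier tails, and nonzero $\beta$ skews the noise, which is the regime the implicit-bias result concerns.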