We show analytically that training a neural network by conditioned stochastic mutation, or neuroevolution, of its weights is equivalent, in the limit of small mutations, to gradient descent on the loss function in the presence of Gaussian white noise. Averaged over independent realizations of the learning process, neuroevolution is equivalent to gradient descent on the loss function. We use numerical simulation to show that this correspondence can be observed for finite mutations, for both shallow and deep neural networks. Our results provide a connection between two families of neural-network training methods that are usually considered to be fundamentally different.
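As an illustration of the stated correspondence, the following minimal sketch (not the authors' code) compares the mutation update, averaged over many independent realizations, with a gradient-descent step on a toy quadratic loss. It assumes a Metropolis-style acceptance rule with inverse temperature `beta` and Gaussian mutations of scale `sigma`, and reads the small-mutation limit as an average update of roughly -(beta * sigma^2 / 2) times the gradient; all names and the specific acceptance rule are illustrative assumptions, not quotations of the paper's equations.

```python
# Hedged numerical sketch of the neuroevolution / gradient-descent correspondence.
# Assumptions: Metropolis acceptance min(1, exp(-beta * dLoss)), Gaussian mutations
# of scale sigma, and a toy quadratic loss standing in for a network's loss surface.
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # Toy quadratic loss; its gradient is simply w.
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w

def neuroevolution_step(w, sigma, beta):
    """One conditioned-mutation step: perturb every weight with Gaussian noise of
    scale sigma, accept the mutation with probability min(1, exp(-beta * dLoss))."""
    trial = w + sigma * rng.standard_normal(w.shape)
    d_loss = loss(trial) - loss(w)
    return trial if rng.random() < np.exp(-beta * d_loss) else w

# Average the mutation update over many independent realizations and compare it with
# the gradient-descent step suggested by the small-mutation limit,
# <dw> ~ -(beta * sigma^2 / 2) * grad(loss)   (an assumption of this sketch).
w0 = np.array([1.0, -2.0, 0.5])
sigma, beta, n_realizations = 1e-2, 10.0, 200_000

mean_update = np.mean(
    [neuroevolution_step(w0, sigma, beta) - w0 for _ in range(n_realizations)], axis=0
)
gd_update = -(beta * sigma ** 2 / 2) * grad(w0)

print("averaged neuroevolution update:", mean_update)
print("equivalent gradient-descent step:", gd_update)
```

With these illustrative parameters the two printed vectors agree up to Monte Carlo noise, which is the finite-mutation analogue of the averaged equivalence claimed in the abstract.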