Correspondence between neuroevolution and gradient descent

We show analytically that training a neural network by stochastic mutation or "neuroevolution" of its weights is equivalent, in the limit of small mutations, to gradient descent on the loss function in the presence of Gaussian white noise. Averaged over independent realizations of the learning process, neuroevolution is equivalent to gradient descent on the loss function. We use numerical simulation to show that this correspondence can be observed for finite mutations, for shallow and deep neural networks. Our results provide a connection between two distinct types of neural-network training, and provide justification for the empirical success of neuroevolution.
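As a rough illustration of the small-mutation limit described in the abstract, here is a minimal NumPy sketch. The abstract does not say how mutations are accepted, so the Metropolis-style rule (accept with probability min(1, exp(-beta * dU))), the toy quadratic loss, and all names (`mean_mutation_step`, `sigma`, `beta`, `x0`, `a`) are assumptions made for illustration only. Linearizing the loss under that assumed rule gives an average step of roughly -(beta * sigma^2 / 2) times the gradient, and the sketch checks this numerically.

```python
import numpy as np

def loss(x, a):
    # Toy quadratic loss U(x) = 0.5 * sum_i a_i * x_i^2 (an assumption, not from the paper).
    # Works on a single point of shape (d,) or a batch of shape (n, d).
    return 0.5 * np.sum(a * x**2, axis=-1)

def grad(x, a):
    # Analytic gradient of the toy loss.
    return a * x

def mean_mutation_step(x, a, sigma, beta, n_trials, rng):
    # Propose n_trials independent Gaussian mutations of the same starting point x.
    eps = rng.normal(0.0, sigma, size=(n_trials, x.size))
    delta_u = loss(x + eps, a) - loss(x, a)
    # Assumed Metropolis-style acceptance: probability min(1, exp(-beta * delta_u)).
    accept_prob = np.exp(-beta * np.maximum(delta_u, 0.0))
    accepted = rng.random(n_trials) < accept_prob
    # A rejected mutation leaves x unchanged, so it contributes a zero displacement.
    return (eps * accepted[:, None]).mean(axis=0)

rng = np.random.default_rng(0)
x0 = np.array([1.0, -2.0, 0.5])
a = np.array([1.0, 2.0, 3.0])
sigma, beta = 0.02, 1.0

step = mean_mutation_step(x0, a, sigma, beta, n_trials=400_000, rng=rng)
g = grad(x0, a)

# Small-mutation estimate under the assumed acceptance rule.
predicted = -0.5 * beta * sigma**2 * g

cosine = -(step @ g) / (np.linalg.norm(step) * np.linalg.norm(g))
ratio = np.linalg.norm(step) / np.linalg.norm(predicted)
print("cosine between mean step and -gradient:", cosine)
print("|mean step| / |predicted step|:", ratio)
```

Any single mutation is dominated by noise. Only the average over many independent trials lines up with the negative gradient, which mirrors the abstract's statement about averaging over independent realizations of the learning process.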

Related content

Loss function (machine learning)

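In AI, a loss function is sometimes also called a distance function or metric function. "Distance" here is meant abstractly: it stands for the error between the true data and the predicted data. A loss function measures how far a model's prediction f(x) deviates from the true value Y. It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss, the more closely the model's predictions match the data. The loss function is the core of the empirical risk function and an important component of the structural risk function.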
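For concreteness, the relationship between the loss, the empirical risk, and the structural risk can be written out as below. This is the standard textbook form rather than anything stated on this page. The sample size N, the regularization weight \lambda, the complexity penalty J(f), and the choice of squared error as the example loss are all assumptions introduced for illustration.

```latex
% Example loss (assumed): squared error, non-negative by construction
L(Y, f(x)) = (Y - f(x))^2 \ge 0

% Empirical risk: average loss over N training pairs (x_i, y_i)
R_{\mathrm{emp}}(f) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i))

% Structural risk: empirical risk plus a complexity penalty J(f) weighted by \lambda \ge 0
R_{\mathrm{srm}}(f) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f)
```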
