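An error gradient is the direction and magnitude computed during neural network training, used to update the network weights in the right direction and by the right amount. In deep networks or recurrent neural networks, error gradients can accumulate during updates and become very large, which causes large updates to the network weights and makes the network unstable. In extreme cases the weight values become so large that they overflow, producing NaN values. Exploding gradients arise from exponential growth: gradients with values larger than 1.0 are multiplied together repeatedly as they pass through the network layers.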
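
The repeated multiplication described above can be illustrated with a small sketch. It assumes, purely for illustration, that every layer scales the gradient by the same constant factor; the function name `backprop_gradient_scale` and the chosen factors are hypothetical, and real networks have layer-dependent Jacobians rather than one scalar.

```python
import math


def backprop_gradient_scale(per_layer_factor, num_layers):
    """Return the gradient scale after passing back through num_layers layers.

    Assumption: each layer multiplies the gradient by the same scalar factor.
    """
    scale = 1.0
    for _ in range(num_layers):
        scale *= per_layer_factor
    return scale


# A factor above 1.0 grows exponentially with depth; a factor below 1.0 shrinks.
for factor in (0.5, 1.0, 1.5):
    print(factor, backprop_gradient_scale(factor, 50))

# With enough layers the value overflows to infinity, and arithmetic on
# infinities (for example inf - inf) then yields NaN.
overflowed = backprop_gradient_scale(1.5, 2000)
print(math.isinf(overflowed), math.isnan(overflowed - overflowed))
```

A factor of exactly 1.0 leaves the scale unchanged, which is why techniques that keep the per-layer scaling near 1 are attractive for very deep networks.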

VIP content

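Abstract:

The exploding and vanishing gradient problem has been a long-standing obstacle to the effective training of neural networks. Although various tricks and techniques are used in practice to mitigate it, a satisfactory theory or provable solution is still missing. In this paper, we address the problem from the perspective of high-dimensional probability theory. We provide rigorous results showing that, under certain conditions, the exploding/vanishing gradient problem disappears with high probability if the neural network has sufficient width. Our main idea is to constrain both forward and backward signal propagation in a nonlinear neural network by means of a new class of activation functions, namely Gaussian-Poincaré normalized functions, together with orthogonal weight matrices. Experiments on data validate the theory and confirm its effectiveness on very deep neural networks in practical applications.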
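
Why orthogonal weight matrices help can be seen in a toy linear setting. The sketch below is not the paper's method: it omits the activation functions entirely and uses a hypothetical 2x2 rotation matrix as the orthogonal weight, only to show that an orthogonal map preserves the norm of a signal at any depth while a slightly scaled copy of it does not.

```python
import math


def rotation(theta):
    """Return a 2x2 rotation matrix, which is orthogonal."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def norm(vector):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def propagate(matrix, vector, depth):
    """Apply the same linear layer `depth` times (no activation, by assumption)."""
    for _ in range(depth):
        vector = matvec(matrix, vector)
    return vector


orthogonal = rotation(0.3)
scaled = [[1.1 * x for x in row] for row in orthogonal]  # no longer orthogonal
signal = [1.0, 0.0]

print(norm(propagate(orthogonal, signal, 100)))  # stays close to 1
print(norm(propagate(scaled, signal, 100)))      # grows with depth
```

The same argument applies to the backward pass, since the transpose of an orthogonal matrix is also orthogonal.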


Latest papers

This paper proposes an adjustment to the original differentially private stochastic gradient descent (DPSGD) algorithm for deep learning models. The motivation is that, to date, almost no state-of-the-art machine learning algorithm employs the existing privacy-protecting components, despite their vital necessity, because doing so seriously compromises utility. The idea in this study is natural and interpretable, and it improves utility relative to the state of the art. The proposed technique is also simple, which makes it more appropriate for real-world and especially commercial applications. The intuition is to trim and balance out wild individual discrepancies for privacy reasons while preserving relative individual differences for the sake of performance. The idea can also be applied to recurrent neural networks (RNNs) to address the exploding gradient problem. The algorithm is applied to the benchmark datasets MNIST and CIFAR-10 for a classification task, and the utility measure is calculated. The results outperform those of the original work.
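
For context, the baseline that the abstract adjusts can be sketched as follows. This is a minimal illustration of the clipping and noise step of the original DPSGD, not the adjustment proposed in the paper, whose details the abstract does not give. The names `clip_to_norm`, `dpsgd_noisy_mean`, `max_norm`, and `noise_multiplier` are hypothetical, and gradients are plain Python lists for simplicity.

```python
import math
import random


def clip_to_norm(grad, max_norm):
    """Scale grad down so its Euclidean norm is at most max_norm."""
    grad_norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / grad_norm) if grad_norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_noisy_mean(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is noise_multiplier * max_norm per coordinate.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [x / len(clipped) for x in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # the first example has a "wild" gradient
print(dpsgd_noisy_mean(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping by norm bounds the magnitude of every gradient, which is the same mechanism commonly used to tame exploding gradients in RNN training.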
