Whereas it is believed that techniques such as Adam, batch normalization and, more recently, SeLU nonlinearities "solve" the exploding gradient problem, we show that this is not the case in general and that in a range of popular MLP architectures, exploding gradients exist and that they limit the depth to which networks can be effectively trained, both in theory and in practice. We explain why exploding gradients occur and highlight the *collapsing domain problem*, which can arise in architectures that avoid exploding gradients. ResNets have significantly lower gradients and thus can circumvent the exploding gradient problem, enabling the effective training of much deeper networks. We show this is a direct consequence of the Pythagorean equation. By noticing that *any neural network is a residual network*, we devise the *residual trick*, which reveals that introducing skip connections simplifies the network mathematically, and that this simplicity may be the major cause for their success.
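To make the *residual trick* concrete, here is a minimal sketch of the identity it rests on (the symbols $x_l$, $f_l$, $r_l$ are illustrative notation, not necessarily the paper's): any layer map can be rewritten as an identity skip connection plus a residual term,

$$x_{l+1} = f_l(x_l) = x_l + r_l(x_l), \qquad \text{where } r_l(x_l) := f_l(x_l) - x_l,$$

so formally *any neural network is a residual network*; on this reading, explicit skip connections matter insofar as they keep the residual functions $r_l$ mathematically simple.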