简介:
梯度爆炸和消失的问题一直是阻碍神经网络有效训练的长期障碍。尽管在实践中采用了各种技巧和技术来缓解该问题,但仍然缺少令人满意的理论或可证明的解决方案。在本文中,我们从高维概率论的角度解决了这个问题。我们提供了严格的结果,表明在一定条件下,如果神经网络具有足够的宽度,则爆炸/消失梯度问题将很可能消失。我们的主要思想是通过一类新的激活函数(即高斯-庞加莱归一化函数和正交权重矩阵)来限制非线性神经网络中的正向和反向信号传播。在数据实验都可以验证理论,并在实际应用中将其有效性确认在非常深的神经网络上。
The expressiveness of deep neural networks of bounded width has recently been investigated in a series of articles. The understanding about the minimum width needed to ensure universal approximation for different kind of activation functions has progressively been extended (Park et al., 2020). In particular, it turned out that, with respect to approximation on general compact sets in the input space, a network width less than or equal to the input dimension excludes universal approximation. In this work, we focus on network functions of width less than or equal to the latter critical bound. We prove that in this regime the exact fit of partially constant functions on disjoint compact sets is still possible for ReLU network functions under some conditions on the mutual location of these components. Conversely, we conclude from a maximum principle that for all continuous and monotonic activation functions, universal approximation of arbitrary continuous functions is impossible on sets that coincide with the boundary of an open set plus an inner point of that set. We also show that some network functions of maximum width two, respectively one, allow universal approximation on finite sets.
The expressiveness of deep neural networks of bounded width has recently been investigated in a series of articles. The understanding about the minimum width needed to ensure universal approximation for different kind of activation functions has progressively been extended (Park et al., 2020). In particular, it turned out that, with respect to approximation on general compact sets in the input space, a network width less than or equal to the input dimension excludes universal approximation. In this work, we focus on network functions of width less than or equal to the latter critical bound. We prove that in this regime the exact fit of partially constant functions on disjoint compact sets is still possible for ReLU network functions under some conditions on the mutual location of these components. Conversely, we conclude from a maximum principle that for all continuous and monotonic activation functions, universal approximation of arbitrary continuous functions is impossible on sets that coincide with the boundary of an open set plus an inner point of that set. We also show that some network functions of maximum width two, respectively one, allow universal approximation on finite sets.