There is a growing literature on the large-width properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed parameters or weights, and their connections with Gaussian stochastic processes. Motivated by empirical and theoretical studies showing the potential of replacing Gaussian distributions with Stable distributions, namely distributions with heavy tails, in this paper we investigate large-width properties of deep Stable NNs, i.e. deep NNs with Stable-distributed parameters. For sub-linear activation functions, a recent work has characterized the infinitely wide limit of a suitably rescaled deep Stable NN as a Stable stochastic process, both under the assumption of a "joint growth" and under the assumption of a "sequential growth" of the width over the NN's layers. Here, assuming a "sequential growth" of the width, we extend this characterization to a general class of activation functions that includes sub-linear, asymptotically linear and super-linear functions. As a novelty with respect to previous works, our results rely on a generalized central limit theorem for heavy-tailed distributions, which allows for a unified treatment of infinitely wide limits of deep Stable NNs. Our study shows that the scaling of Stable NNs and the stability of their infinitely wide limits may depend on the choice of the activation function, bringing out a critical difference with respect to the Gaussian setting.
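To make the role of the generalized central limit theorem concrete, the following is a minimal numerical sketch, not the paper's construction: it simulates a one-hidden-layer NN with alpha-Stable weights and a bounded (sub-linear) tanh activation, rescales the output sum by n^{-1/alpha}, and checks empirically that the resulting law has an alpha-Stable tail. All parameter choices (alpha = 1.5, the tanh activation, the widths) are illustrative assumptions; for a super-linear activation the paper's point is precisely that this n^{-1/alpha} scaling and stability index would no longer be the right ones.

```python
# Sketch under stated assumptions: rescaled output of a wide one-hidden-layer
# NN with alpha-Stable weights should behave like an alpha-Stable variable.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)
alpha = 1.5      # stability index of the weight distribution (assumed)
n = 2000         # hidden-layer width
reps = 2000      # Monte Carlo replications of the network output
x = 1.0          # a fixed scalar input

outputs = np.empty(reps)
for r in range(reps):
    # first-layer pre-activations: alpha-Stable weights times the input
    w1 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
    z = np.tanh(w1 * x)                      # bounded, sub-linear activation
    # second-layer alpha-Stable weights; rescale the sum by n^{-1/alpha}
    w2 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
    outputs[r] = n ** (-1.0 / alpha) * np.sum(w2 * z)

# An alpha-Stable law has P(|X| > t) ~ c * t^{-alpha}, so the log-log survival
# function of |outputs| should be roughly linear with slope -alpha.
# Crude two-point tail-exponent estimate (noisy, purely illustrative):
t_lo, t_hi = np.quantile(np.abs(outputs), [0.90, 0.99])
s_lo = (np.abs(outputs) > t_lo).mean()
s_hi = (np.abs(outputs) > t_hi).mean()
slope = (np.log(s_hi) - np.log(s_lo)) / (np.log(t_hi) - np.log(t_lo))
print(f"empirical tail exponent ~ {-slope:.2f} (theory: alpha = {alpha})")
```

Conditionally on the hidden layer, the output sum here is exactly alpha-Stable with scale (sum_j |z_j|^alpha)^{1/alpha}, so the rescaled output is a scale mixture of alpha-Stable laws and inherits the tail index alpha; this is the mechanism that a generalized central limit theorem extends to activations with unbounded, possibly super-linear growth.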