There is a recent literature on the large-width properties of Gaussian neural networks (NNs), i.e. NNs whose weights are distributed according to Gaussian distributions. Two popular problems are: i) the study of the large-width behaviour of NNs, which has provided a characterization of the infinitely wide limit of a rescaled NN in terms of a Gaussian process; ii) the study of the training dynamics of NNs, which has established a large-width equivalence between training the rescaled NN and performing a kernel regression with a deterministic kernel referred to as the neural tangent kernel (NTK). In this paper, we consider these problems for $\alpha$-Stable NNs, which generalize Gaussian NNs by assuming that the NN's weights are distributed as $\alpha$-Stable distributions with $\alpha\in(0,2]$, i.e. distributions with heavy tails. For shallow $\alpha$-Stable NNs with a ReLU activation function, we show that, as the NN's width goes to infinity, a rescaled NN converges weakly to an $\alpha$-Stable process, i.e. a stochastic process with $\alpha$-Stable finite-dimensional distributions. As a novelty with respect to the Gaussian setting, in the $\alpha$-Stable setting the choice of the activation function affects the scaling of the NN: to achieve the infinitely wide $\alpha$-Stable process, the ReLU activation requires an additional logarithmic term in the scaling compared with sub-linear activations. Our main contribution is then the NTK analysis of shallow $\alpha$-Stable ReLU-NNs, which leads to a large-width equivalence between training a rescaled NN and performing a kernel regression with an $(\alpha/2)$-Stable random kernel. The randomness of this kernel is a further novelty with respect to the Gaussian setting: in the $\alpha$-Stable setting the randomness of the NN at initialization does not vanish in the NTK analysis, and it thus induces a distribution for the kernel of the underlying kernel regression.
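To make the setting concrete, the following is a minimal sketch of a shallow $\alpha$-Stable ReLU-NN at initialization, with all weights drawn i.i.d. from a symmetric $\alpha$-Stable law. The function name, the symmetric (skewness-zero) choice, and the illustrative $(n\log n)^{-1/\alpha}$ form of the log-corrected rescaling are assumptions for demonstration, not the paper's exact construction.

```python
import numpy as np
from scipy.stats import levy_stable  # exact sampler for alpha-Stable laws


def shallow_stable_relu_nn(x, n, alpha, seed=0):
    """Sketch: one hidden layer of width n with symmetric alpha-Stable
    weights and a ReLU activation, rescaled by an illustrative
    (n * log n)^(-1/alpha) factor (hypothetical log-corrected scaling)."""
    rng = np.random.default_rng(seed)
    # i.i.d. symmetric alpha-Stable weights (beta = 0 means no skewness)
    w_in = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
    b = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
    w_out = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
    hidden = np.maximum(0.0, w_in * x + b)  # ReLU activation
    scale = (n * np.log(n)) ** (-1.0 / alpha)
    return scale * np.sum(w_out * hidden)
```

For $\alpha = 2$ this reduces (up to a constant) to the Gaussian case, while for $\alpha < 2$ the heavy-tailed weights make individual hidden units occasionally dominate the sum, which is why the limit is an $\alpha$-Stable process rather than a Gaussian one.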