Recent works have revealed that infinitely wide feed-forward or recurrent neural networks of any architecture correspond to Gaussian processes referred to as $\mathrm{NNGP}$. While these works have significantly extended the class of neural networks converging to Gaussian processes, there has been little focus on broadening the class of stochastic processes that such neural networks converge to. In this work, inspired by the scale mixture of Gaussian random variables, we propose the scale mixture of $\mathrm{NNGP}$, in which we introduce a prior distribution on the scale of the last-layer parameters. We show that simply introducing a scale prior on the last-layer parameters turns infinitely wide neural networks of any architecture into a richer class of stochastic processes. In particular, with certain scale priors, we obtain heavy-tailed stochastic processes, and we recover Student's $t$ processes in the case of inverse gamma priors. We further analyze the distributions of neural networks initialized with our prior setting and trained with gradient descent, and obtain results similar to those for $\mathrm{NNGP}$. We present a practical posterior-inference algorithm for the scale mixture of $\mathrm{NNGP}$ and empirically demonstrate its usefulness on regression and classification tasks.
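As a rough illustration of the construction described above, consider the standard Gaussian scale-mixture identity, written here in our own notation ($K$ for the NNGP kernel, $\sigma^2$ for the last-layer scale, and $a$, $b$ for illustrative inverse gamma hyperparameters); the paper's exact parametrization may differ:
$$
\sigma^2 \sim \pi(\sigma^2), \qquad f \mid \sigma^2 \sim \mathcal{GP}\bigl(0,\, \sigma^2 K\bigr),
$$
so that $f$ is marginally a scale mixture of the $\mathrm{NNGP}$. In particular, with an inverse gamma prior,
$$
\sigma^2 \sim \mathrm{InvGamma}(a, b) \;\Longrightarrow\; f \sim \mathcal{TP}\bigl(2a,\, 0,\, \tfrac{b}{a}\, K\bigr),
$$
i.e., a Student's $t$ process with $2a$ degrees of freedom and scale kernel $\tfrac{b}{a} K$.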