The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
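To make the setup concrete, below is a minimal simulation sketch in the spirit of the experiments the abstract describes. It is not the authors' code: the specific shaping (a leaky-ReLU-style activation with slopes 1 ± c/√n, where n is the width) and all constants are illustrative assumptions. Two fixed inputs are pushed through a random network whose depth is proportional to its width, and the 2×2 empirical covariance matrix of the last hidden layer is recorded across many independent initializations.

```python
# Minimal sketch (NOT the authors' code). The shaping below — a leaky
# ReLU whose slopes approach 1 at rate 1/sqrt(width) — and all constants
# are illustrative assumptions, chosen so the activation flattens toward
# the identity as the network grows.
import numpy as np

def shaped_relu(x, n, c=1.0):
    """Activation with slope 1 + c/sqrt(n) for x > 0 and 1 - c/sqrt(n)
    for x < 0; as the width n grows it converges to the identity."""
    return np.where(x > 0.0,
                    (1.0 + c / np.sqrt(n)) * x,
                    (1.0 - c / np.sqrt(n)) * x)

def final_covariance(rng, width, depth):
    """Push two fixed inputs through one random network and return the
    2x2 empirical covariance of the last hidden layer (scaled by 1/width)."""
    # Two orthogonal inputs, each with squared norm equal to the width.
    h = np.stack([np.ones(width),
                  np.concatenate([np.ones(width // 2),
                                  -np.ones(width - width // 2)])])
    for _ in range(depth):
        # i.i.d. Gaussian weights with variance 1/width per entry.
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        h = shaped_relu(h @ W.T, width)
    return h @ h.T / width

rng = np.random.default_rng(0)
width = depth = 100  # depth-to-width ratio held fixed at 1
samples = np.array([final_covariance(rng, width, depth) for _ in range(100)])

# If the covariance concentrated (as a pure infinite-width analysis would
# suggest), the spread across networks would vanish with n; instead it
# stays of order one, which is the behavior the Neural Covariance SDE models.
print("mean of V[0,0]:", samples[:, 0, 0].mean())
print("std  of V[0,0]:", samples[:, 0, 0].std())
print("mean of V[0,1]:", samples[:, 0, 1].mean())
print("std  of V[0,1]:", samples[:, 0, 1].std())
```

The design choice to grow depth and width together (rather than taking width to infinity first) is the point of the abstract: the per-layer fluctuations of the covariance are of order 1/√n, so over a depth of order n they accumulate into the order-one randomness that the SDE captures.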