Neuron death is a complex phenomenon with implications for model trainability, but until recently it was measured only empirically. Recent articles have claimed that, as the depth of a rectifier neural network grows to infinity, the probability of finding a valid initialization decreases to zero. In this work, we provide a simple and rigorous proof of that result. Then, we show what happens when the width of each layer grows simultaneously with the depth. We derive both upper and lower bounds on the probability that a ReLU network is initialized to a trainable point, as a function of model hyperparameters. Contrary to previous claims, we show that it is possible to increase the depth of a network indefinitely, so long as the width increases as well. Furthermore, our bounds are asymptotically tight under reasonable assumptions. First, the upper bound coincides with the true probability for a single-layer network with the largest possible input set. Second, the true probability converges to our lower bound when the network width and depth both grow without limit. Our proof is based on the striking observation that very deep rectifier networks concentrate all outputs towards a single eigenvalue, in the sense that their normalized output variance goes to zero regardless of the network width. Finally, we develop a practical sign flipping scheme which guarantees with probability one that, for a $k$-layer network, the fraction of training data points that remain alive is at least $2^{-k}$. We confirm our results with numerical simulations, which suggest that the actual improvement far exceeds this theoretical minimum. We also discuss how neuron death provides a theoretical interpretation for various network design choices such as batch normalization, residual layers and skip connections, and how it could inform the design of very deep neural networks.
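The toy simulation below is a minimal sketch of the two quantitative claims above, under simplifying assumptions that are ours rather than the paper's: zero biases, i.i.d. $N(0,1)$ weights, and (in Part 1) a single fixed nonzero input instead of an input set. Under these assumptions, a nonzero layer input makes each unit active with probability $1/2$ independently, so a width-$n$, depth-$k$ network keeps the input alive with probability $(1-2^{-n})^k$. This tends to zero as $k$ grows with $n$ fixed, but stays near one if $n$ grows with $k$. This is not the paper's bound. Part 2 uses a hypothetical sign-flipping rule (flip a unit's weights whenever it is active on fewer than half of the living points), chosen because it provably keeps at least half of the living points alive per layer and hence meets the $2^{-k}$ guarantee; the authors' actual scheme may differ. All function names are illustrative.

```python
import random

# Part 1: survival of one input through a random bias-free ReLU network.
# Assumptions (illustrative only): zero biases, i.i.d. N(0, 1) weights.

def alive_probability_closed_form(width: int, depth: int) -> float:
    # Given a nonzero layer input, each unit is active with probability 1/2,
    # independently, so a whole layer is zero with probability 2**-width.
    return (1.0 - 2.0 ** (-width)) ** depth


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def forward_is_alive(x, width, depth, rng):
    h = list(x)
    for _ in range(depth):
        weights = [[rng.gauss(0.0, 1.0) for _ in h] for _ in range(width)]
        h = [max(0.0, dot(w, h)) for w in weights]
        if not any(v > 0.0 for v in h):
            return False  # all-zero layer: with zero biases, every later layer is zero
    return True


def alive_probability_monte_carlo(width, depth, trials=20000, seed=0):
    rng = random.Random(seed)
    x = [1.0, -0.5, 2.0]  # hypothetical nonzero input
    return sum(forward_is_alive(x, width, depth, rng) for _ in range(trials)) / trials


# Part 2: a hypothetical sign-flipping rule applied over a training set.

def living_fraction_with_sign_flips(points, width, depth, seed=0):
    rng = random.Random(seed)
    h = [list(x) for x in points]
    alive = [any(v != 0.0 for v in x) for x in h]
    for _ in range(depth):
        living = [x for x, a in zip(h, alive) if a]
        weights = []
        for _ in range(width):
            w = [rng.gauss(0.0, 1.0) for _ in h[0]]
            positive = sum(1 for x in living if dot(w, x) > 0.0)
            if 2 * positive < len(living):
                # Negating w negates every pre-activation, so the unit becomes
                # active on the (majority of) points where it was negative.
                w = [-wj for wj in w]
            weights.append(w)
        h = [[max(0.0, dot(w, x)) for w in weights] for x in h]
        # A point stays alive only if some unit in this layer is active on it.
        alive = [a and any(v > 0.0 for v in x) for a, x in zip(alive, h)]
    return sum(alive) / len(points)
```

In this toy setting, `alive_probability_closed_form(4, 1000)` is astronomically small, while `alive_probability_closed_form(20, 1000)` is above 0.99, echoing the claim that depth can grow indefinitely as long as width grows too. The flipping rule works because, with probability one, no pre-activation of a living point is exactly zero, so one of the two signs of each unit is active on at least half of the living points.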