Neuron death is a complex phenomenon with implications for model trainability: the deeper the network, the lower the probability of finding a valid initialization. In this work, we derive both upper and lower bounds on the probability that a ReLU network is initialized to a trainable point, as a function of model hyperparameters. We show that it is possible to increase the depth of a network indefinitely, so long as the width increases as well. Furthermore, our bounds are asymptotically tight under reasonable assumptions: first, the upper bound coincides with the true probability for a single-layer network with the largest possible input set; second, the true probability converges to our lower bound as the input set shrinks to a single point, or as the network complexity grows under an assumption about the output variance. We confirm these results by numerical simulation, showing rapid convergence to the lower bound with increasing network depth. Then, motivated by the theory, we propose a practical sign-flipping scheme that guarantees that the ratio of living data points in a $k$-layer network is at least $2^{-k}$. Finally, we show how these issues are mitigated by network design features currently seen in practice, such as batch normalization, residual connections, dense networks, and skip connections. This suggests that neuron death may provide insight into the efficacy of various model architectures.
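As an illustration of the kind of numerical simulation described above, the sketch below estimates how often a randomly initialized ReLU network renders every sampled input dead, as a function of depth. This is only a minimal Monte Carlo sketch, not the paper's exact experiment: the He-style Gaussian initialization, zero biases, the chosen widths and depths, and the criterion that an input counts as "dead" once some hidden layer maps it to the all-zeros vector are all assumptions, as are the helper names `is_dead` and `dead_network_probability`.

```python
# Minimal Monte Carlo sketch (not the paper's exact experiment): estimate the
# probability that a randomly initialized deep ReLU network kills every input.
# Assumptions: He-style Gaussian weights, zero biases, Gaussian input set, and
# "dead" means some hidden layer outputs the all-zeros vector for that input.
import numpy as np

def is_dead(x, weights):
    """Return True if x hits an all-zero activation at some layer."""
    h = x
    for W in weights:
        h = np.maximum(W @ h, 0.0)      # ReLU layer, biases assumed zero
        if not h.any():                 # every neuron inactive -> point is dead
            return True
    return False

def dead_network_probability(depth, width, n_trials=200, n_inputs=64, seed=0):
    """Estimate P(every sampled input is dead) for a depth-by-width ReLU network."""
    rng = np.random.default_rng(seed)
    dead_count = 0
    for _ in range(n_trials):
        weights = [rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
                   for _ in range(depth)]
        xs = rng.normal(size=(n_inputs, width))   # sampled input set
        if all(is_dead(x, weights) for x in xs):
            dead_count += 1
    return dead_count / n_trials

if __name__ == "__main__":
    for depth in (2, 8, 32, 128):
        p = dead_network_probability(depth, width=4)
        print(f"depth={depth:4d}  P(untrainable init) ~ {p:.2f}")
```

Under these assumptions, the estimated probability of an untrainable initialization grows with depth at fixed width, consistent with the qualitative claim above.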
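The $2^{-k}$ guarantee of a sign-flipping scheme can be made concrete with a small sketch. The paper's exact procedure is not spelled out here, so the following is an illustrative reconstruction under stated assumptions: for each layer, keep either $W$ or $-W$, whichever leaves more of the currently living data points with a nonzero ReLU output. A living point whose pre-activation vector is nonzero cannot be killed by both signs, so at least half of the living points survive each layer, giving a living ratio of at least $2^{-k}$ after $k$ layers. The function name `sign_flip_init`, the Gaussian weight scale, and the zero biases are hypothetical choices.

```python
# Hedged sketch of a per-layer sign-flipping initialization in the spirit of the
# scheme described above (illustrative reconstruction, not the paper's code).
import numpy as np

def sign_flip_init(depth, width, data, rng):
    """Initialize `depth` ReLU layers, choosing the sign of each weight matrix
    so that as many currently living data points as possible stay living.

    data: array of shape (n_points, width). Returns the list of weight matrices.
    """
    weights = []
    h = np.asarray(data, dtype=float)
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
        z = h @ W.T                                   # pre-activations for all points
        alive_plus = (z > 0).any(axis=1).sum()        # points alive under +W
        alive_minus = (z < 0).any(axis=1).sum()       # points alive under -W
        if alive_minus > alive_plus:                  # keep the sign that saves more
            W = -W
        weights.append(W)
        h = np.maximum(h @ W.T, 0.0)                  # propagate through the layer
    return weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, width = 20, 4
    data = rng.normal(size=(256, width))
    ws = sign_flip_init(k, width, data, rng)
    h = data.copy()
    for W in ws:
        h = np.maximum(h @ W.T, 0.0)
    living = (h > 0).any(axis=1).mean()
    print(f"living ratio after {k} layers: {living:.3f} (guaranteed >= {2**-k:.1e})")
```

In practice the observed living ratio is typically far above the worst-case $2^{-k}$ bound; the bound only reflects the pigeonhole argument that one of the two signs must preserve at least half of the living points at each layer.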