This work derives an analytical expression for the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the loss landscape of neural networks. Our result implies that zero is a special point in deep neural network architectures. We show that weight decay interacts strongly with the model architecture and can create bad minima at zero in networks with more than $1$ hidden layer, a behavior qualitatively different from that of networks with only $1$ hidden layer. Practically, our result implies that common deep learning initialization methods are insufficient to ease the optimization of neural networks in general.
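As a minimal sketch of why the origin changes character with depth (the scalar loss below, with weights $w_i$, data $(x, y)$, and weight-decay coefficient $\lambda$, is assumed notation for illustration, not the paper's exact setting), consider
\[
\ell_D(w_1, \dots, w_D) = \big( w_D \cdots w_1 x - y \big)^2 + \lambda \sum_{i=1}^{D} w_i^2 .
\]
For $D = 2$ ($1$ hidden layer), the Hessian at the origin is $\begin{pmatrix} 2\lambda & -2xy \\ -2xy & 2\lambda \end{pmatrix}$, which has a negative eigenvalue whenever $\lambda < |xy|$, so the origin is an escapable saddle. For $D \ge 3$ (more than $1$ hidden layer), every second derivative of the data term at the origin contains a product of at least one weight that is zero, so the Hessian reduces to $2\lambda I \succ 0$: the origin is a strict local minimum for any $\lambda > 0$, even when it lies far from the global minimum.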