This work finds the analytical expression for the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks. Our result implies that zero is a special point in deep neural network architectures. We show that weight decay interacts strongly with the model architecture and can create bad minima at zero in a network with more than $1$ hidden layer, a situation qualitatively different from a network with only $1$ hidden layer. Practically, our result implies that common deep learning initialization methods are insufficient to ease the optimization of neural networks in general.
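To make the depth dependence concrete, here is a minimal numerical sketch (not part of the paper's formal result) on a scalar deep linear network: with $1$ hidden layer, the Hessian of the regularized loss at the origin has a negative eigenvalue whenever the weight decay is small relative to the data, so zero is a saddle; with $2$ hidden layers, the fit term is cubic near zero, so the Hessian at the origin is just the weight-decay term and zero is a local minimum regardless of the data. All constants below (`x`, `y`, `lam`) are illustrative assumptions.

```python
# Minimal sketch: a scalar deep linear network f(x) = w_d * ... * w_1 * x
# with squared loss and weight decay lam. We inspect the Hessian at zero.
import jax
import jax.numpy as jnp

def loss(w, x=1.0, y=1.0, lam=0.1):
    """Squared loss of the end-to-end map prod(w) * x, plus weight decay."""
    pred = jnp.prod(w) * x
    return (y - pred) ** 2 + lam * jnp.sum(w ** 2)

for depth in (2, 3):  # 2 weights = 1 hidden layer, 3 weights = 2 hidden layers
    H = jax.hessian(loss)(jnp.zeros(depth))
    print(depth, jnp.linalg.eigvalsh(H))
# depth 2: eigenvalues are 2*(lam ± |x*y|); one is negative when lam < |x*y|,
#          so zero is a saddle that gradient descent can escape.
# depth 3: the fit term is cubic in w near zero, so the Hessian is 2*lam*I > 0
#          and zero is a local minimum regardless of the data: a "bad minimum".
```

Running this prints one negative Hessian eigenvalue at the origin for depth $2$ but a strictly positive spectrum for depth $3$, matching the qualitative gap between the two architectures described above.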