Training deep neural networks is a well-known highly non-convex optimization problem. Recent work has shown that there is no duality gap for regularized two-layer neural networks with ReLU activation, which enables global optimization via convex programs. For multi-layer linear networks with vector outputs, we formulate convex dual problems and demonstrate that the duality gap is non-zero for networks of depth three and deeper. However, by modifying the deep networks to more powerful parallel architectures, we show that the duality gap is exactly zero. Therefore, strong convex duality holds, and hence there exist equivalent convex programs that enable training deep networks to global optimality. We also demonstrate, via closed-form expressions, that weight decay regularization on the parameters explicitly encourages low-rank solutions. For three-layer non-parallel ReLU networks, we show that strong duality holds for rank-one data matrices, whereas the duality gap is non-zero for whitened data matrices. Similarly, once the architecture is transformed into its parallel counterpart, the duality gap vanishes.
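As an illustrative sketch of how weight decay can encourage low-rank solutions, consider the well-known variational identity for two-layer linear factorizations (this is a standard example; the deeper vector-output and parallel cases treated in this work require the convex dual analysis summarized above):
\[
\min_{U,V:\; UV = W} \tfrac{1}{2}\left(\|U\|_F^2 + \|V\|_F^2\right) = \|W\|_* ,
\]
so the weight-decay-regularized two-layer linear training problem
\[
\min_{U,V} \; \tfrac{1}{2}\|X U V - Y\|_F^2 + \tfrac{\beta}{2}\left(\|U\|_F^2 + \|V\|_F^2\right)
\]
is equivalent to the convex nuclear-norm-regularized problem
\[
\min_{W} \; \tfrac{1}{2}\|X W - Y\|_F^2 + \beta \|W\|_* ,
\]
whose nuclear-norm penalty explicitly promotes low-rank solutions.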