Training deep neural networks is a well-known highly non-convex optimization problem. Recent work has shown that there is no duality gap for regularized two-layer neural networks with ReLU activation, which enables global optimization via convex programs. For multi-layer linear networks with vector outputs, we formulate convex dual problems and demonstrate that the duality gap is non-zero for networks of depth three and deeper. However, by modifying the deep networks to more powerful parallel architectures, we show that the duality gap is exactly zero. Therefore, strong convex duality holds, and hence there exist equivalent convex programs that enable training deep networks to global optimality. We also demonstrate, via closed-form expressions, that weight decay regularization on the parameters explicitly encourages low-rank solutions. For three-layer non-parallel ReLU networks, we show that strong duality holds for rank-one data matrices, whereas the duality gap is non-zero for whitened data matrices. Similarly, once the architecture is transformed into its parallel counterpart, the duality gap vanishes.
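As an illustrative sketch of how weight decay can encourage low-rank solutions, consider the well-known variational identity for two-layer linear factorizations (this is a standard example; the deeper vector-output and parallel cases treated in this work require the convex dual analysis summarized above):
\[
\min_{U,V:\; UV = W} \tfrac{1}{2}\left(\|U\|_F^2 + \|V\|_F^2\right) = \|W\|_* ,
\]
so the weight-decay-regularized two-layer linear training problem
\[
\min_{U,V} \; \tfrac{1}{2}\|X U V - Y\|_F^2 + \tfrac{\beta}{2}\left(\|U\|_F^2 + \|V\|_F^2\right)
\]
is equivalent to the convex nuclear-norm-regularized problem
\[
\min_{W} \; \tfrac{1}{2}\|X W - Y\|_F^2 + \beta \|W\|_* ,
\]
whose nuclear-norm penalty explicitly promotes low-rank solutions.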