Understanding the fundamental mechanism behind the success of deep neural networks is one of the key challenges in the modern machine learning literature. Despite numerous attempts, a solid theoretical analysis is yet to be developed. In this paper, we develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization. We first show that the training of multiple three-layer ReLU sub-networks with weight decay regularization can be equivalently cast as a convex optimization problem in a higher dimensional space, where sparsity is enforced via a group $\ell_1$-norm regularization. Consequently, ReLU networks can be interpreted as high dimensional feature selection methods. More importantly, we then prove that the equivalent convex problem can be globally optimized by a standard convex optimization solver with a polynomial-time complexity with respect to the number of samples and data dimension when the width of the network is fixed. Finally, we numerically validate our theoretical results via experiments involving both synthetic and real datasets.
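To make the flavor of the claimed convex reformulation concrete, below is a minimal, hypothetical sketch of a group $\ell_1$-norm regularized convex problem solved with an off-the-shelf solver (cvxpy). The data matrix, the 0/1 grouping masks (standing in for ReLU activation patterns), the number of groups, and the regularization strength $\beta$ are illustrative assumptions; this is not the exact convex program derived in the paper for three-layer ReLU networks, only an illustration of how group $\ell_1$ regularization induces sparsity and can be handled by a standard convex solver.

```python
# Hypothetical sketch: a group-l1-regularized convex program solved with cvxpy.
# All data, masks, and hyperparameters below are illustrative assumptions and
# do NOT reproduce the paper's convex reformulation of three-layer ReLU networks.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, num_groups = 50, 10, 8            # samples, data dimension, candidate groups

X = rng.standard_normal((n, d))         # data matrix (synthetic)
y = rng.standard_normal(n)              # targets (synthetic)
beta = 0.1                              # regularization strength (assumed)

# Fixed 0/1 masks standing in for ReLU activation patterns (an assumption here;
# in the convex reformulation these arise from the data's hyperplane arrangement).
masks = [rng.integers(0, 2, size=n) for _ in range(num_groups)]

# One weight vector per group; the group l1-norm (sum of Euclidean norms)
# drives entire groups to zero, acting as a feature/neuron selection mechanism.
W = [cp.Variable(d) for _ in range(num_groups)]
prediction = sum(cp.multiply(m, X @ w) for m, w in zip(masks, W))
group_l1 = sum(cp.norm(w, 2) for w in W)

problem = cp.Problem(cp.Minimize(cp.sum_squares(prediction - y) + beta * group_l1))
problem.solve()  # any standard convex solver applies here

active = [i for i, w in enumerate(W) if np.linalg.norm(w.value) > 1e-6]
print("active groups:", active)
```

The sparsity pattern of the solution (which groups remain active) is what gives the feature-selection interpretation mentioned above: most groups are driven exactly to zero when $\beta$ is large enough.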