Despite several attempts, the fundamental mechanisms behind the success of deep neural networks remain elusive. To shed light on this question, we introduce a novel analytic framework that unveils hidden convexity in the training of deep neural networks. We consider a parallel architecture with multiple ReLU sub-networks, which includes many standard deep architectures and ResNets as special cases. We then show that the training problem with path regularization can be cast as a single convex optimization problem in a high-dimensional space. We further prove that the equivalent convex program is regularized via a group-sparsity-inducing norm. Thus, a path-regularized parallel architecture with ReLU sub-networks can be viewed as a parsimonious feature selection method in high dimensions. More importantly, we show that the computational complexity of globally optimizing the equivalent convex problem is polynomial in the number of data samples and the feature dimension. We therefore establish exact polynomial-time trainability of path-regularized deep ReLU networks with global optimality guarantees. We also provide several numerical experiments corroborating our theory.
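For concreteness, here is a minimal sketch of the setup in its simplest instance (two-layer ReLU sub-networks, a generic loss, and the standard two-layer path norm); the notation and the specific form of the regularizer below are illustrative and may differ from the paper's exact formulation:
\[
f_\theta(X) \;=\; \sum_{k=1}^{K} \big(X W_k\big)_+ \,\alpha_k,
\qquad
\mathcal{P}(\theta) \;=\; \sum_{k=1}^{K} \sum_{j} \|w_{k,j}\|_1\, |\alpha_{k,j}|,
\]
\[
\min_{\theta}\;\; \mathcal{L}\big(f_\theta(X),\, y\big) \;+\; \beta\, \mathcal{P}(\theta),
\]
where $(\cdot)_+$ denotes the ReLU, $w_{k,j}$ is the $j$-th hidden neuron of sub-network $k$, $\alpha_{k,j}$ is its output weight, and $\beta > 0$ is the regularization strength. The claim summarized above is that this non-convex problem admits an equivalent convex program over a higher-dimensional set of variables whose regularizer is a group-sparsity (group-lasso-type) norm, and that this convex program can be solved globally in time polynomial in the number of samples and the feature dimension.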